-
```
What steps will reproduce the problem?
1.
2.
3.
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
2.6.1
Please provide …
```
-
There are many existing data catalogs out there. We currently require users to create a FilePattern from either a list of URLs or a formatting function and a set of keys. However, if the data are alre…
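The two construction styles described above can be illustrated with a minimal, library-independent Python sketch. It is not the pangeo-forge-recipes `FilePattern` API. The names `file_pattern_from_function` and `file_pattern_from_urls` and the example URL template are hypothetical. The sketch only shows the idea: a formatting function is evaluated over every combination of keys, while a plain URL list is indexed by position.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sketch, NOT the pangeo-forge-recipes API.
# Here a "file pattern" is simply a mapping from an index tuple to a URL.
FilePatternSketch = Dict[Tuple, str]


def file_pattern_from_function(
    format_function: Callable[..., str],
    keys: Dict[str, Sequence],
) -> FilePatternSketch:
    """Call format_function once for every combination of key values."""
    names = list(keys)
    pattern: FilePatternSketch = {}
    for combo in product(*(keys[name] for name in names)):
        # Pass each key value to the formatter as a keyword argument.
        pattern[combo] = format_function(**dict(zip(names, combo)))
    return pattern


def file_pattern_from_urls(urls: List[str]) -> FilePatternSketch:
    """Index an explicit list of URLs by position."""
    return {(i,): url for i, url in enumerate(urls)}


if __name__ == "__main__":
    # Hypothetical URL template, for illustration only.
    def make_url(time: int, variable: str) -> str:
        return f"https://example.com/data/{variable}/{time:04d}.nc"

    pattern = file_pattern_from_function(
        make_url, {"time": [2000, 2001], "variable": ["sst", "ice"]}
    )
    for index, url in pattern.items():
        print(index, url)
```

A catalog-backed constructor, as the issue suggests, would be a third way of producing the same kind of index-to-URL mapping.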
-
https://github.com/branislavblazek/projects/blob/450eded85f0666e54acaa13aeb2477234bb55849/web_crawling/requests/edupage.py#L10
This really shouldn't be here :)
-
If a view in Drupal has multiple pages, it seems that only the first page is crawled and the rest are ignored. Is this by design? Is there a setting to enable crawling these pages? A large portion of m…
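For background, Drupal's pager typically exposes later pages of a view through a `page` query parameter. That parameter counts from 0, so the second page is `?page=1`. As a workaround, the paged URLs could be listed explicitly, as in the sketch below. The sketch assumes that convention and a known page count. The function name `drupal_view_page_urls` and the example URL are hypothetical. The sketch says nothing about whether the crawler itself has a setting for this.

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drupal_view_page_urls(view_url: str, page_count: int) -> List[str]:
    """Return URLs for pages 0..page_count-1 of a paged view.

    Assumes Drupal's zero-based ``page`` query parameter; the first
    page is returned without the parameter.
    """
    parts = urlsplit(view_url)
    # Keep existing query parameters but drop any stale "page" value.
    base_query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    urls = []
    for page in range(page_count):
        query = base_query + ([("page", str(page))] if page > 0 else [])
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


if __name__ == "__main__":
    for url in drupal_view_page_urls("https://example.com/news?type=article", 3):
        print(url)
```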
-
Hello,
This is more of a suggestion than an issue.
The duc indexing is already quite fast, but you might be interested in the filesystem crawling algorithm of [robinhood](https://github.com/cea-hpc/…
-
https://cjting.me/2020/07/01/douyu-crawler-and-font-anti-crawling/
Scraping Douyu follower counts: the offense and defense of font-based anti-scraping
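Font-based anti-scraping generally works as follows. A site renders numbers with a custom web font, and that font's character codes do not match the digits drawn on screen. The raw text in the HTML is therefore meaningless without the font's mapping. The sketch below shows only the final decoding step and assumes the mapping from code points to digits has already been recovered, for example by inspecting the font file. The function name, the mapping, and the sample string are invented for illustration and are not taken from Douyu or the linked article.

```python
from typing import Dict


def decode_obfuscated_text(text: str, glyph_to_digit: Dict[str, str]) -> str:
    """Replace obfuscated characters with the digits their glyphs display.

    Characters missing from the mapping are left unchanged.
    """
    # str.maketrans accepts a dict whose keys are single characters.
    table = str.maketrans(glyph_to_digit)
    return text.translate(table)


if __name__ == "__main__":
    # Invented mapping: private-use code points standing in for digits.
    mapping = {"\ue001": "3", "\ue002": "7", "\ue003": "0"}
    print(decode_obfuscated_text("\ue001\ue003\ue002 followers", mapping))
```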
-
Hi there, I was just introduced to your site and I love it sooo much.
I would like to request that these two URLs be added for indexing:
http://keystonetrilobite.com/
https://www.rsssf.org/
T…
-
Hello.
Why, when the entry point is zaledia.com, does the crawler not find all the links and get stuck on zenalio.ch? Could it depend on the number of threads? This is how the crawler was launched:
…
-
I just had this idea. What da' ya' say?
# crawling
-
Google's structured data crawler (JSON-LD) complains about the length of the abstract field if it is too short (5000). My suggestion would be for the JSON-LD representation to increase the size of the abst…
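The suggested change can be sketched in Python, assuming the JSON-LD is generated server-side from a record with a title and an abstract. In this version the truncation limit becomes a parameter instead of a fixed 5000 characters, and the full abstract is kept when no limit is given. The function name `build_jsonld`, the `max_abstract_chars` parameter, and the use of the schema.org `ScholarlyArticle` type are assumptions for illustration and are not taken from the project.

```python
import json
from typing import Optional


def build_jsonld(title: str, abstract: str, max_abstract_chars: Optional[int] = None) -> str:
    """Serialize a record as schema.org JSON-LD.

    If max_abstract_chars is None, the abstract is emitted in full;
    otherwise it is cut to at most that many characters.
    """
    if max_abstract_chars is not None:
        abstract = abstract[:max_abstract_chars]
    doc = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": title,
        "abstract": abstract,
    }
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(doc, ensure_ascii=False)
```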