-
This is useful to avoid burdening partner sites and their APIs.
**Background**
We have in the past been asked to remove functionality from the site as other sites couldn't handle the amount of re…
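One way to keep the load on partner sites and their APIs low is to throttle outgoing requests. A minimal sketch, assuming a simple fixed delay between calls (the delay value and URLs are placeholders, not anything specified in the issue):
```python
# Minimal throttling sketch: enforce a fixed delay between outgoing requests
# so partner APIs are not flooded. Delay and URLs are illustrative only.
import time
import urllib.request

REQUEST_DELAY_SECONDS = 2.0  # assumed polite delay between calls


def fetch_politely(urls):
    """Fetch each URL in sequence, sleeping between requests."""
    results = {}
    for url in urls:
        with urllib.request.urlopen(url, timeout=10) as resp:
            results[url] = resp.read()
        time.sleep(REQUEST_DELAY_SECONDS)  # wait before hitting the next endpoint
    return results
```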
-
Some conferences seem not to have been updated in a long time because no one cares about them anymore. Would you consider adding crawlers and regularly scheduled runs to automate this process?
-
Add crawl spiders for the following popular websites (see the spider sketch below):
- Youtube
- Quora
- Facebook
- Reddit
- GitHub
Currently implemented spiders can be found at https://github.com/leopardslab/CrawlerX/…
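For reference, a minimal sketch of what one such spider could look like, assuming the project follows Scrapy conventions (the spider name, start URL, and CSS selector are hypothetical and would need to match the actual page markup):
```python
# Hypothetical Scrapy spider sketch; name, start URL, and selector are assumptions.
import scrapy


class GithubTopicSpider(scrapy.Spider):
    """Collects repository links from a GitHub topic page."""

    name = "github_topic"
    start_urls = ["https://github.com/topics/web-crawler"]

    def parse(self, response):
        # Yield one item per repository link found on the topic page.
        # The CSS selector is a guess and will likely need adjustment.
        for href in response.css("h3 a::attr(href)").getall():
            yield {"repo_url": response.urljoin(href)}
```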
-
Hello!
I have been doing some tests in a situation where multiple crawlers are set up, each with a Listener for a Crawl event. When the HttpCrawlerConfigs are added to the HttpCollector, it duplicates…
-
``` yaml
# Ticket imported from Trac
```
Right now, crawlers such as Google cannot index all of the public content because we only show a few events in CategoryDisplay.py. Indico should provide a sit…
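A minimal sketch of what such crawler support could look like, assuming the truncated request is for something like an XML sitemap of public event URLs (the function name, event IDs, and URL pattern are hypothetical, not Indico's actual API):
```python
# Hypothetical sitemap generator; event data and URL pattern are assumptions,
# not Indico's actual API.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(event_ids, base_url="https://indico.example.org"):
    """Return a sitemap XML string listing one <url> entry per public event."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for event_id in event_ids:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = f"{base_url}/event/{event_id}/"
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([101, 102, 103]))
```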
-
The web crawlers have been merged onto the EC2 instance; however, the Shadow Seals crawler does not require an EC2. Therefore, it should be split from OFA and the EC2 instance, then moved to its own Lambda.
Pl…
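A minimal sketch of what the standalone function could look like, assuming a Python Lambda runtime (the handler name, event shape, and target URL are assumptions, not the project's actual code):
```python
# Hypothetical standalone Lambda handler for the crawler; the event shape and
# target URL are assumptions.
import json
import urllib.request


def handler(event, context):
    """Fetch a single page and report how many bytes were retrieved."""
    target = event.get("url", "https://example.com")
    with urllib.request.urlopen(target, timeout=10) as resp:
        body = resp.read()
    # Parsing and storage are intentionally omitted from this sketch.
    return {"statusCode": 200, "body": json.dumps({"bytes_fetched": len(body)})}
```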
-
Hi! I'm currently using Typebot in production on a custom domain, and I would like Google's web crawler and LinkedIn post scraping to work; however, the following tag in the header of the pa…
-
I am hosting a Single-Page App (SPA) on Functions. Proxies are set up to route all requests except the API route to the static HTML content hosted on Azure Storage. This works great for browsers. I…
-
DataLakeCatalog/DataCatalogDatabase should have the option of manually setting the tables for the crawler as parameters. There are several use cases that require a manually created catalog table.
-…
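For illustration, a minimal sketch of registering a manually defined catalog table with boto3 instead of relying on the crawler (the database, table, columns, and S3 location are hypothetical placeholders):
```python
# Hypothetical example of creating a Glue catalog table by hand with boto3;
# all names, paths, and column definitions are placeholders.
import boto3

glue = boto3.client("glue")

glue.create_table(
    DatabaseName="example_database",
    TableInput={
        "Name": "manually_defined_table",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "id", "Type": "string"},
                {"Name": "created_at", "Type": "timestamp"},
            ],
            "Location": "s3://example-bucket/data/manually_defined_table/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.OpenCSVSerde"
            },
        },
        "PartitionKeys": [{"Name": "dt", "Type": "string"}],
        "TableType": "EXTERNAL_TABLE",
    },
)
```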
-
Search Engine Optimization so the page appears when someone googles me.
Related to robots.txt: limit what crawlers can see.
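A minimal sketch of serving a restrictive robots.txt, assuming a Flask-style app (the framework choice, disallowed path, and sitemap URL are assumptions):
```python
# Hypothetical robots.txt endpoint; the disallowed path and sitemap URL are
# placeholders.
from flask import Flask, Response

app = Flask(__name__)

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""


@app.route("/robots.txt")
def robots_txt():
    # Crawlers fetch this file to learn which paths they may index.
    return Response(ROBOTS_TXT, mimetype="text/plain")
```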