-
### Implement a Chrome extension for this crawler, or for a new one?
I'm not really sure. It seems that I have to remove the server-side Electron and modules some new specific crawler. And implemen…
-
## Summary
It should be possible to update some job settings while the jobs are running. This would be especially useful for the settings related to crawling speed.
## Motivation
I have e…
-
It would be useful to fetch only _some_ data from GitHub. For example, if I only need issues, pull_requests, and repos, I would like to skip fetching commits and issue_comments so as to reduce the amount I need…
-
When starting a crawl from the homepage, some posts might not be crawled if no page links to them.
To fix this problem, we could evaluate using a sitemap, or perhaps forcing the tool to crawl the …
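
To get a feel for the sitemap option, here is a minimal Python sketch. It assumes the site publishes a standard sitemaps.org XML sitemap; the function names `urls_from_sitemap` and `missing_from_crawl` and the example URLs are hypothetical and not part of the tool. It lists the sitemap URLs and reports which ones a link-following crawl did not reach.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> URL found in a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def missing_from_crawl(sitemap_urls, crawled_urls):
    """Return sitemap URLs that the link-following crawl never visited."""
    crawled = set(crawled_urls)
    return [url for url in sitemap_urls if url not in crawled]


# Hypothetical sitemap with one post that no page links to.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/posts/linked</loc></url>
  <url><loc>https://example.com/posts/orphan</loc></url>
</urlset>"""

if __name__ == "__main__":
    crawled = ["https://example.com/", "https://example.com/posts/linked"]
    sitemap_urls = urls_from_sitemap(EXAMPLE_SITEMAP)
    # URLs to add as extra start points for the crawl.
    print(missing_from_crawl(sitemap_urls, crawled))
```

The unreached URLs could then be passed to the crawler as additional start URLs, so unlinked posts are still visited.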
-
On every deploy, the plugin overwrites the code in the online editor, but I would like to keep that code so I can customise the `recordExtractor`. I tried `disabled = true` in the `netlify.toml` file but it doesn't trigg…
-
### Description
The ScrapyPriorityQueue throws a **builtins.KeyError**.
```
2024-03-04 09:33:09,684 2112359:CRITICAL [twisted] Unhandled Error
Traceback (most recent call last):
File "/opt/…
-
## Refactoring request
### Job story
Currently, the way bouquets work is tightly coupled to ecosphéres (mainly through the presence of the word in the names of properties such as…
-
@lpinner
I need to add a driver for COSMO-SkyMed and RADARSAT-2 images. What is the best/fastest way to do it?
Thanks.
-
```
What steps will reproduce the problem?
1. Set proxy settings in CrawlConfig
2. Add BasicAuthInfo to CrawlConfig
3. Try to crawl a site with basic authentication
What is the expected output? What …
-
I use this excellent library to scrape some sites, but sometimes it stops unexpectedly without any exceptions or errors.
My custom web crawler is below:
``` java
public class YNUWebCrawler extends WebC…