dice-group / Squirrel

Squirrel searches and collects Linked Data

Improvement of the HTTP Fetcher and Queue when status 500/503 is returned #81

Open gsjunior86 opened 5 years ago

gsjunior86 commented 5 years ago

When the Fetcher tries to access a URI and status 500/503 is returned, the URI is removed once the response (an empty list of triples) is sent to the Frontier. The Fetcher should implement the Retry pattern. A parameter should be passed to the Frontier to determine how many retries should be executed and the waiting time between retries. If the request still fails after all retries, a flag should be set in the Crawleable URI data map so that the queue won't delete this URI and may crawl it later.
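To make the proposal concrete, here is a minimal sketch of the retry logic, not Squirrel's actual code. The class `RetrySketch`, the method `fetchWithRetry`, the key `retryLater`, and the use of a plain `Map` in place of the Crawleable URI data map are all hypothetical names chosen for illustration. The HTTP request is abstracted as an `IntSupplier` that returns a status code.

```java
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical sketch of the Retry pattern described in this issue.
final class RetrySketch {
    // Assumed key name for the "do not delete, crawl later" flag.
    static final String RETRY_LATER_KEY = "retryLater";

    // Only 500 and 503 are treated as retryable, as in the issue text.
    static boolean isRetryable(int status) {
        return status == 500 || status == 503;
    }

    /**
     * Performs the request once, then up to maxRetries more times while the
     * status stays at 500/503, sleeping waitMillis between attempts.
     * If the last status is still 500/503, the flag is put into uriData.
     */
    static int fetchWithRetry(IntSupplier request, int maxRetries,
                              long waitMillis, Map<String, Object> uriData) {
        int status = request.getAsInt();
        for (int attempt = 0; attempt < maxRetries && isRetryable(status); attempt++) {
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                break;
            }
            status = request.getAsInt();
        }
        if (isRetryable(status)) {
            // Signal to the queue that this URI should be kept for a later crawl.
            uriData.put(RETRY_LATER_KEY, Boolean.TRUE);
        }
        return status;
    }
}
```

With this shape, the Frontier would only need to check the flag before removing a URI from the queue. The retry count and waiting time are passed as plain arguments here; in practice they could come from configuration.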