bbcarchdev / anansi

A Linked Open Data Web crawler
https://bbcarchdev.github.io/anansi/
Apache License 2.0

Add a parameter to control the exploration of the crawler #60

Closed: cgueret closed this issue 7 years ago

cgueret commented 8 years ago

For instance, a parameter "radius=1" would restrict exploration to the direct object properties of the resources explicitly added to the queue.
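To make the idea concrete, here is a minimal sketch of radius-limited exploration. It is not Anansi code: the function name `crawl_within_radius`, the in-memory `links` mapping, and the example.org URIs are all assumptions made for illustration, and a real crawler would fetch and parse each resource instead of looking it up in a dictionary.

```python
from collections import deque


def crawl_within_radius(seeds, links, radius):
    """Return {uri: depth} for every URI within `radius` hops of a seed.

    `links` is a hypothetical stand-in for "fetch the resource and list
    the objects of its properties"; here it is just a dict of lists.
    """
    depth = {uri: 0 for uri in seeds}
    queue = deque(seeds)
    while queue:
        uri = queue.popleft()
        if depth[uri] >= radius:
            continue  # already at the edge of the radius: do not expand further
        for target in links.get(uri, []):
            if target not in depth:  # first visit gives the shortest distance
                depth[target] = depth[uri] + 1
                queue.append(target)
    return depth


# Hypothetical link graph: a -> b -> c
links = {
    "http://example.org/a": ["http://example.org/b"],
    "http://example.org/b": ["http://example.org/c"],
}
reached = crawl_within_radius(["http://example.org/a"], links, radius=1)
print(reached)
```

With radius=1 only the seed and its direct objects are reached; `http://example.org/c` is two hops away and is never queued.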

Internal tracking: RESDATA-962

cgueret commented 8 years ago

We could add one boolean parameter "same-origin" and one integer parameter "radius":

On a side note, Anansi currently adds the objects of rdf:type statements to the queue and then crawls vocabularies. We could try to avoid crawling vocabularies and focus on resources via an additional parameter, though it would need to cover more than just skipping "rdf:type".
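As a rough sketch of how a "same-origin" flag and a predicate skip-list might combine, the check below decides whether the object of a triple should be queued. Everything here is hypothetical rather than Anansi's API: the names `should_enqueue`, `same_origin`, and `skip_predicates` are invented for illustration, and the origin comparison is simplified (it does not normalise default ports).

```python
from urllib.parse import urlsplit

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def origin(uri):
    """Return a (scheme, host, port) tuple used to compare origins."""
    parts = urlsplit(uri)
    return (parts.scheme, parts.hostname, parts.port)


def should_enqueue(subject, predicate, obj,
                   same_origin=False,
                   skip_predicates=frozenset({RDF_TYPE})):
    """Decide whether the object of a triple should be added to the queue."""
    if predicate in skip_predicates:
        return False  # e.g. do not follow rdf:type into vocabularies
    if same_origin and origin(subject) != origin(obj):
        return False  # object lives on a different origin
    return True


FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
print(should_enqueue("http://example.org/a", RDF_TYPE, "http://example.org/Thing"))
print(should_enqueue("http://example.org/a", FOAF_KNOWS, "http://other.example/b",
                     same_origin=True))
```

Making the skip-list a set rather than hard-coding rdf:type leaves room for the extension mentioned above, for example adding other predicates that tend to point at vocabularies.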

nevali commented 7 years ago

Given the crawler’s stateless nature, it’s difficult to see how this could reasonably be implemented.