Open yjernite opened 2 years ago
We want to be able to obtain all web and media content associated with a specific list of pre-identified domain names.
This issue tracks domain names identified in the BigScience Data Cataloging Event.
The steps to follow are:
In particular, the list of domain names mentioned in outgoing links may be used to obtain a "depth 1 pseudo-crawl" by running the same process again on those domains.
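
As a rough illustration of the depth 1 step, here is a minimal Python sketch that derives the next list of domain names from outgoing links. It assumes the outgoing links are already available as a plain list of URL strings; the function name `next_depth_domains` and the example URLs are hypothetical and not part of any existing pipeline.

```python
from urllib.parse import urlsplit


def next_depth_domains(seed_domains, outgoing_links):
    """Return the sorted domain names found in outgoing_links that are not seeds.

    Assumption: outgoing_links is an iterable of URL strings. Links without a
    host (relative paths, mailto:, etc.) are skipped.
    """
    seeds = {d.lower() for d in seed_domains}
    found = set()
    for link in outgoing_links:
        host = urlsplit(link).hostname  # already lowercased, port stripped
        if host and host not in seeds:
            found.add(host)
    return sorted(found)


# Hypothetical inputs, for illustration only.
seeds = ["example.org"]
links = [
    "https://example.org/a",       # same domain as a seed, ignored
    "http://News.example.com/x",   # new domain
    "mailto:a@b.org",              # no host, ignored
    "/relative",                   # no host, ignored
    "https://other.net:8080/p",    # new domain, port dropped
]
depth_1 = next_depth_domains(seeds, links)
print(depth_1)
```

The resulting list could then be fed back into the same process as the new set of domain names. Whether subdomains should be collapsed into their registered domain is left open here.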
cc @sebastian-nagel