datalad / datalad-crawler

DataLad extension for tracking web resources as datasets
http://datalad.org
Other

Describe basic structure of crawler pipelines #77

Open mih opened 5 years ago

mih commented 5 years ago

http://docs.datalad.org/en/latest lacks any detail on how pipelines are constructed, what building blocks are available, etc. We need to describe the existing building blocks from a conceptual perspective so that anyone can understand how they are meant to be used.

yarikoptic-gitmate commented 5 years ago

GitMate.io thinks possibly related issues are https://github.com/datalad/datalad/issues/2831 (Describe basic structure of extracted metadata under .datalad/metadata), https://github.com/datalad/datalad/issues/1879 (Maintainable crawler tests), https://github.com/datalad/datalad/issues/586 (crawler pipeline for 'indexes' (ftp/http) with specs for where to break into submodules), https://github.com/datalad/datalad/issues/1853 (basic search crashes), and https://github.com/datalad/datalad/issues/1748 (Simplistic demo of datalad crawler).