yarikoptic opened this issue 8 years ago
Here is a good one for you, @glalteva. Create a new template/pipeline (e.g. called "index_fetcher") that would let one define a topurl, specify how to split the tree under it into subdatasets, and fetch all of them. Also add (optional) versioning support, e.g. as on this website, where file names carry versioned suffixes.
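A minimal sketch of the crawling half of such a pipeline, assuming the sites serve Apache/nginx-style HTML directory listings where subdirectory links end with `/` and `topurl` itself ends with `/`. The names `parse_index`, `crawl`, `http_fetch` and `split_version`, and the suffix convention (`name_v2.csv`, `name-1.0.3.tar.gz`), are hypothetical illustrations, not an existing pipeline API. `fetch` is passed in so the walk can be tried without network access.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect href targets of <a> tags on a directory-index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def parse_index(html, base_url):
    """Split links on one index page into (subdirectory URLs, file URLs)."""
    collector = _LinkCollector()
    collector.feed(html)
    dirs, files = [], []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        # Skip parent links, sort links ("?C=N;O=D") and anything outside base_url.
        if not url.startswith(base_url) or url == base_url or "?" in url:
            continue
        (dirs if url.endswith("/") else files).append(url)
    return dirs, files


def http_fetch(url):
    """Fetch a page over HTTP(S) and return it as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(top_url, fetch):
    """Yield every file URL below top_url; fetch(url) returns the page HTML."""
    pending = [top_url]
    while pending:
        url = pending.pop()
        dirs, files = parse_index(fetch(url), url)
        pending.extend(dirs)
        yield from files


# Hypothetical versioned-suffix convention: <base>[_-][v]<1.2.3><.ext...>
VERSIONED = re.compile(
    r"^(?P<base>.+?)[_-]v?(?P<version>\d+(?:\.\d+)*)"
    r"(?P<ext>(?:\.[A-Za-z][A-Za-z0-9]*)*)$"
)


def split_version(filename):
    """Return (unversioned name, version tuple), or None if not versioned."""
    m = VERSIONED.match(filename)
    if not m:
        return None
    version = tuple(int(p) for p in m.group("version").split("."))
    return m.group("base") + m.group("ext"), version
```

Grouping files by the unversioned name returned from `split_version` would let the pipeline keep only the newest version, or commit each version in order so the history mirrors the releases.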
Other (smaller) datasets to work with: http://index.okfn.org/dataset/ , http://data.caida.org/datasets/as-relationships/ and https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/
Looking at http://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/, we could simply provide a list of regexps that define the directory levels at which to break the tree into submodules (e.g. http://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/phase1/data/).
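A minimal sketch of how such a regexp list could be applied, assuming each regexp is matched with `re.fullmatch` against a directory prefix relative to the topurl and ending in `/`. When several prefixes of a path match, the deepest one wins, so the list can also describe nested submodules. The function name `split_into_subdatasets` and the example patterns are hypothetical.

```python
import re


def split_into_subdatasets(paths, boundary_regexps):
    """Map each relative file path to (subdataset path, path inside it).

    Files under no matching prefix stay in the top-level dataset,
    reported with an empty subdataset path "".
    """
    patterns = [re.compile(rx) for rx in boundary_regexps]
    result = {}
    for path in paths:
        parts = path.split("/")
        subds = ""
        # Test every directory prefix of the path (never the file name itself);
        # later (deeper) matches overwrite earlier ones.
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i]) + "/"
            if any(p.fullmatch(prefix) for p in patterns):
                subds = prefix
        result[path] = (subds.rstrip("/"), path[len(subds):])
    return result
```

For example, with the hypothetical patterns `[r"phase\d+/", r"phase\d+/data/[^/]+/"]`, every `phaseN/` directory becomes a submodule, and each per-sample directory under `phaseN/data/` becomes a nested submodule inside it.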