cernopendata / cernopendata-client

CERN Open Data command-line client
http://cernopendata-client.readthedocs.io/
GNU General Public License v3.0

download-files: initial release #22

Closed tiborsimko closed 4 years ago

tiborsimko commented 4 years ago

If a user wishes to download files belonging to a record, the current technique is to list file locations:

$ cernopendata-client get-file-locations --recid 5500 --protocol http
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/HiggsDemoAnalyzer.cc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/List_indexfile.txt
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall.cc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall_lvl3.cc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3MC.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3data.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4MC.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4data.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.pdf
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.png

and then launch wget or curl commands to download them.

The goal of this issue is to simplify this process by introducing a new command, download-files, that would do this for the user.

Possible options:

$ cernopendata-client download-files --recid 5500 --protocol http --parallel-processes 2

This would launch two parallel download processes, using a suitable Python library, to download the files into the current directory.
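As a rough illustration of the parallel download idea, here is a minimal sketch using only the Python 3 standard library. The function names download_one and download_files are hypothetical and not part of cernopendata-client. Note that the sketch uses a thread pool rather than separate processes, on the assumption that downloads are I/O-bound, and that concurrent.futures is not available in Python 2 without a backport.

```python
import os
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import unquote, urlparse


def download_one(url, target_dir="."):
    """Download a single URL into target_dir and return the local path."""
    # Hypothetical helper: the file name is taken from the last URL path segment.
    filename = os.path.basename(unquote(urlparse(url).path))
    local_path = os.path.join(target_dir, filename)
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path


def download_files(urls, target_dir=".", parallel=2):
    """Download all URLs with `parallel` concurrent workers (threads, not processes)."""
    os.makedirs(target_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(lambda url: download_one(url, target_dir), urls))


if __name__ == "__main__":
    example_urls = [
        "http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml",
        "http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/List_indexfile.txt",
    ]
    for path in download_files(example_urls, parallel=2):
        print(path)
```

The URL list would come from whatever get-file-locations returns for the record; error handling and progress reporting are left out on purpose.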

P.S. The MVP is simply to download files; resuming interrupted downloads will be handled in a separate issue, but it is worth thinking about this functionality up front.
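To make the resuming idea concrete, one possible approach (an assumption, not something decided in this issue) is an HTTP Range request that asks only for the bytes missing from a partially downloaded file. The helper names build_resume_request and resume_download are hypothetical, and the sketch assumes the server honours Range headers.

```python
import os
import shutil
import urllib.error
import urllib.request


def build_resume_request(url, local_path):
    """Return (request, offset), where the request asks only for bytes not yet on disk."""
    request = urllib.request.Request(url)
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if offset > 0:
        # Open-ended byte range: everything from `offset` to the end of the file.
        request.add_header("Range", "bytes={}-".format(offset))
    return request, offset


def resume_download(url, local_path):
    """Continue a partial download if the server supports it, else start over."""
    request, offset = build_resume_request(url, local_path)
    try:
        with urllib.request.urlopen(request) as response:
            # 206 Partial Content: the range was honoured, so append.
            # Any other success status: the full file was sent, so overwrite.
            mode = "ab" if offset > 0 and response.status == 206 else "wb"
            with open(local_path, mode) as out:
                shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        # 416 Range Not Satisfiable usually means nothing is left to fetch.
        # Assumption: the local file is then treated as complete.
        if err.code != 416:
            raise
    return local_path
```

A real implementation would probably also verify file size or checksum after the transfer, since a 416 response alone does not prove the local copy is intact.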

P.P.S. An option --target-directory could be introduced that would recreate the directory structure of the original record. This will be important for AOD files, which have a subdirectory structure such as this one. The corresponding subdirectories would then have to be created in the target directory.
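A minimal sketch of how a file URL could be mapped to a path under the target directory follows. It assumes that "directory structure" means the part of the URL path after /eos/opendata/, which is only a guess based on the record 5500 URLs above; local_path_for and prepare_parent_dir are hypothetical names.

```python
import os
from urllib.parse import unquote, urlparse


def local_path_for(url, target_dir, strip_prefix="/eos/opendata/"):
    """Map a file URL to a local path that keeps the record's subdirectories."""
    path = unquote(urlparse(url).path)
    # Assumption: everything after the storage prefix is the record's own layout.
    if path.startswith(strip_prefix):
        path = path[len(strip_prefix):]
    relative = os.path.normpath(path.lstrip("/"))
    # Refuse paths that would climb out of the target directory.
    if relative == ".." or relative.startswith(".." + os.sep):
        raise ValueError("URL path escapes the target directory: " + url)
    return os.path.join(target_dir, relative)


def prepare_parent_dir(local_path):
    """Create the subdirectories needed before writing local_path."""
    parent = os.path.dirname(local_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return local_path


if __name__ == "__main__":
    url = "http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml"
    print(local_path_for(url, "downloads"))
```

With this mapping, two files that share a name but live in different subdirectories of the record would not overwrite each other.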

tiborsimko commented 4 years ago

Example record to support in this issue: 5500.

The "harder" use case of record 1 with index files was separated into issue #25.

ParthS007 commented 4 years ago

Documenting here the options to test the download files functionality

I will go ahead with these options and check how long the downloads take and whether they are compatible with both Python 2 and 3.

@tiborsimko Can you please provide a recid with large files, around 5 GB in total? I will test with 5500 first.

tiborsimko commented 4 years ago

> @tiborsimko Can you please provide a recid with large files, around 5 GB in total? I will test with 5500 first.