cernopendata / cernopendata-client

CERN Open Data command-line client
http://cernopendata-client.readthedocs.io/
GNU General Public License v3.0

file locations: distinguish between EOSPUBLIC and OPENDATA URIs #115

Open tiborsimko opened 3 years ago

tiborsimko commented 3 years ago

Current behaviour

The client currently exposes EOSPUBLIC locations of files, for example:

$ cernopendata-client get-file-locations --recid 5000          
http://opendata.cern.ch/eos/opendata/cms/software/2011-doubleelectron-doublemu-mueg-ttbar/2011-doubleelectron-doublemu-mueg-ttbar-1.0.0.tar.gz

This file also exists as an attachment to the record under /record/NNN/files/FILE.EXTENSION, which gives:

http://opendata.cern.ch/record/5000/files/2011-doubleelectron-doublemu-mueg-ttbar-1.0.0.tar.gz

What is the difference? In the first case, the file is served from OPENDATA via a reverse HTTP proxy to EOSPUBLIC (and is not cached). In the second case, the file is served from OPENDATA via an XRootD proxy to EOSPUBLIC (and is cached if it is sufficiently small).

Due to several issues with the EOSPUBLIC reverse proxy, PR #113 introduced file index lookups from the latter URIs, while the client still exposes the former URIs.
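
For illustration, the relationship between the two URI forms can be sketched as a small Python helper. This is only a sketch: the function name `eos_to_record_uri` and the `SERVER` constant are hypothetical and not part of the client. It also assumes that the file name attached to the record is the last path component of the EOS location, which holds for the record 5000 example above but may not hold in general.

```python
from posixpath import basename
from urllib.parse import urlsplit

# Assumption: the server prefix seen in the examples above.
SERVER = "http://opendata.cern.ch"


def eos_to_record_uri(eos_uri, recid, server=SERVER):
    """Derive a record-style URI from an EOS-style URI (hypothetical helper)."""
    # Assumption: the attached file is named after the last path
    # component of the EOS location.
    filename = basename(urlsplit(eos_uri).path)
    return f"{server}/record/{recid}/files/{filename}"


if __name__ == "__main__":
    eos = (
        "http://opendata.cern.ch/eos/opendata/cms/software/"
        "2011-doubleelectron-doublemu-mueg-ttbar/"
        "2011-doubleelectron-doublemu-mueg-ttbar-1.0.0.tar.gz"
    )
    print(eos_to_record_uri(eos, 5000))
```

The reverse mapping is not possible from the URI alone, because the record-style URI does not carry the EOS directory path; that information would have to come from the record's file index.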

Expected behaviour

It would be good to consistently expose both kinds of URIs and let the user choose one or the other with a command-line switch.

Example: we can introduce a new command-line option --uri-style having two values, "eos" and "record":

$ cernopendata-client get-file-locations --recid 5000 --uri-style=eos
http://opendata.cern.ch/eos/opendata/cms/software/2011-doubleelectron-doublemu-mueg-ttbar/2011-doubleelectron-doublemu-mueg-ttbar-1.0.0.tar.gz
$ cernopendata-client get-file-locations --recid 5000 --uri-style record
http://opendata.cern.ch/record/5000/files/2011-doubleelectron-doublemu-mueg-ttbar-1.0.0.tar.gz

The default value could be "eos" to keep the old behaviour, but we could switch the default to "record" if that style proves more stable.
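
A minimal sketch of how such an option could be parsed, with "eos" as the default. It uses `argparse` from the Python standard library only to stay self-contained; the client's actual CLI framework and option wiring may differ, and `build_parser` and `select_uri` are hypothetical names introduced for this example.

```python
import argparse


def build_parser():
    """Build a parser for the proposed option (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="get-file-locations")
    parser.add_argument("--recid", type=int, required=True,
                        help="record ID to look up")
    # Proposed switch: two allowed values, "eos" kept as the default
    # so that the old behaviour is preserved.
    parser.add_argument("--uri-style", choices=["eos", "record"],
                        default="eos",
                        help="which kind of URI to print")
    return parser


def select_uri(locations, uri_style):
    """Pick one URI from a mapping with "eos" and "record" keys."""
    return locations[uri_style]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.recid, args.uri_style)
```

Both spellings used in the examples above, `--uri-style=eos` and `--uri-style record`, are accepted by this parser, and any other value is rejected with a usage error.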

Things to beware of: