cernopendata / cernopendata-client

CERN Open Data command-line client
http://cernopendata-client.readthedocs.io/
GNU General Public License v3.0

cli: new command list-directory #82

Closed tiborsimko closed 3 years ago

tiborsimko commented 4 years ago

Introduce a new command, list-directory, that takes an EOSPUBLIC path and outputs the files belonging to that directory and its subdirectories.

Example:

$ cernopendata-client list-directory /eos/opendata/cms/validated-runs/Commissioning10
root://eospublic.cern.ch//eos/opendata/cms/validated-runs/Commissioning10/Commissioning10-May19ReReco_7TeV.json
root://eospublic.cern.ch//eos/opendata/cms/validated-runs/Commissioning10/Commissioning10-May19ReReco_900GeV.json

Beware of several situations:

The implementation could use xrootdpyfs and a snippet like:

from xrootdpyfs import XRootDPyFS

fs = XRootDPyFS("root://eospublic.cern.ch//eos/opendata/cms/Run2010B/BTau/AOD/Apr21ReReco-v1/0000/")
files = fs.listdir()

ParthS007 commented 4 years ago

the path could give many hits, e.g. /eos/opendata/cms would want to list millions of files, so we have to stop it.

What should be the upper threshold length to stop it?
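One possible way to bound the listing by file count rather than by time is sketched below. It is only an illustration: `list_files`, `TooManyFiles`, and the default `max_files=1000` are hypothetical names and values, not part of cernopendata-client. The `fs` argument is assumed to be any object with PyFilesystem-style `listdir(path)` and `isdir(path)` methods; how paths map onto a real XRootDPyFS instance has not been verified here.

```python
import posixpath

# Prefix taken from the example output in this issue.
SERVER = "root://eospublic.cern.ch/"


class TooManyFiles(Exception):
    """Raised when a listing exceeds the allowed number of files."""


def list_files(fs, path, max_files=1000):
    """Collect file URLs under `path` and its subdirectories.

    `max_files` is an arbitrary placeholder threshold (an assumption);
    the walk stops as soon as more files than that have been found.
    """
    found = []
    pending = [path]  # directories still to visit
    while pending:
        current = pending.pop()
        for name in sorted(fs.listdir(current)):
            full = posixpath.join(current, name)
            if fs.isdir(full):
                pending.append(full)
            else:
                found.append(SERVER + full)
                if len(found) > max_files:
                    raise TooManyFiles(
                        "more than %d files under %s" % (max_files, path)
                    )
    return found
```

Raising an exception instead of silently truncating lets the command tell the user that the directory is too large and that a narrower path should be given.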

tiborsimko commented 4 years ago

Dunno if fs.listdir() has some built-in stopping facility; if not, we'd probably have to launch it in a thread and kill it after N seconds, where N could be 60 or so. So we could implement a timeout threshold to be fully safe.

You can test using the path given above and just try its parent directories... 0000 should be a breeze, BTau should also work well, while Run2010B would probably already be too much and cms would definitely be overkill. So we could tweak the value of N so that BTau works well (and then double it, roughly speaking).
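The timeout idea above could be sketched roughly as follows. Note that Python threads cannot be forcibly killed, so this sketch waits up to N seconds and then abandons a daemon thread instead; a separate process would be needed to really terminate the call. `call_with_timeout` and `ListingTimeout` are hypothetical names introduced for illustration.

```python
import threading


class ListingTimeout(Exception):
    """Raised when the call does not finish within the time limit."""


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread and wait up to `timeout` seconds."""
    result = {}

    def worker():
        try:
            result["value"] = func(*args, **kwargs)
        except Exception as exc:
            # Hand the error back so the caller can re-raise it.
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The thread cannot be killed; it is left running in the background
        # and will not block interpreter exit because it is a daemon.
        raise ListingTimeout("no result after %s seconds" % timeout)
    if "error" in result:
        raise result["error"]
    return result["value"]
```

With the snippet from the issue, usage might look like `files = call_with_timeout(fs.listdir, 60)`, where 60 is the tentative value of N mentioned above and would still need tuning against the BTau directory.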