cernopendata / opendata.cern.ch

Source code for the CERN Open Data portal
http://opendata.cern.ch/
GNU General Public License v2.0

CMS: all derived data in ROOT format should have a file listing (root://eospublic...) available #2846

Open katilp opened 4 years ago

katilp commented 4 years ago

The derived data records have only a direct download option, and the address for xrootd operations is not visible to users. It should be added to all records with derived data in ROOT format (@mattbellis fyi).

tiborsimko commented 4 years ago

So far we have only been creating index files for "big" {AOD,AODSIM,etc} datasets. Do we need index files for small ones as well? One could use e.g. xrootd commands like:

$ xrdfs root://eospublic.cern.ch ls -l /eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/

or the nascent cernopendata-client:

$ cernopendata-client get-file-locations --recid 211 --protocol http

to list all files belonging to a dataset.

The cernopendata-client is not released on PyPI yet, but we can look at accelerating this if it would help...
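As a rough illustration, an xrdfs listing like the one above can be turned into full root:// URIs with a few lines of bash. This is only a sketch: it assumes that "xrdfs ... ls" (without -l) prints one absolute EOS path per line, and the function name xrootd_uris is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read absolute EOS paths on stdin (one per line, as
# assumed to be printed by "xrdfs root://eospublic.cern.ch ls <dir>") and
# print the corresponding root:// URIs, keeping only ROOT files.
xrootd_uris() {
  local server="${1:-root://eospublic.cern.ch}"
  local path
  while IFS= read -r path; do
    case "$path" in
      # Paths start with "/", so this yields root://eospublic.cern.ch//eos/...
      *.root) printf '%s/%s\n' "$server" "$path" ;;
      # Subdirectories and other entries are skipped.
      *) ;;
    esac
  done
}
```

Usage, assuming the XRootD client tools are installed:

xrdfs root://eospublic.cern.ch ls /eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ | xrootd_uris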

katilp commented 4 years ago

@tiborsimko It would be very good to have cernopendata-client and instructions available, so that we can teach its use at the OD workshop.

tiborsimko commented 4 years ago

@katilp Yes, we are working on the first public release with @ParthS007. Here's the kanban board: https://github.com/cernopendata/cernopendata-client/projects/1

tiborsimko commented 4 years ago

The nascent user guide is available here: https://cernopendata-client.readthedocs.io/en/latest/userguide.html

katilp commented 4 years ago

@tiborsimko It would be nice to mention this in the forum, could you add a post with a link there?

tiborsimko commented 4 years ago

Yes, I was thinking of updating the docs first; we can then promote it in the Forum.

eguiraud commented 3 years ago

Hi, related: I went to one of the datasets, clicked on the download button and got a pop-up saying that I should consider using xrootd instead of downloading the dataset via HTTP, but the xrootd path of the file is nowhere to be found on the page.

It might be nice to list it close to the download button.

katilp commented 2 years ago

@tiborsimko Coming back to this: I think we should indicate the file location in the records themselves. Many users ask about it.

tiborsimko commented 2 years ago

We could introduce a new button named e.g. "List file locations" (or some such) right next to the "Download" button and keep it visible all the time, so that users could easily list either the HTTP or the XRootD file locations. Would that do?

katilp commented 2 years ago

For the moment, these are the files produced with AOD2NanoAODOutreachTool:

-bash-4.2$ eos ls /eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/
DYJetsToLL.root
ForHiggsTo4Leptons
GluGluToHToTauTau.root
Run2012BC_DoubleMuParked_Muons.root
Run2012B_TauPlusX.root
Run2012C_TauPlusX.root
TTbar.root
VBF_HToTauTau.root
W1JetsToLNu.root
W2JetsToLNu.root
W3JetsToLNu.root
ws1.0
-bash-4.2$ eos ls /eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/
Run2012B_DoubleElectron.root
Run2012B_DoubleMuParked.root
Run2012C_DoubleElectron.root
Run2012C_DoubleMuParked.root
SMHiggsToZZTo4L.root
ZZTo2e2mu.root
ZZTo4e.root
ZZTo4mu.root

http://opendata.cern.ch/search?page=1&size=20&type=Dataset&subtype=Derived&file_type=nanoaod

The file locations:

root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/DYJetsToLL.root
[...]
root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/SMHiggsToZZTo4L.root
[...]
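The locations above follow a simple pattern: the server prefix root://eospublic.cern.ch/ followed by the absolute EOS path. A minimal bash sketch that builds them from the bare names printed by eos ls, assuming one entry per line; the function name eos_ls_to_uris is hypothetical, and subdirectories such as ForHiggsTo4Leptons have to be listed with a separate call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build root:// URIs from the bare entry names printed
# by "eos ls <dir>" (read on stdin, one per line), keeping only ROOT files.
eos_ls_to_uris() {
  local dir="${1%/}"   # drop a trailing slash, as in "eos ls .../AOD2NanoAODOutreachTool/"
  local server="root://eospublic.cern.ch"
  local name
  while IFS= read -r name; do
    case "$name" in
      # dir is absolute, so this yields root://eospublic.cern.ch//eos/.../name
      *.root) printf '%s/%s/%s\n' "$server" "$dir" "$name" ;;
      # Subdirectories (ForHiggsTo4Leptons) and other entries (ws1.0) are skipped.
      *) ;;
    esac
  done
}
```

For example:

eos ls /eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/ | eos_ls_to_uris /eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/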
katilp commented 2 years ago

NB: the root path is in the JSON field "files" but is not displayed:

"files": [
  {
    "checksum": "adler32:daa50fff",
    "filename": "Run2012B_DoubleMuParked.root",
    "size": 3179316378,
    "uri_http": "http://opendata.cern.ch/record/12365/files/Run2012B_DoubleMuParked.root",
    "uri_root": "root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/Run2012B_DoubleMuParked.root"
  }
],
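Until the URIs are shown on the record page, they can be pulled out of the record JSON. A crude bash sketch, not a real JSON parser: it assumes the JSON (with quoted keys) has been saved locally, e.g. as record.json, and that the URI values contain no escaped quotes. The function name file_uris is hypothetical; its http/root argument just mirrors the HTTP vs XRootD choice discussed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every uri_http or uri_root value found in a
# record's JSON read on stdin. Uses grep -o plus sed, so it only handles
# simple string values without escaped quotes.
file_uris() {
  local field
  case "$1" in
    http) field=uri_http ;;
    root) field=uri_root ;;
    *) echo "usage: file_uris http|root" >&2; return 1 ;;
  esac
  # Match  "uri_xxx": "value"  then strip everything up to the opening quote
  # of the value and the closing quote.
  grep -o "\"$field\": *\"[^\"]*\"" | sed -e 's/^[^:]*: *"//' -e 's/"$//'
}
```

With the record saved as record.json, "file_uris root < record.json" prints the uri_root values and "file_uris http < record.json" the uri_http ones.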

@tiborsimko In cases where there is just one file (as for all of these), could we display the two URIs by default, e.g. here:

[image]