Open katilp opened 4 years ago
So far we have only been creating index files for "big" {AOD,AODSIM,etc} datasets. Do we need index files for small ones as well? One could use e.g. xrootd commands like:
$ xrdfs root://eospublic.cern.ch ls -l /eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/
or the nascent cernopendata-client:
$ cernopendata-client get-file-locations --recid 211 --protocol http
to list all files belonging to a dataset.
The cernopendata-client is not released on PyPI yet, but we can look at accelerating this if it may help...
@tiborsimko It would be very good to have cernopendata-client and instructions available, so that we can instruct its use at the OD workshop.
@katilp Yes, we are working on the first public release with @ParthS007. Here's the kanban board: https://github.com/cernopendata/cernopendata-client/projects/1
The nascent user guide is available here: https://cernopendata-client.readthedocs.io/en/latest/userguide.html
@tiborsimko It would be nice to mention this in the forum, could you add a post with a link there?
Yes, I thought of updating the docs first; we can then promote it in the Forum.
Hi, related: I went to one of the datasets, clicked on the download button, got a pop-up saying that I should consider using xrootd instead of downloading the dataset via HTTP, but the xrootd path of the file can't be found on the page.
It might be nice to list it close to the download button.
@tiborsimko Coming back to this: I think we should indicate the file location in the records themselves. Many users ask about it.
We could introduce a new button named e.g. "List file locations" (or some such) right next to the "Download" button and make it visible all the time, so that users could easily list either HTTP or XRootD file locations that way. Would that do?
For the moment, the following files produced with AOD2NanoAODOutreachTool:
-bash-4.2$ eos ls /eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/
DYJetsToLL.root
ForHiggsTo4Leptons
GluGluToHToTauTau.root
Run2012BC_DoubleMuParked_Muons.root
Run2012B_TauPlusX.root
Run2012C_TauPlusX.root
TTbar.root
VBF_HToTauTau.root
W1JetsToLNu.root
W2JetsToLNu.root
W3JetsToLNu.root
ws1.0
-bash-4.2$ eos ls /eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/
Run2012B_DoubleElectron.root
Run2012B_DoubleMuParked.root
Run2012C_DoubleElectron.root
Run2012C_DoubleMuParked.root
SMHiggsToZZTo4L.root
ZZTo2e2mu.root
ZZTo4e.root
ZZTo4mu.root
http://opendata.cern.ch/search?page=1&size=20&type=Dataset&subtype=Derived&file_type=nanoaod
The file locations:
root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/DYJetsToLL.root
[...]
root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/SMHiggsToZZTo4L.root
[...]
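As a rough illustration of how the locations above relate to the `eos ls` listings, here is a minimal Python sketch that prefixes an absolute EOS path with the public XRootD host. The helper name `eos_path_to_xrootd` is hypothetical (it is not part of cernopendata-client or any other tool), and the two example files are simply the ones quoted above.

```python
# Sketch only: derive XRootD locations from absolute EOS paths.
# The host and directory are the ones quoted in this thread; the helper
# name is made up for illustration.
EOS_HOST = "eospublic.cern.ch"
BASE_DIR = "/eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool"


def eos_path_to_xrootd(path, host=EOS_HOST):
    """Return root://<host>/<path> for an absolute EOS path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute EOS path, got %r" % path)
    # The absolute path keeps its leading slash, which gives the
    # double slash seen in the locations above.
    return "root://{}/{}".format(host, path)


# Two of the files listed by `eos ls` above.
names = ["DYJetsToLL.root", "ForHiggsTo4Leptons/SMHiggsToZZTo4L.root"]
urls = [eos_path_to_xrootd("{}/{}".format(BASE_DIR, name)) for name in names]

for url in urls:
    print(url)
```

This only rebuilds the string; it does not check that the file exists on the server.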
NB: the root path is in the JSON field files but not displayed:
files: [
{
checksum: "adler32:daa50fff",
filename: "Run2012B_DoubleMuParked.root",
size: 3179316378,
uri_http: "http://opendata.cern.ch/record/12365/files/Run2012B_DoubleMuParked.root",
uri_root: "root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/Run2012B_DoubleMuParked.root"
}
],
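Until the page displays them, the two locations could be read out of that field with a few lines of Python. This is a sketch under the assumption that the `files` list has already been loaded into a Python structure; the function name `file_locations` is hypothetical, and the entry below is copied from the snippet above.

```python
# Sketch only: pick the HTTP or XRootD location out of a record's
# "files" field. The entry is copied from the snippet above; the
# function name is made up for illustration.
record_files = [
    {
        "checksum": "adler32:daa50fff",
        "filename": "Run2012B_DoubleMuParked.root",
        "size": 3179316378,
        "uri_http": "http://opendata.cern.ch/record/12365/files/Run2012B_DoubleMuParked.root",
        "uri_root": "root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/Run2012B_DoubleMuParked.root",
    }
]

# Map a protocol name to the key used in each file entry.
URI_KEYS = {"http": "uri_http", "root": "uri_root"}


def file_locations(files, protocol="root"):
    """Return the location of every file entry for the given protocol."""
    key = URI_KEYS[protocol]
    # Skip entries that lack the requested key instead of failing.
    return [entry[key] for entry in files if key in entry]


for location in file_locations(record_files, protocol="root"):
    print(location)
```

Passing `protocol="http"` returns the direct-download address instead.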
@tiborsimko In cases when it is just one file (as it is for all of these) can we just display the two URIs by default e.g. here:
The derived data records offer only a direct download option, and the address for XRootD operations is not visible to users. It should be added to all records with derived data in ROOT format (@mattbellis FYI).