belforte opened this issue 2 years ago
I do not see any easy solution.
In `HTCondorDataWorkflow` the full metadata is anyhow retrieved before passing it to the client; that may be good enough. No idea looks really appealing.

Another possibility: this would of course require a change on the client side as well, and could be the occasion to get back to having `?subresource=report` instead of `?subresource=report2` in the URL.
See https://mattermost.web.cern.ch/cms-o-and-c/pl/4phkf68sxbf9zyipzb3i9399xo for an initial discussion.
Actually, while browsing the WMCore code, I discovered that WMCore/REST already supports compression, but only zlib with level 9 [1], which is not the best [2].
How to use it: just add the header `Accept-Encoding: deflate` to the request.
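For illustration, the round trip the server performs for such a request can be reproduced with the standard library; the payload below is a toy stand-in, not the actual CRAB filemetadata schema:

```python
import json
import zlib

# Toy file-metadata payload; field names are illustrative only,
# NOT the real CRAB filemetadata schema.
metadata = [
    {"lfn": f"/store/mc/file_{i}.root", "runlumi": {"1": list(range(1, 1001))}}
    for i in range(10)
]
payload = json.dumps(metadata).encode()

# WMCore/REST answers a request carrying "Accept-Encoding: deflate"
# with a zlib stream compressed at level 9; this mimics that round trip.
compressed = zlib.compress(payload, 9)
restored = json.loads(zlib.decompress(compressed))

assert restored == metadata
print(f"{len(payload)} bytes -> {len(compressed)} bytes")
```

On real filemetadata the numeric lumi lists are far less regular than in this toy, which is why the observed ratio in [4] is disappointing.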
Example with a simple request [3] and with a long request [4], which shows that the current compression does not really help us. So, if we want to pursue the compression route, we may need to push some changes upstream. We would not need to implement it from scratch: we could add zstd alongside the current zlib, but it will take some time nonetheless.
This is not a suggestion, but just something that I found that may be worth keeping in mind.
[3]: `220523_101151%3Acmsbot_crab_Jenkins_CMSSW_12_4_X_2022-05-23-1100_el8_amd64_gcc10_94`: task with 1 job
[4]: `220506_133045%3Amabarros_crab_GS_Jpsi_20to40_Dstar_DPS_2016posVFP_13TeV_06-05-2022`: task with many jobs, 168 MB of filemetadata.
Maybe put a brute-force cut at the root: if a file has more lumis than events, do not pass around lumi info, so FileMetaData never gets too large.
Without reading this issue, let me be bold enough to ask/suggest something: would lumi ranges help in this context? I know there are cases where random runs/lumis could be worse in the format of ranges than as a flat list of them. But if they are mostly sequential, it might save a lot.
hmm... thanks @amaltaro, that's a good point to enlarge our horizon. But I am not sure we can put lumi ranges in DBS, since we need to list the number of events in each lumi. We could compress when there are N sequential lumis with the same number of events and expand on read, but that would be a new format.
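For the record, a run-length style compaction along those lines could look like the sketch below; the `(first_lumi, last_lumi, events)` triple format is invented here for illustration, not something DBS supports:

```python
def pack_lumis(lumi_events):
    """Collapse consecutive lumis that carry the same number of events
    into (first_lumi, last_lumi, events) triples. Hypothetical format."""
    ranges = []
    for lumi, events in sorted(lumi_events.items()):
        if ranges and lumi == ranges[-1][1] + 1 and events == ranges[-1][2]:
            ranges[-1][1] = lumi  # extend the current range
        else:
            ranges.append([lumi, lumi, events])
    return [tuple(r) for r in ranges]

def unpack_lumis(ranges):
    """Expand the triples back to the flat {lumi: events} mapping."""
    return {l: ev for first, last, ev in ranges for l in range(first, last + 1)}

# Mostly-sequential lumis compress very well; random ones would not.
flat = {l: 250 for l in range(1, 1001)}
flat[1500] = 80
packed = pack_lumis(flat)
assert unpack_lumis(packed) == flat
print(len(flat), "lumis ->", len(packed), "ranges")
```

As the caveat above says, this only wins when lumis are mostly sequential with repeated event counts; a scattered assignment degenerates to one triple per lumi, i.e. larger than the flat list.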
I still like the idea of forbidding this early in the game; I am not convinced that lumi info in DBS is useful for MC. At most we could store the number of lumis per file, to allow processing a file in multiple jobs. But... will anybody care to find lumi #45237 in a simulated dataset?
Anyhow, this is clearly not a source of operational problems at this point. Reducing priority.
Need to take care of FMD retrieval by the CRAB Client during `crab report`: https://github.com/dmwm/CRABServer/blob/1a823b40b15bda1a84b5a9d002c6278bc58c85eb/src/python/CRABInterface/HTCondorDataWorkflow.py#L151-L155