dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

Issue uploading configuration to ReqMgr2 Config Cache: Document is too large #11719

Open ggonzr opened 1 year ago

ggonzr commented 1 year ago

Impact of the bug

System affected:

Describe the bug

When McM uploads the configuration to the ReqMgr2 Config Cache for a request with a large number of files listed in the 'PSet' attribute, the HTTP request body exceeds 8 MB. The upload then fails because the maximum HTTP request size allowed by this DB is 8 MB [1] (as confirmed via email). As a result, the operation returns an HTTP 400 response with the message: {"error": "document_too_large", "reason": ""}

How to reproduce it

Send an HTTP POST request to the endpoint https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/_bulk_docs, using as the body the JSON content of the file BPH-GenericGSmearS-00001.json (attached to this issue as an example, inside the zip file BPH-GenericGSmearS-00001.zip). This request has around 74K file names registered under the list: docs → (first element) → "pset_tweak_details" → "process" → "source" → "fileNames"
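Before sending the request, the body size can be checked locally. The following is a minimal sketch, not part of McM or WMCore: the helper names are hypothetical, the 8 MB limit is taken as 8 * 1024 * 1024 bytes (an assumption about how the limit is counted), and the key path is the one described above.

```python
import json

# Assumption: the 8 MB limit quoted in this issue, counted as 8 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 8 * 1024 * 1024


def bulk_docs_body_size(payload):
    """Return the size in bytes of the JSON body that would be POSTed."""
    return len(json.dumps(payload).encode("utf-8"))


def count_file_names(payload):
    """Count entries under docs[0] -> pset_tweak_details -> process -> source -> fileNames."""
    source = payload["docs"][0]["pset_tweak_details"]["process"]["source"]
    return len(source.get("fileNames", []))


def exceeds_limit(payload, limit=MAX_REQUEST_BYTES):
    """True when the serialized body is larger than the given limit."""
    return bulk_docs_body_size(payload) > limit
```

To use it, load the attached file with json.load and pass the resulting dictionary to exceeds_limit and count_file_names. The size computed this way depends on how the client serializes the JSON, so it is only an estimate of what the server receives.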

Expected behavior

The HTTP request to upload the configuration to the ReqMgr2 Config Cache should be accepted and complete successfully.

Additional context and error message

Based on the feedback received by email, I think the following solutions are available:

  1. Increase the maximum allowed size for an HTTP request by updating the configuration attribute max_document_size [1].

  2. Reduce the HTTP request size to be lower than the limit. As shared in the email discussion, the list of filenames listed in the PSet attribute could be dropped. If so, please describe the conditions that allow us (from PdmV side) to discard this attribute.

Thanks, Best regards, Geovanny

References

[1] CouchDB Configuration – max_document_size – Available at: https://docs.couchdb.org/en/stable/config/couchdb.html#couchdb/max_document_size

amaltaro commented 1 year ago

@ggonzr Hi Geovanny, apologies for the delay in getting back to this.

From what I can see, the worst offender in these job configuration files (PSet) is the list of input files, by far. To the best of my knowledge, we have basically 2 potential lists of input files:

  a) primary input files (which can be classified as empty, primary EDM data, and lhe files)
  b) secondary input files (apparently classified as premix or classic)

Other than lhe files (using LHESource), all of the other input files are overwritten when a job is bootstrapped on the worker node.

However, to be on the safe side, I would suggest to stop providing the list of secondary files in the final PSet that gets uploaded to central CouchDB / ReqMgr2, while keeping the primary files untouched, at least for the moment.

As for the list of secondary files, I can see that it can be provided through the following attributes in the PSet:

  1. process.mix.input.fileNames: for classical pileup
  2. process.mixData.input.fileNames: for premix pileup

I am unsure though whether there is any other module that could be used for that. We would have to cross check this with framework experts or someone in PdmV.
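Dropping these two secondary lists before upload could look roughly like the sketch below. It assumes the uploaded JSON mirrors the dotted attribute paths as nested dictionaries under "process", which has not been verified here. The function name and the SECONDARY_PATHS constant are hypothetical.

```python
import copy

# The two attribute paths named above, without the leading "process.".
# Assumption: the JSON stores them as nested dicts, e.g. process["mix"]["input"]["fileNames"].
SECONDARY_PATHS = (
    ("mix", "input", "fileNames"),
    ("mixData", "input", "fileNames"),
)


def drop_secondary_file_names(process):
    """Return a copy of the process dict with the pileup file lists emptied."""
    trimmed = copy.deepcopy(process)
    for path in SECONDARY_PATHS:
        node = trimmed
        # Walk down to the parent of the final key, stopping if the path is absent.
        for key in path[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and path[-1] in node:
            node[path[-1]] = []
    return trimmed
```

The input is deep-copied so the original configuration stays intact, and any module not listed in SECONDARY_PATHS is left untouched.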

Just in case, I also provide configuration examples:

ggonzr commented 1 year ago

Hi Alan (@amaltaro),

Thanks for the feedback. Checking with PdmV conveners, all the files listed in that field are primary input files so, based on the approach shared, we cannot discard them. Also, there are no secondary files listed for this McM request (the attribute available in the JSON file at the same level, secondaryFileNames, is empty).

Is there any other approach we can follow from the PdmV side to upload this information in the ReqMgr2 config cache or would it be possible to increase the maximum allowed size to accept this information as it is?

Best regards, Geovanny

amaltaro commented 1 year ago

@ggonzr Hi Geovanny, I see you have the following data structure in your zipped file (json of the PSet):

{"docs": [
  {"pset_tweak_details": 
    {"process": 
      {"options": 
        {"source": {"parameters_": ["fileNames", "secondaryFileNames", "inputCommands", "dropDescendantsOfDroppedBranches"],
          "fileNames": [HUGE list of LFNs]

Both fileNames and secondaryFileNames are NOT used as provided in the original PSet configuration; instead, WMAgent updates them during the job runtime. As mentioned above, the only exception is the LHESource, where a list of non-EDM files is provided (their file extension is .lhe).

Do you think we could identify those specific cases in McM and let them through, while primary and secondary files get removed from the configuration uploaded to CouchDB/ReqMgr2? Those requests should be called wmLHE (or pLHE, I always confuse them!).

Sorry for pushing in the harder direction, but this is definitely the most sustainable solution.
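The trimming proposed in this comment could be sketched as follows. This is only an illustration: it detects the LHE case by the .lhe file extension mentioned above, which is a heuristic rather than a check of the actual source type, and both function names are hypothetical.

```python
def is_lhe_file_list(file_names):
    """True when the list is non-empty and every entry ends with .lhe."""
    return bool(file_names) and all(
        name.lower().endswith(".lhe") for name in file_names
    )


def trim_source_file_names(source):
    """Return a copy of the source dict, emptying EDM file lists but keeping LHE ones."""
    trimmed = dict(source)
    for key in ("fileNames", "secondaryFileNames"):
        names = trimmed.get(key)
        # Only lists that are not purely .lhe files are emptied.
        if isinstance(names, list) and not is_lhe_file_list(names):
            trimmed[key] = []
    return trimmed
```

A more robust variant would inspect the source type in the configuration (for example, whether it is an LHESource) instead of relying on file extensions.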

amaltaro commented 11 months ago

@ggonzr Hi Geovanny, I just wanted to follow up on this issue and hear whether you have made any modifications on your side and/or whether you need further information from the WM side.

ggonzr commented 11 months ago

Hi Alan (@amaltaro), I ran some tests to retrieve the source type from the embedded cmssw code, to check whether the files can be discarded following the advice given. Unfortunately, we paused this development because we have other, higher-priority tasks to solve. This issue mainly concerns a test request we want to process, so there is no hurry to finish it. No changes related to this have been deployed in our production environments. I will let you know if I need any assistance from your side or if there is any update from the PdmV side.

Thanks, Best regards, Geovanny