cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.08k stars 4.32k forks source link

SiteLocalConfigService::storageDescriptionPath inconsistency #46043

Open Dr15Jones opened 1 month ago

Dr15Jones commented 1 month ago

All other code in SiteLocalConfigService find where the proper site information is stored via the member data m_url, which normally comes from the environment variable SITECONFIG_PATH. However, SiteLocalConfigService::storageDescriptionPath calls getenv("SITECONFIG_PATH") itself (without checking it is valid).

Should storageDescriptionPath be changed to use m_url?

Dr15Jones commented 1 month ago

assign core

cmsbuild commented 1 month ago

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 1 month ago

cms-bot internal usage

cmsbuild commented 1 month ago

A new Issue was created by @Dr15Jones.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 1 month ago

Should storageDescriptionPath be changed to use m_url?

For history, the SiteLocalConfigService::storageDescriptionPath() function was introduced in https://github.com/cms-sw/cmssw/pull/37278. I don't see any particular discussion there on this particular question. Although we pretty much require the SITECONFIG_PATH to be defined. And the job should fail earlier if the SITECONFIG_PATH is not defined (demonstrated e.g. in https://cms-talk.web.cern.ch/t/crab-test-cmssw-12-6-x-invalid-site-local-config/15423/15), unless the node happens to have a file /JobConfig/site-local-config.xml.

All present uses (I guess, didn't really verify) of m_url are about access to site-local-config.xml (whose default location is $SITECONFIG_PATH/JobConfig/site-local-config.xml. The SiteLocalConfigService::storageDescriptionPath(), however, operates on the directory structure under $SITECONFIG_PATH.

Theoretically we could take the directory part of m_url and go one level up. My only question in that approach would be how to deal with the case when the site-local-config.xml path is overridden in the configuration https://github.com/cms-sw/cmssw/blob/9e7bb371ab67f06319332b3978c9eebd728c2a27/FWCore/Services/src/SiteLocalConfigService.cc#L77 Should the storagePath() use the directory structure specified by the overridden location of site-local-config.xml, or $SITECONFIG_PATH?

In any case I think

Adding @stlammel @nhduongvn if they would have any thoughts.

stlammel commented 1 month ago

In case of a subsite override, m_url would be of the form $SITECONFIG_PATH//JobConfig/site-local-config.xml and one would have to go up two directory levels. Is the override the env variable SITECONFIG_PATH provides insuffcient? In the age of CVMFS, one could default SITECONFIG_PATH, i.e. if not set, to /cvmfs/cms.cern.ch/SITECONF/ in both places. My two cents.

makortel commented 1 month ago

In case of a subsite override, m_url would be of the form $SITECONFIG_PATH//JobConfig/site-local-config.xml and one would have to go up two directory levels.

Well, in case of CMSSW-configuration level override, I'd assume the m_url could be nearly anything, and wouldn't necessarily follow CMS' production infrastructure conventions. (other question is then how much such flexibility should we allow).

Is the override the env variable SITECONFIG_PATH provides insuffcient?

I can imagine weird (user) use cases where adjusting environment variables might not be easy, but adjusting CMSSW configuration would be.

nhduongvn commented 1 month ago

The m_url points to site-local-config.xml storageDescriptionPath is used to get the path to storage.json (which equivalent to storage.xml in the old days). Since it is not trivial to obtain this path (I mean using couple lines of code), this function is needed. In the old days, the location of storage.xml is clearly defined in site-local-config.xml (see https://cmssdt.cern.ch/lxr/source/FWCore/Services/src/SiteLocalConfigService.cc#0353 <catalog url="trivialcatalog_file:/x/y/z.xml"/>) so a dedicated function might not needed. If we want to totally liberate from rigid setting dictated by $SITECONFIG_PATH, we need to go back to the old format of site-local-config.xml where location of storage.json is explicitly defined similar to storage.xml case. However, from what I understand this conflicts with the ideas of improving the whole storage descriptions in the first place (I mean for some reasons we prefer to maintain a consistent location of storage.json which is where JobConfig folder locates) I think storageDescriptionPath should not be replaced by m_url based on the current settings. Moreover, they are doing two different jobs: m_url is for site-local-config.xml while storageDescriptionPath is for storeage.json

nhduongvn commented 1 month ago

if we want CMSSW-configuration level override for storage.json, probably a new config setting like m_storage_json_path is needed