dmwm / das2go

Go implementation of Data Aggregation System (DAS) for CMS experiment

DASGoServer using WMCore couch views deprecated long ago #27

Closed amaltaro closed 3 years ago

amaltaro commented 4 years ago

@vkuznet Valentin, please let me know if this issue should get created somewhere else.

As mentioned in the WMCore meeting today, while scanning the CMSWEB frontend logs, I noticed the following 3 calls:

IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByOutputDataset?key=\"/SingleMuon/Run2018D-SiPixelCalSingleMuon-ForPixelALCARECO_UL2018-v1/ALCARECO\"&include_docs=true&stale=update_after HTTP/1.1" 404 [data: 10719 in 15439 out 52 body 7227 us ] [auth: TLS... "DN" "-" ] [ref: "-" "dasgoserver" ]

IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByInputDataset?key=\"/SingleMuon/Run2018D-SiPixelCalSingleMuon-ForPixelALCARECO_UL2018-v1/ALCARECO\"&include_docs=true&stale=update_after HTTP/1.1" 404 [data: 10718 in 15439 out 52 body 8498 us ] [auth: TLSv... "DN" "-" ] [ref: "-" "dasgoserver" ]

IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByOutputDataset?key=\"/store/data/Run2018C/EGamma/RAW/v1/000/319/349/00000/F01E03C9-1683-E811-A262-FA163E5A6AC2.root\"&include_docs=true&stale=update_after HTTP/1.1" 404 [data: 10736 in 15439 out 52 body 5896 us ] [auth: TLSv... "DN" "-" ] [ref: "-" "dasgoserver" ]
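For anyone repeating this kind of log scan, here is a minimal Go sketch that picks out the method, path and status of requests sent by a given client. The regular expression is only inferred from the three log lines above and is an assumption about the frontend log format, not its specification; `parseLine` and `hit` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// requestRe captures method, path and status from a line shaped like the
// frontend log entries quoted above (assumed format, inferred from them).
var requestRe = regexp.MustCompile(`"([A-Z]+) (\S+) HTTP/[0-9.]+" (\d{3})`)

// hit holds the fields extracted from one log line.
type hit struct {
	Method, Path, Status string
}

// parseLine returns the request fields if the line mentions the quoted
// client name (e.g. "dasgoserver") and contains a request section.
func parseLine(line, client string) (hit, bool) {
	clientRe := regexp.MustCompile(regexp.QuoteMeta(`"` + client + `"`))
	if !clientRe.MatchString(line) {
		return hit{}, false
	}
	m := requestRe.FindStringSubmatch(line)
	if m == nil {
		return hit{}, false
	}
	return hit{Method: m[1], Path: m[2], Status: m[3]}, true
}

func main() {
	// Shortened, made-up line in the same shape as the entries above.
	line := `IP "GET /couchdb/wmstats/_design/WMStats/_view/requestByInputDataset?key=\"/A/B/C\" HTTP/1.1" 404 [ref: "-" "dasgoserver" ]`
	if h, ok := parseLine(line, "dasgoserver"); ok && h.Status == "404" {
		fmt.Println(h.Method, h.Status, h.Path)
	}
}
```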

Searching for these couch views in WMCore, I found that they were deprecated 5 years ago (!): https://github.com/dmwm/WMCore/pull/5609

Can you please update them as follows:

In addition to that, can you please clarify what kind of request information you need? Is it just the workflow name, or do you need other workflow metadata? If it's the former, then please always use detail=False.

Last but not least, if you check the 3rd example, there is no data sanitization on the DAS server, which means it will make a reqmgr2 call with whatever input the user provides. In this case, it asks for workflows by output dataset, but it provides an LFN.
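To illustrate the kind of check meant here, a minimal Go sketch that rejects anything not shaped like a three-part dataset path before a call is made. The pattern is an assumption based only on the dataset and LFN in the log lines above, not the official CMS naming rule, and `looksLikeDataset` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// datasetPattern is an ASSUMED shape for a dataset path:
// /<primary>/<processed>/<TIER>, with no further slashes.
var datasetPattern = regexp.MustCompile(`^/[A-Za-z0-9_-]+/[A-Za-z0-9_.-]+/[A-Z_-]+$`)

// looksLikeDataset reports whether input matches the assumed dataset shape.
// An LFN such as /store/data/.../file.root has more path segments and fails.
func looksLikeDataset(input string) bool {
	return datasetPattern.MatchString(input)
}

func main() {
	inputs := []string{
		"/SingleMuon/Run2018D-SiPixelCalSingleMuon-ForPixelALCARECO_UL2018-v1/ALCARECO",
		"/store/data/Run2018C/EGamma/RAW/v1/000/319/349/00000/F01E03C9-1683-E811-A262-FA163E5A6AC2.root",
	}
	for _, in := range inputs {
		fmt.Printf("%-5v %s\n", looksLikeDataset(in), in)
	}
}
```

With a guard like this, the third request above would be refused on the DAS side instead of reaching reqmgr2.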

One extra request: do you have a map of all WMCore APIs used within the DAS server/client? We might have other such cases that have not been spotted yet. Thanks

vkuznet commented 4 years ago

The code in question is here: https://github.com/dmwm/das2go/blob/master/services/reqmgr.go#L151 and was suggested by Seangchan. The code needs to find reqMgr ids from a given dataset name. They are required to search for proper config, see https://github.com/dmwm/das2go/blob/master/services/reqmgr.go#L236

So, since I query various couchdb views for input and output datasets, I need to know the replacement for all of them via reqmgr2 APIs. Please see all the views I use: https://github.com/dmwm/das2go/blob/master/services/reqmgr.go#L151-L157 and provide me a replacement for each of them. Then I can change the code. Since the views return a different data structure than reqmgr2, the code will have to be adjusted accordingly.

Regarding sanitization, please open a separate request and I'll work on it; it looks like I missed this check for the config API.

Regarding the map of WMCore APIs, DAS follows the DAS maps for each service. The maps for reqmgr2 are defined here: https://github.com/dmwm/das2go/blob/master/maps/reqmgr2.yml and I only rely on the /data/request API for reqmgr2. For configs, I extract them from couchdb, but this is done in code rather than with a map because of the non-direct logic involved. If you have an API replacement, please tell me what it is.

amaltaro commented 4 years ago

Valentin, I would suggest that you completely remove these 2 URLs: https://github.com/dmwm/das2go/blob/master/services/reqmgr.go#L155-L158

and replace the ReqMgr2 couch views here: https://github.com/dmwm/das2go/blob/master/services/reqmgr.go#L151-L154 with the two REST APIs I provided above (/reqmgr2/data/request?....)

If you only need the workflow id, then please add detail=False to the query string. Otherwise (and I see you look up the ConfigCacheID) you need to fetch the whole document, not only the request id.
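As a rough illustration of the two cases, a Go sketch that builds the /reqmgr2/data/request query with or without detail=False. The query parameter name is left as an argument because it is not spelled out in this thread; `requestURL`, the host and the parameter name in `main` are placeholders, and omitting `detail` is assumed to leave the server default in place.

```go
package main

import (
	"fmt"
	"net/url"
)

// requestURL builds a ReqMgr2 request query string.
// base and param are supplied by the caller (hypothetical values below).
// When detail is false, detail=False is added so that only ids come back;
// otherwise the parameter is omitted (assumed to return whole documents).
func requestURL(base, param, dataset string, detail bool) string {
	q := url.Values{}
	q.Set(param, dataset)
	if !detail {
		q.Set("detail", "False")
	}
	// Encode sorts keys and percent-escapes the dataset path.
	return base + "/reqmgr2/data/request?" + q.Encode()
}

func main() {
	// "someparam" and the host are placeholders, not real names.
	fmt.Println(requestURL("https://host.example", "someparam", "/A/B/C", false))
	fmt.Println(requestURL("https://host.example", "someparam", "/A/B/C", true))
}
```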

> They are required to search for proper config, see https://github.com/dmwm/das2go/blob/master/services/reqmgr.go#L236

For the config cache IDs, we do not have a REST API yet. So you can keep that code around.

Here is the sanitization issue: https://github.com/dmwm/das2go/issues/28

vkuznet commented 4 years ago

Alan, I don't need the workflow ID, since it is not used to look up the configuration in couch. The couch call for configuration relies on ConfigCacheID, e.g. /couchdb/reqmgr_config_cache/<ConfigCacheID>/configFile, therefore there is no way for me to use detail=false. But obviously fetching the entire doc just to get the proper value is overkill. Therefore I suggest that you implement a proper reqmgr2 API to return the ConfigCacheID for a given dataset. If I have this API, I can remove the entire logic of parsing reqmgr/couch docs.
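To make the current logic concrete, a minimal Go sketch (not the actual das2go code) of the two steps: collect every ConfigCacheID found in a fetched request document, then build the couch configFile URL from it. The document layout is not described in this thread, so the sketch walks the JSON recursively instead of assuming where the key sits; the sample document, host and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// collectIDs walks arbitrary decoded JSON and appends every string value
// stored under a key named "ConfigCacheID".
func collectIDs(v interface{}, out *[]string) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, val := range t {
			if s, ok := val.(string); ok && k == "ConfigCacheID" {
				*out = append(*out, s)
				continue
			}
			collectIDs(val, out)
		}
	case []interface{}:
		for _, val := range t {
			collectIDs(val, out)
		}
	}
}

// idsFromJSON decodes a request document and returns its ConfigCacheIDs.
func idsFromJSON(data []byte) ([]string, error) {
	var doc interface{}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	var ids []string
	collectIDs(doc, &ids)
	return ids, nil
}

// configFileURL builds /couchdb/reqmgr_config_cache/<ConfigCacheID>/configFile.
func configFileURL(base, id string) string {
	return base + "/couchdb/reqmgr_config_cache/" + url.PathEscape(id) + "/configFile"
}

func main() {
	// Made-up document; the real layout may differ.
	sample := []byte(`{"wf":{"Task1":{"ConfigCacheID":"abc123"}}}`)
	ids, err := idsFromJSON(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, id := range ids {
		fmt.Println(configFileURL("https://host.example", id))
	}
}
```

A dedicated API returning only the ConfigCacheID would make the document walk unnecessary.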

amaltaro commented 4 years ago

I'm afraid I won't be able to work on a new API in the next couple of months. I'd suggest proceeding with the changes we have discussed so far, and in the future we can come up with a proper REST API to retrieve only the ConfigCacheID.

Note that with the way those views are currently queried (include_docs=true), you are already fetching the whole documents anyway.

vkuznet commented 4 years ago

I fixed both issues reported here and in #28 in the DAS code, and I also opened a feature request in DMWM/WMCore to provide a proper API for that, see https://github.com/dmwm/WMCore/issues/9699

The changes will be scheduled for the next cmsweb upgrade once they pass validation.

amaltaro commented 4 years ago

Thanks, Valentin