dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0
46 stars 107 forks source link

Ensure correct data-type values in DBS injection requests #10844

Closed vkuznet closed 2 years ago

vkuznet commented 3 years ago

Impact of the new feature I found through DBS unit tests that DBS client allows to pass wrong data-types to DBS server, mostly represent integer as string, e.g. '123' which then is casted to int data-type in DBS server. This is not allowed in Go implementation since it follows strict data-type. The issue here is Python itself which does not enforce strict data-typing. As such a special care should be done on a client side to avoid sending wrong data-type data via HTTP requests.

Is your feature request related to a problem? Please describe. Yes, in Go implementation it is impossible to send wrong data-type via HTTP request

Describe the solution you'd like All WMCore tools should ensure to send proper data-types in DBS injection requests.

Full API documentation can be found at: https://github.com/vkuznet/dbs2go/blob/master/docs/apis.md#post-apis

Describe alternatives you've considered None

Additional context You may refer to DBS#657 which demonstrate the issue and provide a fix for DBS unit test client.

vkuznet commented 3 years ago

Here are full details of the flaw in DBS python server implementation. Let's start with example of using DBS client API:

cat t.py
#!/usr/bin/env python
import json
import uuid
from dbs.apis.dbsClient import DbsApi

uid = uuid.uuid4().time_mid
processing_version="%s" %(uid if (uid<9999) else uid%9999)
url="https://cmsweb-testbed.cern.ch/dbs/int/global/DBSWriter"
api = DbsApi(url=url)
data={'processing_version': processing_version, 'description':'this_is_a_test'}
print(json.dumps(data))
res = api.insertProcessingEra(data)
print(res)

As you can see the code above will use a string value of processing_version. If we'll run this code we'll get the following:

python ./t.py
{"processing_version": "1846", "description": "this_is_a_test"}
None

Now, we can look-up results in DBS via curl call (scurl here refers to curl command with provided grid certs):

# fetch the results
scurl https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader/processingeras?processing_version=1846
[{"description": "this_is_a_test", "create_by": "valya", "creation_date": 1632934742, "processing_version": 1846}]

As you can see from above, even though we provided processing_version as string data-type the DBS client API is happy to inject it into DBS.

Now, let's confirm that it is an issue with DBS server and not DBS client. For that we'll verify that new value for processing era does not exists in DBS and inject it via curl call using JSON with string value of processing version.

# let 's check if value 1848 does not exists in DBS
scurl https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader/processingeras?processing_version=1848
[]

# insert new processing era
scurl -v -X POST -H "Content-type: application/json" -d '{"processing_version": "184"description": "this_is_a_test"}' https://cmsweb-testbed.cern.ch/dbs/int/global/DBSWriter/processingeras
...
< HTTP/1.1 200 OK
...
* Connection #0 to host cmsweb-testbed.cern.ch left intact
null

# and now we can see it again
scurl https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader/processingeras?processing_version=1848
[{"description": "this_is_a_test", "create_by": "valya", "creation_date": 1632935194, "processing_version": 1848}]

With dbs2go implementation it is not possible, see the following error during the injection

scurl -v -X POST -H "Content-type: application/json" -d '{"processing_version": "1849", "description": "this_is_a_test"}' https://cmsweb-testbed.cern.ch/dbs2go-writer/processingeras
...
< HTTP/1.1 400 Bad Request
[{"api":"processingeras","error":"json: cannot unmarshal string into Go struct field ProcessingEras.processing_version of type int64","exception":400,"method":"POST","type":"HTTPError"}]

The reported error says that we can't unmarshal (the marshal/unmarshal names used for encode/decode in Go) the provided JSON because ProcessingEras.processing_version attribute is type of int64.

Using this example I conclude that there are two kind of issues with DBS python implementation:

Therefore, I expect that due to issue one none of the Python clients check data-types of JSON request and may pass incorrectly wrong data-type to DBS server. Since DBS server is casting under the hood values we never saw this problem. But if this is the case and when we move to dbs2go the python client calls may fail if they pass wrong data-type.

vkuznet commented 3 years ago

@yuyiguo , @klannon @KatyEllis @amaltaro @todor-ivanov I provided concrete example (sorry it is lengthy technical one), but it shows the flaw in DBS python server implementation which we should address in all python based clients before we'll move on to use Go implementation of DBS server. The example is based on DBS unit tests, but it affects almost all int based attributes used in JSON requests for data injection into DBS.

There are different ways to resolve this problem, e.g.

vkuznet commented 3 years ago

@belforte this issue affects CRAB as well, therefore please pay attention to it and ensure that CRAB client will take care of proper data-type for injection DBS APIs.

amaltaro commented 3 years ago

@vkuznet instead of extracting what needs to be adapted from your PR. Could you please provide a full list of what those parameters are and what is the expected data type to be provided for them?

I understand it's mostly for write APIs, so if you happen to know what are the exact API names concerned here, it would be quite helpful as well.

vkuznet commented 3 years ago

The APIs in questions are all write APIs we use. These APIs uses JSON payload to send data to DBS, and therefore all data you send should have proper data-types. I already provided full documentation here along with exact structure of JSON payload such that you should not guess and only can have at posted JSON examples over there.

For DBS GET APIs there is no issue since they rely on HTTP URI (http://host:port/path/api?param=value) which is (by definition) does rely on data-types (all parameters send and interpreted as a string).

If you need to know exact data type used by DBS DB I suggest to use DBS DB schema directly. For instance, the dbs2go post apis documentation provides example for /processingeras API

{
    "processing_era_id": 123,
    "processing_version": 12345,
    "creation_date": 1631118749,
    "create_by": "tester",
    "description": "note"
}

which matches DB schema definition for PROCESSING_VERSION as INTEGER, etc.

vkuznet commented 3 years ago

and if you'll pay closer look to dbs2go documentation it also provides pointer to Go code structures used to parse input JSON, e.g. in case of processingeras it points to ProcessingEras where you can find that the ProcessingEra struct has all data-types, in this case:

...
PROCESSING_VERSION int64  `json:"processing_version" validate:"required,number,gt=0"`
...

So, it defines PROCESSING_VERSION as int64 (64 bit integer) which has processing_version JSON key name, it requires validation as number which should be greater than 0. With such strong data-type definition the code protects us from silly mistake and wrong values passed by the client.

amaltaro commented 3 years ago

Okay, we will have to go through all the write APIs and ensure that our application is defining those data types according to what you documented for the DBS2Go server.

By the way, I'm updating the initial description here providing a link to consult this full documentation.

belforte commented 3 years ago

can't we assume that existing code sends correct data until/unless the new server complains ?

vkuznet commented 3 years ago

yes you can, and when Alan provide me the pointer to FwkJobReport code I checked that majority of data are send correctly, e.g. https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/FwkJobReport/Report.py#L811 and other lines indicate that this code correctly cast data-types before producing Fwjr document. But since I don't know all places in WMCore/CRAB code I can't say for sure.

amaltaro commented 2 years ago

@vkuznet as we discussed today, this is mostly related to the write APIs - thus relying on DBSWriter - and given that we have not yet started the Go-based DBSWriter validation (using the default server URI), I am moving it back to the ToDo list.

We still have to review our write APIs and the data type/format of the data we post to the DBS server through the Writer instance. Once we can start testing the Go-based writer instance, we can resume this activity. Please let me know if I missed anything. Thanks!

vkuznet commented 2 years ago

@amaltaro everything is correct. And, I assume that DBSWriter tests are scheduled for Q2. I only want to point out that from server point of view everything is ready for tests and DBSWriter is available on testbed under /dbs2go-writer. Since WMCore team uses testbed almost in production fashion I don't know if you'll be ok to switch default end point /dbs/int/global/DBSWriter to Go server or not. This is why we deployed it under independent end-point for your convenience. In any way, the action item is only validation and it is up to you to decide when to perform. Server wise everything is ready for that.

amaltaro commented 2 years ago

@vkuznet is there anything else that should be considered in this ticket? There is one other piece of WMCore code that writes to DBS, the StepChainParentageFixTask ReqMgr2 CherryPy thread. But that one will be trick to test and we better wait for DBSWriter in testbed to be swapped to the Go implementation. I don't think we need to reopen this ticket only for that.

@belforte Stefano, just a FYI. With all the tests that Valentin did against Go-based DBSWriter, the only issue we had was with processing_version that was defined as a string instead of integer.

vkuznet commented 2 years ago

Alan, I think in terms of development everything is in place. But per our previous discussion it would be good to install new WMAgent with this code in place and re-do integration tests again. Once this will be done we can be ready for migration to new DBS Writer.

amaltaro commented 2 years ago

Sounds good! How about we start discussing a transition of the DBSWriter in cmsweb-testbed then? Even though the patch you made provides database schema change, the DAOs do the right data type casting, so we can easily apply those patches to our "old" testbed WMAgents to move on with further tests.

vkuznet commented 2 years ago

Alan, we're ready to make a switch. For that we will only need to perform change of redirect rules in cmsweb-testbed and point DBSWriter from python based code to Go implementation. I suggest that we discuss this in upcoming WMCore meeting.

amaltaro commented 2 years ago

@vkuznet Valentin, while updating the Oracle preprod password, I happen to find an Oracle unit test that started failing with these changes. We still do not test PRs against Oracle backend, one of the reasons is because it has to be executed in a single node to avoid database concurrent access (create/destroy) and it takes ~1.5h to run.

Anyhow, I managed to separate the daily builds we have on oracle (actually twice a day). Here is the last build without your changes to processing_version: https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-TestOracle/634/ and here is the first build with your changes: https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-TestOracle/635/

so, there is a total of 56 new failures in Oracle-based unit tests. Here is one example: https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-TestOracle/lastBuild/SLICE=0,label=oracle-env/testReport/WMComponent_t.RucioInjector_t.RucioInjectorPoller_t/RucioInjectorPollerTest/testLoadingFiles/

It's likely that those failures share a common root cause, so there is no need to investigate all of the 56 errors. Once you have a new PR meant to fix it, I can fetch your changes in the DMWM-WMCore-TestOracle jenkins job and get it tested.

vkuznet commented 2 years ago

Alan, turns out WMCore has plenty of processingVer defaults pointed to None. I scan WMCore code for them, based on jenkins test failures and I think I identified the final set, unless the default values of processingVer at a different lines. Please include this PR #11059 in jenkins tests and we will see how it is going.