dmwm / dbs2go

DBS server written in Go
MIT License
5 stars 4 forks source link

better sanitize body of PUT requests #103

Open belforte opened 8 months ago

belforte commented 8 months ago

with reference to https://github.com/dmwm/dbs2go/issues/102#issue-1960228988 and https://github.com/dmwm/dbs2go/issues/102#issuecomment-1779545331 I realized that DBS was not hanging, but busy invalidating all files in the DB, see https://cms-talk.web.cern.ch/t/dbs-phys03-files-wrongly-invalidated/31152/1 and references in there.

I re-state the problem here. At least in case of a PUT call to /DBSWriter/files, the body of the put is expected to be in the format "file1,file2,file3,...fileN" i.e. a single string containing a comna-separated list of file names. I was mislead by the terse description in the documentation "a list of file names" and had initially passed a JSON list ["file1","file2","file3",..."fileN"]`

Apparently that triggered DBS server to act on all files !

Proper behavior should be to return an error message (better if clear) and ignore the command.

I have no idea if other API's may be similarly affected by bad formattin of the body of the query.

@todor-ivanov @d-ylee

vkuznet commented 8 months ago

@belforte , it is long battle with DBS codebase to align with proper data-types. The list in JSON ishould be represented as [] and not as comma separated string. The DBS Python code has/had many non-standard data-types, e.g. treat string as integers, etc. In strict languages like Go/C/C++ it is not possible and we should properly align all data-types. Said that, the body should contain a list of files like [file1,file2] because this representation align with Go structs and does not require additional parsing of string object with commas to convert it to list. Said that, it may be that historically Python client sent a list as comma separated list, but it should be fixed to use an actual list.

I'll leave it up to @d-ylee and @todor-ivanov to straight this out and properly align data-types in DBS server and client calls.

belforte commented 8 months ago

I am all for clean and consistent (and documented) interfaces, and have struggled with old DBS python since the tiime it was returning ("%s"%somedictionary) and calling it JSON. But the main problem here is that ["file1","file2"] made the server invalidate every file !

vkuznet commented 8 months ago

@belforte , you should properly define the logic then, e.g. if payload provides list of LFNs (in [] form) should DBS server invalidate all files or not? I'm struggling to understand the issue you have. If client provide a list (either in form of comma separated list or actual list) my understanding that client intention is to invalidate files, is it the case? It would be very useful if you provide concrete example of payload and API call in this case, e.g.

curl http://.../files?is_valid_file=0 -X PUT -H "Content-type: applicaitn/json" -d '{["file1", "file2]}'

In this request I understand the intention of the client and we can work on implementation of what needs to be done. Please translate your request in similar form and use curl and bare HTTP language to provide logic of the request along with payload and what you expect from such request.

belforte commented 8 months ago

sorry I was not clear. The server was not "invaliating all files in the list which I gave", the server invalidated ALL FILES IN THE DB. Yesterday I had to revalidate files in O(500) user datasets which surely I never indicated in my attempts.

This happened around Oct 25 00:30 CET (server local time i.e.) in case someone is willing to look at logs.

On the query format, I defer to Dennis, Todor and possibly you to decide what is better from DBS2Go side. I will happily adapt as soon as I am given an example !

vkuznet commented 8 months ago

@belforte , thanks for clarification. I had a look at phys03 DB and here is what I see:

SQL> select count(file_id) from cms_dbs3_prod_phys03_owner.files;

COUNT(FILE_ID)
--------------
     270777765

SQL> select count(file_id) from cms_dbs3_prod_phys03_owner.files where is_file_valid=0;
COUNT(FILE_ID)
--------------
     269848651

SQL> select count(file_id) from cms_dbs3_prod_phys03_owner.files where is_file_valid=1;

COUNT(FILE_ID)
--------------
        929114

How would you like to deal with this information? May be we can ask CERN DBA to restore a DB backup for this table? If so I suggest you open yet another ticket if any actions are required on DB side. This ticket is only to fix the PUT API logic and settle down on JSON format.

belforte commented 8 months ago

Thanks @vkuznet this ticket is about returning an error message in case the body format is not good, instead of applying the change to all DB records. JSON format is a different story and can be addressed e.g. as part of #102

As to correcting the situation, DBA's usually avoid editing DB for us , and going back to status of 1 week ago would give a big problem about how to find and recover data which was inserted since. I have asked CMS DBS Oracle support people to isssue and SQL query like this (pseudo code) which will fix:

set is_file_valid=1 where last_modified by == "/DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Stefano Belforte [belforte@infn.it](mailto:belforte@infn.it)" and last_modification_date is between Oct24-Oct25 and is_file_valid==0

AFAIU, it should be easy. I can't do this myself via DBS server since there is no API to select files on last_modified_by. Otherwise it would be ten lines of python.

Well... I asked Yuyi, she said that Oracle side has been passed to Dennis, then Dennis said that he's transitioning all his responsibilities on DBS to Todor. :man_shrugging:

In the meanwhile, whenever a user discovers that their favorite dataset has all files invalidated, they can fix via crab setfilestatus --status VALID --dataset <datasetname> which is by now well tested.

belforte commented 8 months ago

if Dennins/Todor like to have a GH issue about that, I will be happy to open. So far only Yuyi replied.

todor-ivanov commented 8 months ago

Hi @belforte ,

Sorry for not replying immediately. I have not yet gotten access to the Database but will make all my best to do so today. Please do create another issue just for restoring the affected oracle Oracle records and will mark it as operational and will get to it with my highest priority.

belforte commented 8 months ago

ciao @todor-ivanov . No worry. I had asked for Oracle-side help to Yuyi who CC-ed Dennis (she did not know about your involvement) and Dennis is now looking at fixing records in DB. I must also say that we go no more requests from users since Monday, I assume that owners of all other datasets are not actively using them, they might later of course, so it is good to reset status in DBS, but not critical. My code in CRABClient is now safe (and only accepting one filename at a time) so this will not happen again, at least not in this context. I can't say about other PUT requests which accepts lists. That's why I opened this issue.

thanks