dmwm / dbs2go

DBS server written in Go
MIT License
5 stars 4 forks source link

Introducing possibility to update `physics_group_name` for a dataset. #87

Open ArturAkh opened 1 year ago

ArturAkh commented 1 year ago

Dear DBS developers,

As discussed in CMSTalk:

https://cms-talk.web.cern.ch/t/moving-cms-embedding-samples-run-2-ul-to-t1s-global-storage-area/10203

It would be good to have the possibility to update physics_group_name for a dataset, at least in prod/phys03

Thank you very much in advance for considering this issue.

Best regards,

Artur Gottmann

vkuznet commented 1 year ago

@d-ylee I hope you can take care of this issue. Here we have two in fact:

The former is trivial to add to dataset update API. The latter will require some thoughts. For that I suggest to introduce phys03 dbs group in CRIC (the service which keep users group/roles and where we already have dbs group). But we need a separate group/role to avoid people writing access to production DBS. The production DBS write access is only granted to dbs group and we should keep it this way, and for phys03 instance we need to have a new group. Said that, the next step would be to add authorization middleware for phys03 dbs group. Once it is done , the update APIs should be allowed for this specific group. Finally, we need to identify people who may be included in this group and update CRIC.

d-ylee commented 1 year ago

@vkuznet Not sure if I am understanding this correctly, but if we update the physics_group_name, would this affect other datasets that also use the same physics group? Or, for the logic to update a dataset's physics_group_name, would it be the following:

  1. Request to update physics_group_name for a dataset via /dataset/ PUT API
  2. Check in PHYSICS_GROUPS table to see if the new physics_group_name is not being used.
  3. If the physics_group_name already exists, get the PHYSICS_GROUP_ID the PHYSICS_GROUPS table and use the value to set the PHYSICS_GROUP_ID in the DATASETS table for that dataset.
  4. If the physics_group_name does not exist, create a new PHYSICS_GROUPS, get the new PHYSICS_GROUP_ID and set this new id to PHYSICS_GROUP_ID for the dataset.
  5. Commit the changes
vkuznet commented 1 year ago

Dennis, all steps are fine except #4. I think we should not allow users to create groups in DBS, I think it should be done by DBS admins similar to data-tiers. The reason is simple, user may provide a data which may contain typo, and it leads to a possibility to create undesired groups which later should be fixed. I would suggest that you implement steps 1,2,3,5 only. If users need a new group we may create it manually in DBS (after confirming with appropriate L2 of the group), and later users may inject data into this group. In other words, we'll allow to update the group but not to create a new one.

d-ylee commented 1 year ago

Valentin, in this case, if a user provides a physics_group_name as a parameter for the /dataset PUT API (UpdateDatasets), then instead of step 4, the PHYSICS_GROUP_NAME should be updated for the currently assigned PHYSICS_GROUP_ID in the dataset? As a result, all datasets that use the same PHYSICS_GROUP_ID will have a renamed PHYSICS_GROUP_NAME.

vkuznet commented 1 year ago

well, I think it is obvious. If user provides physics_group_name we'll get its id and update its id in dataset table. This will basically update the physics_group_name associated with this dataset. But we should not touch all datasets! The update will only happen for that specific dataset.

d-ylee commented 1 year ago

While working on this, I noticed that the DBSPutHandler gives HTTP 500 when an error happens in an update: https://github.com/dmwm/dbs2go/blob/928dc255e5a695546767c363c5d0fb9b37cd6fd8/web/handlers.go#L444-L448

...while for the DBSPostHandler and DBSGetHandler both return HTTP 400: https://github.com/dmwm/dbs2go/blob/928dc255e5a695546767c363c5d0fb9b37cd6fd8/web/handlers.go#L569-L573

Should the DBSPutHandler return HTTP 500? Or should DBSPutHandler also give HTTP 400 on an error?

vkuznet commented 1 year ago

Dennis, good observation. The 500 refers to issue with server, while 400 to issue with request. Since this case we got error from the API it is 400 error rather 500. Please make proper adjustment in a code.

d-ylee commented 1 year ago

@vkuznet I pushed the initial feature to update a dataset via using physics_group_name parameter. Is this what we are looking for? https://github.com/dmwm/dbs2go/commit/156356859156d8b62e8b382dcbe4351aa325b948

d-ylee commented 1 year ago

@vkuznet For the authorization of DBS updates for only phys03, would this only be fore the datasets update, or for all of the updates. In this case, would a custom authentication be required for just the datasets DBSPutHandler case?

vkuznet commented 1 year ago

Dennis, you're mixing different concepts here. The authentication only authenticates users to access a particular resource, while authorization decides what user can do with a resources. Your question is about authorization rather than authentication. As far as I recall the original DBS python code relies on authorization per API basis, and therefore we should use it too, and only authorize update of datasets API with physics_group. You may put in place dedicated authorization for specific API. The authorization should check if user is part of specific group (provided by CMSWEB HTTP headers). For more see cmsauth module and/or CMSWEB HTTP headers, e.g. use httpgo service to display them. The one you are looking for is Cms-Authz-Admin, Cms-Authz-Operator or Cms-Authz-User. The user who will be allowed by authorization should be in some specific group we decide, e.g.group:dbs-phys03-admin. These groups are defined in CMS CRIC service and once we'll create/establish such group we may assign different users to it, and it will allow only users from a specific group to perform certain operation which will be checked by dbs middleware.