dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

API: changing QoS says "success" but doesn't work (master snapshot 9.0) #6914

Closed onnozweers closed 1 year ago

onnozweers commented 1 year ago

Dear dCache devs,

Changing the QoS of a file through the API seems broken in the master 9.0 snapshot. The API reports back "success" but the QoS of the file is not changed. In our production instance (version 7.2) it works as expected.

Here's what I did:

Getting a macaroon with these properties:

location Optional.empty
identifier dIGgJazo
cid iid:Ru2zIwqu
cid id:31029;31040,40304,30013;onno
cid before:2022-12-07T22:03:45.516158Z
cid home:/users/onno
cid root:/users/onno
cid activity:LIST,DOWNLOAD,UPLOAD,MANAGE,DELETE,READ_METADATA,UPDATE_METADATA
cid ip:145.38.0.0/16,145.100.5.0/27,145.100.5.210/26,145.100.32.0/22,145.100.48.0/23,145.100.50.0/23,145.100.200.0/21,145.100.9.64/29,145.101.32.0/21,145.100.56.0/22,2001:610:108::/48

Listing a file:

11:07 ui.grid.surfsara.nl:/home/onno 
onno$ curl --config /home/onno/.ada/headers/authorization_header_3yL6CRUz2IfM -H 'accept: application/json' --fail --silent --show-error --ipv4 -X GET 'https://dcachetest.grid.surfsara.nl:20443/api/v1/namespace/%2Ftape%2Ftest?locality=true&qos=true'
{
  "fileMimeType" : "application/octet-stream",
  "fileLocality" : "ONLINE_AND_NEARLINE",
  "currentQos" : "tape",
  "labels" : [ ],
  "size" : 10485760,
  "creationTime" : 1669736063762,
  "fileType" : "REGULAR",
  "pnfsId" : "0000EC9AEB2B04304A8D8E34F530B55198F5",
  "nlink" : 1,
  "mtime" : 1669736064395,
  "mode" : 384
}
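For readers who want to script this request, here is a minimal Python sketch of how the URL above is put together and how the numeric fields in the reply can be read. The helper names `namespace_url` and `describe` are made up for illustration, and treating `creationTime` as milliseconds since the Unix epoch is an assumption based on the values shown, not something the thread states.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

# Base URL copied from the curl command above.
BASE = "https://dcachetest.grid.surfsara.nl:20443/api/v1"


def namespace_url(path, **query):
    """Build a namespace URL (hypothetical helper).

    The file path is sent as a single percent-encoded segment, so the
    slashes inside it become %2F.
    """
    url = f"{BASE}/namespace/{quote(path, safe='')}"
    if query:
        url += "?" + urlencode(query)
    return url


def describe(entry):
    """Decode the numeric fields of a namespace JSON entry (hypothetical helper)."""
    # Assumption: timestamps are milliseconds since the Unix epoch.
    created = datetime.fromtimestamp(entry["creationTime"] / 1000, tz=timezone.utc)
    return {
        "mode": format(entry["mode"], "04o"),  # decimal mode bits shown as octal
        "created": created.isoformat(timespec="seconds"),
        "size_mib": entry["size"] / 2**20,
    }


print(namespace_url("/tape/test", locality="true", qos="true"))
print(describe({"mode": 384, "creationTime": 1669736063762, "size": 10485760}))
```

Read this way, `"mode" : 384` is the octal permission set 0600 and the file is exactly 10 MiB.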

Changing the QoS returns "success":

11:08 ui.grid.surfsara.nl:/home/onno 
onno$ curl --config /home/onno/.ada/headers/authorization_header_3yL6CRUz2IfM -H 'accept: application/json' --fail --silent --show-error --ipv4 -H 'content-type: application/json' -X POST https://dcachetest.grid.surfsara.nl:20443/api/v1/namespace/%2Ftape%2Ftest -d '{"action":"qos","target":"disk+tape"}'
{"status":"success"}
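The same POST can be sketched in Python with only the standard library. The function name `qos_request` is hypothetical, and sending the macaroon as a bearer token is an assumption: the curl command reads its authorization header from a config file whose contents are not shown here. The sketch only builds the request, it does not send it.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def qos_request(base_url, path, target, token):
    """Build (but do not send) the POST that asks for a QoS transition."""
    body = json.dumps({"action": "qos", "target": target}).encode("utf-8")
    return Request(
        f"{base_url}/namespace/{quote(path, safe='')}",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            # Assumption: the macaroon is presented as a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


req = qos_request(
    "https://dcachetest.grid.surfsara.nl:20443/api/v1",
    "/tape/test",
    "disk+tape",
    "MACAROON",  # placeholder, not a real token
)
print(req.get_method(), req.full_url)
print(req.data.decode())
# Sending it would be: urllib.request.urlopen(req)
```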

But the QoS of the file has not changed:

11:09 ui.grid.surfsara.nl:/home/onno 
onno$ curl --config /home/onno/.ada/headers/authorization_header_3yL6CRUz2IfM -H 'accept: application/json' --fail --silent --show-error --ipv4 -X GET 'https://dcachetest.grid.surfsara.nl:20443/api/v1/namespace/%2Ftape%2Ftest?locality=true&qos=true'
{
  "fileMimeType" : "application/octet-stream",
  "fileLocality" : "ONLINE_AND_NEARLINE",
  "currentQos" : "tape",
  "labels" : [ ],
  "size" : 10485760,
  "creationTime" : 1669736063762,
  "fileType" : "REGULAR",
  "pnfsId" : "0000EC9AEB2B04304A8D8E34F530B55198F5",
  "nlink" : 1,
  "mtime" : 1669736064395,
  "mode" : 384
}

This was on our new test server. I tested this also on our old test server dolphin12, running a 9.0 master snapshot too. Same behaviour.

I would expect the API call to succeed in changing the QoS, or else report a failure.

Cheers, Onno

mksahakyan commented 1 year ago

-

alrossi commented 1 year ago

Hi Onno,

I just want to make sure we are all on the same page here. From 7.2 to 8.2 (9.0) there was a major change in how QoS is handled. Before, the QoS transitions were run inside the RESTful frontend. With 8.2, the QoS services must be running, because all QoS operations now take place through them.

Knowing you, I imagine you are aware of this. But just in case: https://www.dcache.org/manuals/Book-8.2/config-qos-engine.shtml Are you indeed running the new services? If not, then nothing will happen, though I admit "success" should not be returned in that case.

I will check this out myself to see what the behavior is.

Also, in your case the transition could be quick because the file is already on disk (locality = ONLINE_AND_NEARLINE) and you are merely making one of its replicas permanent by doing what you did. But if you were staging a file (i.e., tape -> disk+tape with no cached copies currently on disk, so locality = NEARLINE), the only way to report the success of the transition immediately would be to make this call synchronous. With the new service, this has to be asynchronous (since it is scheduled, as Marina noted), so "success" must mean "successfully submitted". currentQos, on the other hand, is ambiguous and I have to look into this: if it is changed immediately, it can only mean "this is what we now want", not "this is what the status of the file actually is."
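Since "success" here means "submitted", a client has to check the file itself. The following is a minimal Python sketch of such a check, not dCache code: `wait_for_qos` and `fetch_entry` are hypothetical names, and `fetch_entry` stands for whatever function the caller uses to GET the namespace entry with `locality=true&qos=true` and decode the JSON. Because currentQos may only express the desired state, the sketch can also wait for a specific fileLocality.

```python
import time


def wait_for_qos(fetch_entry, target, want_locality=None, timeout=60.0,
                 interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until currentQos (and optionally fileLocality) match.

    fetch_entry: caller-supplied function returning the decoded JSON entry.
    Returns the matching entry, or raises TimeoutError if the timeout
    expires before a match is seen.
    """
    deadline = clock() + timeout
    while True:
        entry = fetch_entry()
        qos_ok = entry.get("currentQos") == target
        loc_ok = want_locality is None or entry.get("fileLocality") == want_locality
        if qos_ok and loc_ok:
            return entry
        if clock() >= deadline:
            raise TimeoutError(
                f"QoS still {entry.get('currentQos')!r}, wanted {target!r}"
            )
        sleep(interval)


# Demo with canned replies instead of a real server.
replies = iter([{"currentQos": "tape"}, {"currentQos": "disk+tape"}])
print(wait_for_qos(lambda: next(replies), "disk+tape", interval=0))
```

With a server that accepts the request but never acts on it, the same call ends in a `TimeoutError` instead of a silent "success".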

Finally, I wanted to ask what your use case is for using the namespace resource to do this. If you are changing many files, I would suggest you look into the Bulk service, which also supports QoS transitions, as well as PIN, UNPIN and WLCG STAGE, RELEASE. Once again, this is an asynchronous model and your client would have to verify the status of the file, but it would probably make more sense to go through bulk submission rather than many separate REST calls to namespace.

Cheers, Al

alrossi commented 1 year ago

I can confirm that if the QoS services are running, the change of currentQos is indeed immediate. Hence its interpretation is "this is the QoS status the file should have (from now on...)".

I can also confirm that without the services, you do indeed get a "success" but currentQos remains unchanged.

So I will see about returning failure in that case.

Thanks for pointing this out.

Cheers, Al

onnozweers commented 1 year ago

Hi Al,

Thanks for pointing me to the QoS changes. I had somehow missed them and I will check the documentation and configure the required services. I had read https://www.dcache.org/old/downloads/1.9/release-notes-8.2.shtml but had overlooked the QoS changes mentioned there.

Indeed it would be nice to have some kind of error like "internal server error" in this case, and perhaps a log entry explaining what's wrong.

Cheers, Onno

alrossi commented 1 year ago

This should fix it.

https://rb.dcache.org/r/13810/