dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org

changing path of a zarr asset does not change original association of uploaded zarr #1106

Open satra opened 2 years ago

satra commented 2 years ago

this is a fairly critical bug since i renamed every single ngff file in the archive. especially since we currently have a one-to-one mapping between zarr id and asset, this should not have happened. alternatively, the zarr should store multiple associations.

here is the updated asset with the .ome.zarr extension

https://api.dandiarchive.org/api/dandisets/000108/versions/draft/assets/044dfda2-1bef-414a-8e8c-1e5118ffe6e7/

but doing this still returns the old name

curl -X GET "https://api.dandiarchive.org/api/zarr/15662576-2df1-4035-a37e-b9f74fd5cb5b/" -H "accept: application/json"

response

{
  "name": "sub-MITU01/ses-20210521h17m17s06/micr/sub-MITU01_ses-20210521h17m17s06_sample-178_stain-NN_run-1_chunk-3_SPIM.ngff",
  "dandiset": "000108",
  "zarr_id": "15662576-2df1-4035-a37e-b9f74fd5cb5b",
  "status": "Complete",
  "checksum": "a97bf63ac294436d65e6d86eba657994-37481--14599540180",
  "upload_in_progress": false,
  "file_count": 37481,
  "size": 14599540180
}

jwodder commented 2 years ago

@satra I don't believe the "name" field is required to reflect the asset path. It's just an arbitrary label assigned by the client when the Zarr is created, and currently the client uses the asset path as that label.
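
To see where the label and the asset path have drifted apart, a small check like the one below might help. It is only a sketch: the dict shapes (`path`, `zarr_id`, `name`) are simplified assumptions rather than the exact API payloads, and the new asset path is assumed to be the old name with `.ngff` swapped for `.ome.zarr`, since the thread does not quote it.

```python
from typing import Dict, List, Tuple

OLD_NAME = (
    "sub-MITU01/ses-20210521h17m17s06/micr/"
    "sub-MITU01_ses-20210521h17m17s06_sample-178_stain-NN_run-1_chunk-3_SPIM.ngff"
)
# Assumption: the renamed asset path only differs by its extension.
NEW_PATH = OLD_NAME[: -len(".ngff")] + ".ome.zarr"
ZARR_ID = "15662576-2df1-4035-a37e-b9f74fd5cb5b"


def find_stale_names(
    assets: List[dict], zarrs: Dict[str, dict]
) -> List[Tuple[str, str, str]]:
    """Return (zarr_id, zarr_name, asset_path) wherever the label and path differ."""
    stale = []
    for asset in assets:
        zarr = zarrs.get(asset["zarr_id"])
        if zarr is not None and zarr["name"] != asset["path"]:
            stale.append((asset["zarr_id"], zarr["name"], asset["path"]))
    return stale


# Hypothetical, simplified records standing in for the two API responses.
assets = [{"path": NEW_PATH, "zarr_id": ZARR_ID}]
zarrs = {ZARR_ID: {"name": OLD_NAME, "dandiset": "000108"}}

stale = find_stale_names(assets, zarrs)
for zarr_id, name, path in stale:
    print(f"{zarr_id}: zarr name {name!r} != asset path {path!r}")
```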

satra commented 2 years ago

the name is required when POSTing a zarr, so if i change the name as i did, and try to POST a zarr with the new name, it will create a new zarr id. whereas, if this name was adjusted to reflect the change of path, then it would simply raise an exception that the zarr id exists.

the name is thus important during the creation of a new zarr.

satra commented 2 years ago

i agree that the name field is arbitrary, but i can't change it when i change the path of the associated zarr, which leads to the creation of new zarr if i'm not careful.

waxlamp commented 2 years ago

> the name is required when POSTing a zarr, so if i change the name as i did, and try to POST a zarr with the new name, it will create a new zarr id. whereas, if this name was adjusted to reflect the change of path, then it would simply raise an exception that the zarr id exists.
>
> the name is thus important during the creation of a new zarr.

I am a little confused by this. How did you change the name of the zarr, and why was a subsequent POST needed? Indeed, I would expect any POST to create a new zarr--if instead you wanted to change the metadata (or other aspects) of an existing zarr (e.g., one whose name was changed), I would expect that to require a PUT operation instead. (But this is why I'm asking how you were able to change a zarr's name, and what that second POST was for.)

> i agree that the name field is arbitrary, but i can't change it when i change the path of the associated zarr, which leads to the creation of new zarr if i'm not careful.

Probably answering my other questions will answer this one, but how does changing the path of a zarr lead to the creation of a new zarr? And above you said you changed the name, but here you are saying you changed the path. Sorry for all the questions 😸 but I want to understand what went wrong here, and how, and what expectations of yours were violated by this situation.

satra commented 2 years ago

> (But this is why I'm asking how you were able to change a zarr's name, and what that second POST was for.)

i could not change the name (since PUT is unavailable), hence it created a new zarr.

> how does changing the path of a zarr lead to the creation of a new zarr?

locally a file changes path/name. there is now no longer an association between the new path and any zarr object in db, since zarr objects are keyed by dandiset + name. thus the CLI can't find an existing zarr and creates a new zarr object.
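
to make that concrete, here is a toy model of the lookup. it is a sketch only: `ZarrRegistry` and `find_or_create` are hypothetical names, the registry is an in-memory dict rather than the real database, and the paths are shortened placeholders. the only behaviour taken from this thread is that zarrs are keyed by dandiset + name.

```python
import uuid
from typing import Dict, Optional, Tuple


class ZarrRegistry:
    """Toy stand-in for the server-side zarr table, keyed by (dandiset, name)."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], str] = {}

    def find(self, dandiset: str, name: str) -> Optional[str]:
        return self._by_key.get((dandiset, name))

    def create(self, dandiset: str, name: str) -> str:
        key = (dandiset, name)
        if key in self._by_key:
            # Same dandiset + name: creation is refused instead of duplicating.
            raise ValueError(f"zarr already exists for {key}")
        zarr_id = str(uuid.uuid4())
        self._by_key[key] = zarr_id
        return zarr_id


def find_or_create(registry: ZarrRegistry, dandiset: str, local_path: str) -> str:
    """Hypothetical client logic: reuse a zarr if the path matches a name."""
    existing = registry.find(dandiset, local_path)
    return existing if existing is not None else registry.create(dandiset, local_path)


registry = ZarrRegistry()
first = find_or_create(registry, "000108", "sub-X/sample_SPIM.ngff")
again = find_or_create(registry, "000108", "sub-X/sample_SPIM.ngff")
# After a local rename the lookup misses, so a second zarr is created.
renamed = find_or_create(registry, "000108", "sub-X/sample_SPIM.ome.zarr")

print(first == again)    # same path, same zarr
print(renamed == first)  # renamed path, different zarr
```

the original zarr is still registered under the old name, which is the stale association described at the top of this issue.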

this thread is about the life-cycle of zarrs, so let's dig into it:

the solution to this issue would have been to simply add a PUT request so that doing a dandi move can also update the name. the severity at this point is lower, though, because all the zarr blob names were modified in place to use the .ome.zarr extension.
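
roughly what such an update would need to do, as a sketch: `rename_zarr` is a hypothetical helper operating on an in-memory index keyed by (dandiset, name), not an existing endpoint or CLI function, and the ids and paths are placeholders.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (dandiset, name)


def rename_zarr(index: Dict[Key, str], dandiset: str, old_name: str, new_name: str) -> str:
    """Re-key an existing zarr under a new name, keeping its zarr id."""
    old_key, new_key = (dandiset, old_name), (dandiset, new_name)
    if old_key not in index:
        raise KeyError(f"no zarr named {old_name!r} in dandiset {dandiset}")
    if old_key == new_key:
        return index[old_key]  # nothing to do
    if new_key in index:
        raise ValueError(f"a zarr named {new_name!r} already exists in dandiset {dandiset}")
    # Move the id to the new key so later lookups by the new path succeed.
    index[new_key] = index.pop(old_key)
    return index[new_key]


index: Dict[Key, str] = {("000108", "sub-X/sample_SPIM.ngff"): "zarr-1"}
moved_id = rename_zarr(index, "000108", "sub-X/sample_SPIM.ngff", "sub-X/sample_SPIM.ome.zarr")
print(moved_id, sorted(index))
```

with the name updated alongside the path, a later lookup by the new path finds the original zarr instead of creating a second one.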