dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org
13 stars 12 forks source link

asset path metadata update not removing old asset from dandiset #1104

Closed satra closed 2 years ago

satra commented 2 years ago

from the discussion in https://github.com/dandisets/000026/issues/3, it seems that old paths remain when the path metadata is modified in the following way. it may be worthwhile adding a test in this library if it doesn't exist, which verifies that a path update removes an old asset or returns an error. perhaps a path change specifically could be tested in the code.

as far as i know this would work if the end point requested for an asset is not the global endpoint, but the dandiset specific one.

from dandi.dandiapi import DandiAPIClient
from dandischema.models import BioSample, SampleType, Asset
from dandischema import validate

from pathlib import Path

api = DandiAPIClient("https://api.dandiarchive.org/api", token=<TOKEN>)
ds = api.get_dandiset("000026")

for asset in ds.get_assets():
    if not asset.path.startswith('rawdata/sub-I46/ses-SPIM'):
        continue
    oldpath = Path(asset.path)
    newpath = Path('rawdata/sub-I46/ses-SPIM/micr') / oldpath.name
    if newpath.suffix == '.tiff':
        newpath = newpath.with_suffix('.tif')
    meta = asset.get_metadata()
    meta.path = str(newpath)
    validate(meta.json_dict(), json_validation=True)
    asset.set_metadata(meta)
jwodder commented 2 years ago

@satra This seems to be more of a problem with the API than with the client.

yarikoptic commented 2 years ago

hm... we might want to get to design doc(s) on how we envisioned COPY-vs-RENAME difference to be expressed. We did envision "cheap copy" though (well, at least into other dandisets) and that was the whole motivation for COW operation. If we did not envision explicit "MOVE" (AKA "RENAME") semantic, it might as well be for a client to DELETE the original asset

satra commented 2 years ago

@gmazzamuto: both of the below works for me on staging.

the api server (at least staging), behaves as expected. after doing this the old asset is no longer on the api-server.

headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"token {token}"
        }
apibase = "https://api-staging.dandiarchive.org/"
dandiset = "000006"
apibase = "https://api-staging.dandiarchive.org"
url = f"{apibase}/api/dandisets/{dandiset}/versions/draft/assets/"
asseturl = url + "d12438bb-373c-4562-872d-005278314e3d/"

import requests
meta = requests.get(asseturl).json()
meta["path"] = meta["path"].replace("ses-20170309", "ses-20170309_ephys")
req = requests.put(asseturl, headers=headers, json={"metadata": meta, "blob_id": "ca599d42-6da0-4665-94de-eab7b12f112c"})

and this worked as well (using dandi library version: 0.39.4):

from dandi.dandiapi import DandiAPIClient
from dandischema.models import BioSample, SampleType, Asset
from dandischema import validate

from pathlib import Path

api = DandiAPIClient("https://api-staging.dandiarchive.org/api", token=token)
ds = api.get_dandiset("000006")
asset = ds.get_asset_by_path("sub-anm369962/sub-anm369962_ses-20170310.nwb")
meta = asset.get_metadata()
meta.path = meta.path.replace("ses-20170310", "ses-20170310_ephys")
asset.set_metadata(meta)
yarikoptic commented 2 years ago

FWIW, I have tried with

the following tuned up example on the main instance on 000029 ```shell # snippet from https://github.com/dandi/dandi-archive/issues/1104 # but adjusted for using main archive from dandi.dandiapi import DandiAPIClient from dandischema.models import BioSample, SampleType, Asset from dandischema import validate from pathlib import Path api = DandiAPIClient() api.dandi_authenticate() ds = api.get_dandiset("000029", "draft") asset = ds.get_asset_by_path("sub-RAT123/sub-RAT123.nwb") meta = asset.get_metadata() meta.path = meta.path.replace(".nwb", "_.nwb") asset.set_metadata(meta) ```

and https://dandiarchive.org/dandiset/000029/draft/files?location=sub-RAT123%2F shows only the _.nwb one - so on the main instance it also works as desired (in that it did move). I haven't done further expedition to discover why behavior differed in https://github.com/dandisets/000026/issues/3 but we did update deployed version since then.

satra commented 2 years ago

let's close for now, and keep an eye on it.

yarikoptic commented 2 years ago

FWIW, this is the dandi-archive code which does all that presumably and logic looks ok to me -- make a new asset, delete old : https://github.com/dandi/dandi-archive/blob/HEAD/dandiapi/api/views/asset.py#L461

yarikoptic commented 2 years ago

FWIW, may be somehow an asset generated in https://github.com/dandi/dandi-archive/blob/HEAD/dandiapi/api/views/asset.py#L439 (new_asset = self.asset_from_request()) was getting saved without an explicit new_asset.save? ... anyways -- I will stop digging on this

satra commented 2 years ago

also i just renamed about 3000 files in 108 and that worked too i think.