Closed by yarikoptic 2 years ago
@yarikoptic The script is already supposed to align assets.json and the set of local files with what's on the server. I don't know why that's not happening (the fact that assets.json wasn't updated implies the problem wasn't with datalad save breaking up the command), and unless you're proposing a different method of operation, a "force consistency" option wouldn't make a difference.
@yarikoptic Let me describe how the asset backup works (ignoring the behavior around handling of published versions):
1. .dandi/assets.json (containing all the asset metadata from a previous backup) is read into a variable named asset_metadata, and the list of assets in the local dataset is stored in a variable named local_assets.
2. The script retrieves each asset from the API. If the asset's metadata does not match what is recorded in asset_metadata, the asset in the dataset is updated, including replacing the metadata in asset_metadata. Either way, the asset path is removed from local_assets.
3. Every asset path remaining in local_assets (i.e., those that exist in the local dataset but not on the server) is deleted from the dataset and removed from asset_metadata. Then, everything still in asset_metadata is dumped to .dandi/assets.json.

Is there any part of this procedure that you want to change?
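The procedure described above can be sketched in Python roughly as follows. This is only an illustration, not the actual backup script: the function name sync_assets, the (path, metadata) pair representation of server assets, and the "metadata differs" test used to decide on an update are all assumptions, and the file operations on the dataset are reduced to comments.

```python
def sync_assets(asset_metadata, local_assets, remote_assets):
    """Hypothetical sketch of the backup steps; mutates its first two arguments.

    asset_metadata: dict mapping asset path -> metadata (from .dandi/assets.json)
    local_assets:   set of asset paths present in the local dataset
    remote_assets:  iterable of (path, metadata) pairs reported by the server
    Returns (updated, deleted) lists of asset paths.
    """
    updated, deleted = [], []

    # Step 2: walk the assets that the server reports.
    for path, metadata in remote_assets:
        if asset_metadata.get(path) != metadata:  # assumed update condition
            # The real script would also update the file in the dataset here.
            asset_metadata[path] = metadata
            updated.append(path)
        # Either way, the path is accounted for.
        local_assets.discard(path)

    # Step 3: whatever is left exists locally but not on the server.
    for path in sorted(local_assets):
        # The real script would also delete the file from the dataset here.
        asset_metadata.pop(path, None)
        deleted.append(path)
    local_assets.clear()

    # asset_metadata is now what would be dumped to .dandi/assets.json.
    return updated, deleted


# Made-up example data: one unchanged asset, one local-only, one server-only.
asset_metadata = {"a.nwb": {"size": 1}, "rawdata/b.nwb": {"size": 2}}
local_assets = {"a.nwb", "rawdata/b.nwb"}
remote = [("a.nwb", {"size": 1}), ("c.nwb", {"size": 3})]
updated, deleted = sync_assets(asset_metadata, local_assets, remote)
```

Note that step 3 only looks at paths found in the local dataset, so an entry that is in asset_metadata but in neither the local dataset nor the server listing is never visited.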
I don't know why rawdata/ wasn't removed automatically when it should have been, but since you (I assume) manually deleted the folder, the assets currently won't get deleted from .dandi/assets.json without manual editing.
To the 2nd step (retrieves each asset from the API), add: collect the paths of remote assets into "remote_paths".
Then, in the last step, before "Then, everything still in asset_metadata is dumped to .dandi/assets.json.", add: remove from asset_metadata any asset whose path is not present in remote_paths.
The commit message for such a change (if any asset is removed from asset_metadata) should include the number of such assets which were "garbage collected from local listing", or something like that.
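A minimal Python sketch of this proposal, under stated assumptions: the function name sync_assets_with_gc, the (path, metadata) pairs, and the exact message wording are hypothetical, and the real script's file operations are omitted. The variable remote_paths is the name used in the proposal above.

```python
def sync_assets_with_gc(asset_metadata, local_assets, remote_assets):
    """Hypothetical sketch: the backup loop plus the proposed garbage collection.

    Mutates asset_metadata and local_assets.
    Returns (stale, message), where message is None if nothing was collected.
    """
    remote_paths = set()

    # 2nd step, extended: remember every path the server reports.
    for path, metadata in remote_assets:
        remote_paths.add(path)
        if asset_metadata.get(path) != metadata:  # assumed update condition
            asset_metadata[path] = metadata
        local_assets.discard(path)

    # Last step, first half: drop assets that exist only locally.
    for path in local_assets:
        asset_metadata.pop(path, None)
    local_assets.clear()

    # Proposed addition: drop listing entries the server does not know about,
    # even if their files are already gone from the local dataset.
    stale = sorted(p for p in asset_metadata if p not in remote_paths)
    for path in stale:
        del asset_metadata[path]

    message = None
    if stale:
        message = f"{len(stale)} asset(s) garbage collected from local listing"
    return stale, message


# Made-up scenario: rawdata/ was deleted by hand, so it is absent from both
# the local dataset and the server, yet still present in the listing.
asset_metadata = {"a.nwb": {"size": 1}, "rawdata/b.nwb": {"size": 2}}
local_assets = {"a.nwb"}
remote = [("a.nwb", {"size": 1})]
stale, message = sync_assets_with_gc(asset_metadata, local_assets, remote)
```

In this scenario the leftover rawdata/b.nwb entry is removed from asset_metadata, and the count can be included in the commit message.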
Follow up to https://github.com/dandi/dandisets/issues/230#issuecomment-1194885177 . We should be able to ensure a complete check/fix-up (add/remove files and ensure an up-to-date .dandi/assets.json) without shortcuts: it should get the list of assets, ensure that they (and only they) are available locally, and save the current .dandi/assets.json and dandi.yaml. ATM, with the update of 000026, where we manually reflected the removal of some assets via git rm && git commit --amend since the prior run failed to rm, we still have them listed in assets.json if we run.
I am not quite sure why the run didn't detect that it needs to save an updated assets.json (maybe some optimization, or maybe some bug to fix), but I think we would benefit from an option to ensure that any dandiset we process is consistent, i.e. matching the server state.
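To make "consistent" concrete, here is a small Python sketch of a check that compares the three views of a dandiset: the paths on the server, the files in the local dataset, and the paths listed in .dandi/assets.json. The function names and the report keys are hypothetical and not part of any existing tool; how the three path lists are obtained is left out.

```python
def consistency_report(remote_paths, local_paths, listed_paths):
    """Hypothetical check: compare server, local files, and assets.json paths."""
    remote, local, listed = set(remote_paths), set(local_paths), set(listed_paths)
    return {
        "missing_locally": sorted(remote - local),       # on server, no local file
        "extra_locally": sorted(local - remote),         # local file, not on server
        "missing_from_listing": sorted(remote - listed), # on server, not in assets.json
        "stale_in_listing": sorted(listed - remote),     # in assets.json, not on server
    }


def is_consistent(report):
    """True only if every discrepancy list in the report is empty."""
    return not any(report.values())


# Made-up scenario resembling the one described: files were removed by hand,
# but the listing still mentions them.
report = consistency_report(
    remote_paths=["a.nwb"],
    local_paths=["a.nwb"],
    listed_paths=["a.nwb", "rawdata/b.nwb"],
)
```

A "force consistency" option could act on each non-empty entry of such a report instead of relying on the incremental bookkeeping alone.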