Closed by willdunklin 11 months ago
This could be millions of files, so we really want to perform the move inside the iterator so that we never hold such a large collection in memory.
It would also be good to expose this as a cancellable local job (see how the imports are exposed as a cancellable job for an example).
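The memory concern above can be sketched as follows. This is a minimal illustration, not Girder code: `cursor` stands in for a lazy Mongo cursor (which yields documents in batches rather than materializing the whole result set), and `move_file` is a hypothetical callable that moves one file.

```python
def move_files_lazily(cursor, move_file):
    """Move files one at a time from a lazy cursor.

    Nothing is accumulated into a list, so memory use stays flat
    even when the cursor yields millions of file documents.
    """
    moved = 0
    for file_doc in cursor:
        move_file(file_doc)
        moved += 1
    return moved

# Example with a stand-in cursor (a generator) and a no-op mover:
moved = move_files_lazily(({'_id': i} for i in range(5)), lambda f: None)
print(moved)  # → 5
```

A cancellable job would additionally check a cancellation flag inside the loop and stop early; the lazy-iteration shape makes that straightforward.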
Some Girder plugins (large_image specifically, but others use this feature too) add internal files for caching data; these are associated with a parent resource but are not browsable from the UI. Such files have `attachedToId` and `attachedToType` fields. We should also move these files between assetstores as part of this. That probably means running a query like `File().find({'attachedToType': <parent model name>, 'attachedToId': <parent>['_id'], 'assetstoreId': {'$ne': <destination assetstore id>}})` (note `$ne` rather than `$not`, since `$not` cannot wrap a plain value in MongoDB) and calling move on those results, too.
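The filter for that query can be built as a plain dict. This is a sketch only; `attached_files_query` is a hypothetical helper, and `$ne` is the standard Mongo operator for excluding files whose `assetstoreId` already equals the destination.

```python
def attached_files_query(parent_type, parent_id, destination_assetstore_id):
    """Build the Mongo filter for attached files that still live
    outside the destination assetstore."""
    return {
        'attachedToType': parent_type,
        'attachedToId': parent_id,
        'assetstoreId': {'$ne': destination_assetstore_id},
    }

query = attached_files_query('item', 'abc123', 'dest456')
print(query['assetstoreId'])  # → {'$ne': 'dest456'}
```

The resulting dict would be passed to `File().find(...)` and the matches moved one at a time.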
For testing this: if you have the large_image plugin enabled and files that it renders as images, that will create such attached files (the thumbnails stored for quicker responses), as will calling the histogram endpoints, etc.
If a file is imported, we probably don't want to move it by default. This would mean (possibly in a future PR) adding a "move imported files" flag that defaults to false. If the file record has an `imported: true` entry, it would be skipped.
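The proposed skip rule could look like the predicate below. This is a sketch: `should_move` is a hypothetical helper, and the `move_imported` flag is the proposed (not yet existing) option, defaulting to false.

```python
def should_move(file_doc, move_imported=False):
    """Decide whether a file document should be moved.

    Imported file records carry an `imported: True` entry and are
    skipped unless the proposed "move imported files" flag is set.
    """
    if file_doc.get('imported') and not move_imported:
        return False
    return True

print(should_move({'name': 'a.tif', 'imported': True}))        # → False
print(should_move({'name': 'a.tif', 'imported': True}, True))  # → True
print(should_move({'name': 'b.tif'}))                          # → True
```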
I'm now also wondering if we should do something regarding the `created`/`creatorId` fields -- these get updated, but perhaps they should be maintained from the original file and we should just update the `updated` date?
I made an issue in Girder for moving attached files: https://github.com/girder/girder/issues/3467.
That looks good; I've been able to essentially recreate a version of `moveFileToAssetstore` for attached files. Using the lower-level APIs there also exposes control over the file metadata, so maintaining the `created` field is handled automatically (by the `upload.update(...)` line).
```python
import io

from girder.models.file import File
from girder.models.item import Item
from girder.models.upload import Upload


def move_meta_file(file, assetstore):
    # Find the resource the attached file belongs to
    # (this assumes the parent is an item).
    parent = Item().findOne({'_id': file['attachedToId']})

    # Download the file contents into memory. File().download returns
    # a function that yields the data in chunks.
    chunk = None
    try:
        for data in File().download(file, headers=False)():
            if chunk is not None:
                chunk += data
            else:
                chunk = data
    except Exception as e:
        return {'error': f'Exception downloading file: {e}'}

    # Re-upload the contents into the destination assetstore,
    # attached to the same parent resource.
    upload = Upload().uploadFromFile(
        obj=io.BytesIO(chunk), size=file['size'], name=file['name'],
        parentType=file['attachedToType'], parent=parent,
        mimeType=file['mimeType'], attachParent=True, assetstore=assetstore)

    # Copy the original file's metadata (except the assetstore id)
    # onto the new record, so fields like `created` are preserved.
    upload.update({k: v for k, v in file.items() if k != 'assetstoreId'})
    upload = File().save(upload)
    return upload
```
This is modeled after the large_image plugin's code for manipulating attached files: https://github.com/girder/large_image/blob/master/girder/girder_large_image/rest/tiles.py#L1538-L1547
> I'm now also wondering if we should do something regarding the `created`/`creatorId` -- these get updated, but perhaps they should be maintained from the original file and we should just update the `updated` date?
I made this switch in the companion Girder PR https://github.com/girder/girder/pull/3470. Moved files in general now only modify their `updated` field.
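The adopted behavior can be illustrated with plain dicts. This is a sketch of the policy only (not the actual PR's code): `preserved_metadata` is a hypothetical helper showing that a moved file keeps its original `created` and `creatorId`, while only `updated` is bumped.

```python
from datetime import datetime, timezone


def preserved_metadata(original, moved):
    """Return the moved file's document with the original `created`
    and `creatorId` restored and only `updated` refreshed."""
    result = dict(moved)
    result['created'] = original['created']
    result['creatorId'] = original['creatorId']
    result['updated'] = datetime.now(timezone.utc)
    return result

orig = {'created': 'T0', 'creatorId': 'u1', 'updated': 'T0'}
new = preserved_metadata(
    orig, {'created': 'T1', 'creatorId': 'u2', 'updated': 'T1'})
print(new['created'], new['creatorId'])  # → T0 u1
```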
This endpoint moves the contents of a folder to a desired assetstore.
Addresses #7