DigitalSlideArchive / import-tracker

A Girder plugin for data import tracking in HistomicsUI
Other
2 stars 2 forks source link

Add /folder/{id}/move API endpoint #19

Closed willdunklin closed 11 months ago

willdunklin commented 1 year ago

This endpoint moves the contents of a folder to a desired assetstore.

Addresses #7

manthey commented 1 year ago

This could be millions of files. We really want to do the move within the iterator so that we never have such a big collection in memory.

It would be good to expose this as a cancellable local job (see exposing the imports as a cancellable job for an example).

manthey commented 1 year ago

Some girder plugins (large_image specifically, but others also use this feature) add internal files for caching data that are associated with a parent resource but not browsable from the UI. These files have fields attachedToId and attachedToType. We should also move these files between assetstores as part of this. This probably means doing the query File().find({'attachedToType': <parent model name>, 'attachedToId': <parent>['_id'], 'assetstoreId': {'$not': <destination assetstore id>}}) and calling move on those, too.

For testing this, if you have the large_image plugin enabled and files that are rendered as image by it, that will create such attached files (the thumbnails stored for quicker response), as will calling histogram endpoints, etc.

manthey commented 12 months ago

If a file is imported, we probably don't want to move it by default. This would mean (possibly as a future PR), adding a flag to "move imported files" and have it default to false. If the file record has an imported: true entry, it would be skipped.

manthey commented 12 months ago

I'm now also wondering if we should do something regarding the created/creatorId -- these get updated, but perhaps they should be maintained from the original file and we should just update the updated date?

manthey commented 12 months ago

I made an issue in Girder for moving attached files: https://github.com/girder/girder/issues/3467.

willdunklin commented 12 months ago

That looks good, I've been able to essentially recreate a version of moveFileToAssetstore for attached files. Using the lower level APIs there also expose control over the file metadata so maintaining the created field is automatically encapsulated (by the upload.update(...) line).

def move_meta_file(file, assetstore):
    parent = Item().findOne({'_id': file['attachedToId']})

    chunk = None
    try:
        for data in File().download(file, headers=False)():
            if chunk is not None:
                chunk += data
            else:
                chunk = data
    except Exception as e:
        return {'error': f'Exception downloading file: {e}'}

    upload = Upload().uploadFromFile(
        obj=io.BytesIO(chunk), size=file['size'], name=file['name'],
        parentType=file['attachedToType'], parent=parent,
        mimeType=file['mimeType'], attachParent=True, assetstore=assetstore)
    upload.update({k: v for k, v in file.items() if k != 'assetstoreId'})
    upload = File().save(upload)

    return upload

This is modeled after the large image plugin's code for manipulating attached files https://github.com/girder/large_image/blob/master/girder/girder_large_image/rest/tiles.py#L1538-L1547

willdunklin commented 11 months ago

I'm now also wondering if we should do something regarding the created/creatorId -- these get updated, but perhaps they should be maintained from the original file and we should just update the updated date?

I made this switch in the companion girder PR https://github.com/girder/girder/pull/3470. Moved files in general now only modify their updated field.