Background
The indexd service accepts several hash types when files are imported, including etags:
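As an illustration, the sketch below checks a hash value against a per-type pattern. The pattern table and the `is_valid_hash` helper are assumptions for this example, modeled on the hash types indexd is understood to accept; the schema of the deployed indexd instance should be treated as authoritative.

```python
import re

# Assumed patterns (not copied from indexd); verify against the indexd schema.
HASH_PATTERNS = {
    "md5": re.compile(r"^[0-9a-f]{32}$"),
    "sha1": re.compile(r"^[0-9a-f]{40}$"),
    "sha256": re.compile(r"^[0-9a-f]{64}$"),
    "sha512": re.compile(r"^[0-9a-f]{128}$"),
    "crc": re.compile(r"^[0-9a-f]{8}$"),
    # An S3 etag is 32 hex characters, with a "-<parts>" suffix for multipart uploads.
    "etag": re.compile(r"^[0-9a-f]{32}(-\d+)?$"),
}


def is_valid_hash(hash_type: str, value: str) -> bool:
    """Return True if `value` looks like a well-formed hash of `hash_type`."""
    pattern = HASH_PATTERNS.get(hash_type)
    return bool(pattern and pattern.match(value))
```

Note that a multipart etag carries a part-count suffix, so it would pass the `etag` check here but not the `md5` check.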
Current Behavior
Currently the g3t command requires the md5 hash of a file to be provided before the file can be uploaded to the indexd service. When this hash is not available (e.g. when importing files from an existing S3 endpoint), downloading the file and calculating its md5 hash can take a long time.
New Behavior
Adding support for additional hashes like etag would allow for greater efficiency when uploading files where the md5 hash is not immediately available or not yet calculated.
For remote files already registered in an S3 bucket the etag hash can be fetched with the MinIO client as follows:
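A minimal Python sketch of this lookup, assuming the MinIO Python SDK (the `minio` package), whose `Minio.stat_object(bucket_name, object_name)` call returns object metadata including an `etag` attribute. The helper names `fetch_etag` and `is_multipart_etag`, and the endpoint, bucket, and key in the usage comment, are hypothetical.

```python
def fetch_etag(client, bucket: str, key: str) -> str:
    """Return an object's etag without downloading the object body.

    `client` is expected to behave like `minio.Minio`: it must expose
    stat_object(bucket_name, object_name) returning an object with `.etag`.
    """
    stat = client.stat_object(bucket, key)
    # Some S3 clients return the etag wrapped in double quotes; normalize it.
    return stat.etag.strip('"')


def is_multipart_etag(etag: str) -> bool:
    """Multipart uploads produce etags of the form '<hex>-<part count>'."""
    return "-" in etag


# Example usage (placeholder endpoint and credentials):
# from minio import Minio
# client = Minio("s3.example.org", access_key="...", secret_key="...")
# etag = fetch_etag(client, "my-bucket", "path/to/file.bam")
```

For objects uploaded in multiple parts the etag is not the md5 of the file contents, which is one reason to record it as a separate `etag` hash rather than as `md5`.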
Steps for Implementing
Add an --etag option here: https://github.com/ACED-IDP/gen3_util/blob/18d34e4337aae28c7c025457e3c138c37579b9ef/gen3_util/files/cli.py#L73-L74
Pass it to manifest.put here: https://github.com/ACED-IDP/gen3_util/blob/18d34e4337aae28c7c025457e3c138c37579b9ef/gen3_util/files/manifest.py#L43
Verify that md5 is not being calculated (it shouldn’t be) here: https://github.com/ACED-IDP/gen3_util/blob/18d34e4337aae28c7c025457e3c138c37579b9ef/gen3_util/files/manifest.py#L54
Return it as part of the manifest: https://github.com/ACED-IDP/gen3_util/blob/18d34e4337aae28c7c025457e3c138c37579b9ef/gen3_util/files/manifest.py#L73
Make the indexd hashes conditional, i.e. md5 or etag but not both: https://github.com/ACED-IDP/gen3_util/blob/18d34e4337aae28c7c025457e3c138c37579b9ef/gen3_util/files/manifest.py#L190
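The last three steps could be sketched roughly as follows. This is not the gen3_util code: `build_manifest_entry`, `indexd_hashes`, and the entry field names are hypothetical stand-ins, and the sketch raises an error when neither hash is supplied, whereas the real command may instead compute md5 locally.

```python
from typing import Optional


def build_manifest_entry(object_id: str, file_name: str, size: int,
                         md5: Optional[str] = None,
                         etag: Optional[str] = None) -> dict:
    """Build a manifest entry carrying exactly one of md5 or etag."""
    if (md5 is None) == (etag is None):
        raise ValueError("provide exactly one of md5 or etag")
    entry = {"object_id": object_id, "file_name": file_name, "size": size}
    if md5 is not None:
        entry["md5"] = md5
    else:
        # No md5 is computed when an etag is supplied.
        entry["etag"] = etag
    return entry


def indexd_hashes(entry: dict) -> dict:
    """Return the hashes dict for indexd: md5 or etag, never both."""
    if "etag" in entry:
        return {"etag": entry["etag"]}
    return {"md5": entry["md5"]}
```

Keeping the choice in one place means the indexd payload never mixes a locally computed md5 with an etag taken from the bucket.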
Environment