gaul / s3proxy

Access other storage backends via the S3 API
Apache License 2.0

Backblaze B2 sha1 checksum #317

Open kernworks opened 4 years ago

kernworks commented 4 years ago

System: CentOS Linux release 7.7.1908 (Core); s3fs version 1.85, pulled from the CentOS EPEL repo; java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7

Is there a way to use the sha1 checksum functionality of the B2 api using s3fs on top of s3proxy?

I have done a little research and testing, but I haven't come up with a way.

This is one of those S3 to B2 quirks that is probably hard to translate.

gaul commented 4 years ago

It is not possible to translate MD5 to SHA1 other than by buffering the entire upload in memory and computing the SHA1 before the transfer. However, B2 does support a mechanism to send the SHA1 after the transfer, which still ensures data integrity. This requires modifications to the underlying jclouds library:
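
To illustrate the idea of sending the SHA1 after the data rather than buffering everything first, here is a minimal Python sketch. It is not s3proxy or jclouds code. It assumes the B2 convention in which the upload declares `X-Bz-Content-Sha1: hex_digits_at_end` and appends the 40 hex digits of the SHA1 to the end of the request body; the function name `stream_with_trailing_sha1` and the chunk size are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def stream_with_trailing_sha1(source: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the payload chunk by chunk, then the hex SHA1 of everything sent.

    Only one chunk is held in memory at a time. The trailing 40 hex digits
    follow the assumed B2 "hex_digits_at_end" convention, so the declared
    Content-Length would need to be the payload size plus 40.
    """
    sha1 = hashlib.sha1()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sha1.update(chunk)  # update the running hash as data passes through
        yield chunk
    yield sha1.hexdigest().encode("ascii")  # checksum goes last


# Usage: join the chunks here only to inspect the result; a real upload
# would hand the iterator to the HTTP client as the request body.
body = b"".join(stream_with_trailing_sha1(io.BytesIO(b"hello world"), chunk_size=4))
payload, trailer = body[:-40], body[-40:]
```

The hash is updated incrementally, so memory use stays bounded by the chunk size regardless of object size.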

https://issues.apache.org/jira/browse/JCLOUDS-1268

kernworks commented 4 years ago

That is an interesting option. Glad to see there's a JClouds ticket out there at least.

Another thought, maybe s3fs could include a hashing algorithm flag with a default of MD5. I know there are other providers that don't use MD5.

It would be better to fix it in JClouds though.

This issue can be closed unless you want to leave it open for reference.

Reference: https://www.backblaze.com/b2/docs/uploading.html

gaul commented 4 years ago

Azure is the other common object store which uses a non-MD5 ETag. I don't know how s3fs would express a non-MD5 ETag since it only supports the S3 protocol. Other FUSE filesystems, notably goofys, support other protocols but not B2. ETag data integrity seems more of a nice-to-have since most applications connect via HTTPS, which provides transport integrity. The ETag seems useful only for end-to-end guarantees, where the application checks the ETag that it stores locally.
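
As a rough sketch of the end-to-end check described above, the following Python snippet compares an ETag returned by the store with the MD5 of the data the application holds locally. It assumes the ETag is the plain hex MD5 of the object, which holds for simple S3 single-part uploads but not for multipart uploads or for stores with non-MD5 ETags; the function name `etag_matches` is hypothetical.

```python
import hashlib


def etag_matches(data: bytes, etag: str) -> bool:
    """Return True if the ETag equals the hex MD5 of the local data.

    Assumes an MD5-based ETag. S3 returns the value wrapped in double
    quotes, so those are stripped before comparing.
    """
    expected = hashlib.md5(data).hexdigest()
    return etag.strip('"').lower() == expected


# Usage: the ETag here is computed locally to stand in for a server response.
local_data = b"hello"
returned_etag = '"' + hashlib.md5(local_data).hexdigest() + '"'
ok = etag_matches(local_data, returned_etag)
```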