KiranBabu-Kirando opened this issue 6 years ago.
@KiranBabu-Kirando, there are two places where DMLib uses MD5: it verifies each chunk's data integrity with a transactional MD5 on each request, and it verifies the whole blob's data integrity with the ContentMD5 blob property. For the former, DMLib and the Azure Storage server both calculate the chunk's MD5 and compare the two values; the storage server only supports MD5 for now. For the latter, DMLib currently supports only MD5, and we have no plans to change it to SHA256 yet.
Could you share more details about your concern? Are you worried that the MD5 in the request could be intercepted by others, or is it something else? Thanks, Emma
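To make the two checks concrete, here is a minimal Python sketch (not DMLib's C# code) of the values involved, assuming the standard Content-MD5 encoding of base64 over the raw 16-byte digest. `CHUNK_SIZE`, `chunk_md5s`, and `whole_blob_md5` are hypothetical names chosen for illustration.

```python
import base64
import hashlib
from typing import Iterator, Tuple

# Hypothetical chunk size for illustration; DMLib's real block size may differ.
CHUNK_SIZE = 4 * 1024 * 1024


def chunk_md5s(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, str]]:
    """Yield (offset, base64 MD5) per chunk: the value a client would send as the
    per-request (transactional) Content-MD5 for that chunk."""
    for offset in range(0, len(data), chunk_size):
        digest = hashlib.md5(data[offset:offset + chunk_size]).digest()
        yield offset, base64.b64encode(digest).decode("ascii")


def whole_blob_md5(data: bytes) -> str:
    """Base64 MD5 of the entire content, the format used by a blob's ContentMD5 property."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


if __name__ == "__main__":
    payload = b"example" * 1_000_000  # 7,000,000 bytes, so two 4 MiB chunks
    for offset, md5_b64 in chunk_md5s(payload):
        print(offset, md5_b64)
    print("ContentMD5:", whole_blob_md5(payload))
```

Note that the whole-blob MD5 has to be computed over the full content; it cannot be derived from the per-chunk MD5 values.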
Our concern is that SHA256 is a much stronger hash than MD5. Our security team mandates SHA256 for any kind of file transfer, and since we have to download and upload blobs, we have to use SHA256 rather than MD5. My ask is that the integrity hashing be made pluggable, so that MD5 remains the default but users can supply their own hash.
MD5 is used for integrity, not security, so the “strength” of the hash is not a factor. We may move to a different algorithm for integrity checking, but stronger hashing means slower performance, so it’s not likely to be SHA.
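A rough way to check the speed trade-off on your own hardware is a micro-benchmark like the following. It is a sketch using Python's `hashlib`, not the .NET code DMLib runs; `throughput_mb_per_s` is a hypothetical helper, and the results depend heavily on the CPU and the underlying crypto library (CPUs with SHA hardware extensions can narrow the gap considerably).

```python
import hashlib
import os
import time
from typing import Dict, Iterable


def throughput_mb_per_s(algorithms: Iterable[str] = ("md5", "sha256"),
                        size_mb: int = 64, repeats: int = 3) -> Dict[str, float]:
    """Return the best observed hashing throughput (MB/s) for each algorithm."""
    data = os.urandom(size_mb * 1024 * 1024)
    results = {}
    for name in algorithms:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            hashlib.new(name, data).digest()
            best = min(best, time.perf_counter() - start)
        # Guard against a zero timer reading on very small inputs.
        results[name] = size_mb / max(best, 1e-9)
    return results


if __name__ == "__main__":
    for name, mbps in throughput_mb_per_s().items():
        print(f"{name}: {mbps:.0f} MB/s")
```

Taking the best of several runs reduces noise from other processes; the absolute numbers matter less than the ratio between the two algorithms on the same machine.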
I agree SHA256 is slower than MD5, but not by that much. Allowing end users to plug in their own hashing for integrity and security would help a lot: keep MD5 as the default that ships with DMLib, and provide a delegate so that end users can do their own hashing. When users supply their own hashing, it is implied that they are responsible for any performance degradation the chosen algorithm may cause.
Thanks for the feature request; we are evaluating it. Further discussion is welcome!
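A minimal sketch of the delegate idea in Python, assuming a hypothetical `digest_stream` helper that accepts a hash factory and defaults to MD5. This is not DMLib's API. Since the maintainers state that the server only verifies MD5, a custom hash like this could only be a client-side check (for example, a digest stored alongside the blob).

```python
import hashlib
from typing import Callable, Iterable, Protocol


class Hasher(Protocol):
    """Anything with the incremental update/hexdigest shape of hashlib objects."""

    def update(self, data: bytes) -> None: ...
    def hexdigest(self) -> str: ...


# Hypothetical delegate type: a zero-argument callable returning a fresh hasher.
HashFactory = Callable[[], Hasher]


def digest_stream(chunks: Iterable[bytes], hash_factory: HashFactory = hashlib.md5) -> str:
    """Hash a sequence of chunks with the caller's algorithm (MD5 by default)."""
    hasher = hash_factory()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    parts = [b"first chunk,", b"second chunk"]
    print("default (md5):", digest_stream(parts))
    print("caller-chosen (sha256):", digest_stream(parts, hashlib.sha256))
```

Passing a factory rather than a hasher instance lets the library create a fresh hash object per transfer, which keeps concurrent transfers from sharing state.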
Any update on this? https://docs.microsoft.com/en-us/visualstudio/code-quality/ca5351?view=vs-2019
Hi, @madhu-sameena
Thanks for reaching out to us!
For Storage, SHA256 (as HMAC-SHA256) is used for authN/authZ when using SharedKey or SAS. At the same time, HTTPS is typically recommended to ensure security.
MD5 is used only for data integrity validation, which sits behind an authN/authZ mechanism such as SharedKey, SAS, or a Bearer token (OAuth); it is not related to security.
So, regarding the link, a collision attack is not applicable in Storage's scenario: authN/authZ protects the system from being compromised, and MD5 is not part of that process.
For hash customization, @EmmaZhu can provide more information.
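To show where SHA256 actually appears, here is a minimal sketch of the Shared Key signature step: Base64(HMAC-SHA256(decoded account key, UTF-8 string-to-sign)), sent as `Authorization: SharedKey <account>:<signature>`. Building the canonical string-to-sign (verb, headers, canonicalized resource) is omitted and assumed to be done elsewhere; `shared_key_signature` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac


def shared_key_signature(account_key_b64: str, string_to_sign: str) -> str:
    """Sign an already-canonicalized string-to-sign with HMAC-SHA256.

    The account key is stored base64-encoded, so it is decoded before use,
    and the raw 32-byte MAC is base64-encoded for the Authorization header.
    """
    key = base64.b64decode(account_key_b64)
    mac = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256)
    return base64.b64encode(mac.digest()).decode("ascii")


if __name__ == "__main__":
    # Placeholder key and string-to-sign, for illustration only.
    fake_key = base64.b64encode(b"not-a-real-account-key").decode("ascii")
    print("SharedKey myaccount:" + shared_key_signature(fake_key, "GET\n...\n/myaccount/container"))
```

This is the step that protects requests from tampering or forgery; the MD5 checks sit behind it and only detect accidental corruption.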
Best Regards, Jiachen
@madhu-sameena
DMLib uses MD5 only for data integrity: it calculates the MD5 of the blob/file content and compares it with the one obtained from Azure Storage.
Currently, Azure Storage supports two ways to guarantee data integrity: MD5 and CRC64. Without Azure Storage server support, DMLib can't use SHA256 for this. We won't add SHA256 for data integrity, but we may support CRC64 in the future.
Thanks, Emma
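A minimal sketch of that comparison in Python, streaming the downloaded content through MD5 and comparing it with the stored ContentMD5. It assumes the stored value is available as the raw 16 digest bytes (the Python `azure-storage-blob` SDK exposes it as `content_settings.content_md5` on the blob properties, as far as I know); `matches_content_md5` is a hypothetical helper name.

```python
import hashlib
import io
from typing import BinaryIO, Optional


def matches_content_md5(stream: BinaryIO, expected_md5: Optional[bytes],
                        chunk_size: int = 1024 * 1024) -> bool:
    """Return True if the stream's MD5 equals the blob's stored ContentMD5 bytes."""
    if not expected_md5:
        # Blobs uploaded without ContentMD5 have nothing to compare against.
        raise ValueError("blob has no ContentMD5 to compare against")
    hasher = hashlib.md5()
    # Read in chunks so large downloads are not held in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.digest() == bytes(expected_md5)


if __name__ == "__main__":
    content = b"downloaded blob content"
    stored = hashlib.md5(content).digest()  # stand-in for the service's value
    print(matches_content_md5(io.BytesIO(content), stored))
```

A plain equality check is enough here because, as the maintainers point out, this step guards against corruption rather than against an attacker.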
@jiacfan @EmmaZhu - Thank you for the response.
So we will wait for CRC64 support and use MD5 until then.
Is there any movement on an MD5 alternative? We're looking to interact with Azure Blob Storage from a Blazor app, and .NET 5 doesn't have a managed implementation of MD5, so we can't calculate a client-side hash to compare.
I'd like to add that on FIPS-managed systems, .NET throws exceptions when trying to even compute MD5, so file integrity validation using MD5 doesn't work; it is not possible to validate with MD5 on FIPS systems.
I would love to see at least an option to enable SHA256.
Fewer and fewer systems store MD5 (e.g., imagine syncing data between two different systems based on a digest). Right now, to do this, I first batch-upload a set of blobs, collect the results, calculate the digest of each blob from its local copy, and then update each uploaded blob's metadata one by one to include a SHA256 digest.
It's 2024 already, and Microsoft's own recommendation is to stop using MD5 in code bases. Is this issue being prioritized by the Storage team? Can we expect an alternative to MD5? Many internal teams rely on integrity validation checks using the ContentMD5 hash in the Storage SDKs.
Is there any update or timeline for supporting an alternative checksum validation algorithm?
+1, SHA256 should be standard at this point.
Due to our security requirements, our service cannot use MD5 hashing. I would like to have SHA256 as part of blob upload and download transfers. I see a lot of benefit for users here, as SHA256 is much stronger than MD5. Is this something that can be implemented?