Closed Rameshkubendran closed 4 years ago
If I understand correctly, you are trying to:
- `BlobOutputStream`
- `stageBlock()`/`commitBlockList()`

Is this correct?
I am not uploading the entire file in a single shot; since it is a big file, I am uploading it as blocks/chunks. Let me ask my question here in a different way...
Issue: the Content-MD5 value is missing from the Azure blob properties when we upload a file as blocks/chunks (not as a single upload/shot). How can we resolve this issue?
The code reference is here:

```java
while (contentInputStream.available() > 100 * 1024 * 1024) {
    blockIdEncoded = Base64.getEncoder().encodeToString(
            String.format("%05d", blockNum).getBytes(Charset.forName(ENCODING_TYPE)));
    fileInBlob.uploadBlock(blockIdEncoded, contentInputStream, 100 * 1024 * 1024,
            accessCondition, null, null);
    blockList.add(new BlockEntry(blockIdEncoded));
    blockNum++;
}
blockIdEncoded = Base64.getEncoder().encodeToString(
        String.format("%05d", blockNum).getBytes(Charset.forName(ENCODING_TYPE)));
fileInBlob.uploadBlock(blockIdEncoded, contentInputStream, contentInputStream.available(),
        accessCondition, null, null);
blockList.add(new BlockEntry(blockIdEncoded));
fileInBlob.commitBlockList(blockList, accessCondition, null, null);
```
Apologies for the delay.
Content-MD5 is only stored by the service; you cannot get the service to calculate the MD5 for you.* Your Option 1 was the correct approach: calculate the MD5 locally and set the property.
Regarding out-of-memory exceptions when the file is large: you do not need the entire file in memory to calculate its MD5. The MessageDigest class was designed to consume arbitrary amounts of data incrementally. The javadocs include a code example that calls update() multiple times to produce a single MD5, and you can do so arbitrarily many times. (That sample also demonstrates the clone functionality available on certain hash algorithms, but you do not need it for this use case.)
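As a concrete sketch of that incremental pattern (JDK only; the buffer size and the demo data are arbitrary), the update()-in-a-loop approach looks like this:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.security.MessageDigest;
import java.util.Base64;

public class StreamingMd5 {
    // Computes a Base64-encoded MD5 over a stream without buffering the whole
    // payload: update() is called once per read, digest() finalizes the hash.
    static String md5Base64(InputStream in) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            md.update(buffer, 0, read); // consume the stream chunk by chunk
        }
        return Base64.getEncoder().encodeToString(md.digest());
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1024 * 1024]; // stand-in for a large file
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        String streamed = md5Base64(new ByteArrayInputStream(data));
        String oneShot = Base64.getEncoder().encodeToString(
                MessageDigest.getInstance("MD5").digest(data));
        System.out.println(streamed.equals(oneShot)); // prints "true"
    }
}
```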
Does this resolve your issue?
*If your blob is beneath a certain size threshold, the service will do this for single-shot uploads. I believe that threshold is in the tens of megabytes.
Thank you. I am generating the MD5 explicitly (locally) and setting it on the blob property; it's working as expected.
Regarding the OOM, I am using DigestInputStream instead of dealing with the input stream directly. It's working fine.
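For anyone following along, the DigestInputStream variant mentioned here can be sketched like this (JDK only; the actual upload call is omitted from the loop):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.util.Base64;

public class DigestStreamDemo {
    // DigestInputStream updates the digest as a side effect of every read, so
    // the MD5 accumulates while the consumer (e.g. the upload code) drains it.
    static String md5Base64(InputStream source) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(source, md)) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // the bytes just read would be handed to the uploader here
            }
        }
        return Base64.getEncoder().encodeToString(md.digest());
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = "example blob content".getBytes("UTF-8");
        String viaStream = md5Base64(new ByteArrayInputStream(payload));
        String oneShot = Base64.getEncoder().encodeToString(
                MessageDigest.getInstance("MD5").digest(payload));
        System.out.println(viaStream.equals(oneShot)); // prints "true"
    }
}
```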
I am going to close this issue as it seems all discussed issues have been resolved. @Rameshkubendran please feel free to comment here further or open another issue if you need further support.
OK, I put a write-up of what I believe is possible with Azure MD5 checking here, in case it is useful for followers: https://stackoverflow.com/a/69319211/32453
Which service(blob, file, queue, table) does this issue concern?
There is an issue in the Blob service.
Which version of the SDK was used?
We are using Java 8.
Please note that if your issue is with v11, we are recommending customers either move back to v8 or move to v12 (currently in preview) if at all possible. Hopefully this resolves your issue, but if there is some reason why moving away from v11 is not possible at this time, please do continue to ask your question and we will do our best to support you. The README for this SDK has been updated to point to more information on why we have made this decision.
What problem was encountered?
Content-MD5 is missing in the Azure portal when we upload a big file to a blob as blocks/chunks. It looks like Azure does not populate Content-MD5 by default for block uploads the way it does for single uploads.
Since Content-MD5 is not set, we get the exception "Blob has mismatch (integrity check failed), Expected value is m5hM3x8grCYBgNAue/RYnA==, retrieved CMWQgUAgrLKtUYC3VLD+hw==" when we download/read content from the blob, as we need to validate content integrity while downloading.
Other details:
Version: azure-storage 7.0.0
Language: Java 8
Have you found a mitigation/solution?
Option 1: We generate the Content-MD5 for the entire file on our end and set it on the blob property before uploading the file. It works as expected, but we get an out-of-memory error when uploading large files.
MD5 code snapshot:

```java
// blobContentInputStream is an InputStream
byte[] blobContentBytes = IOUtils.toByteArray(blobContentInputStream);

// Generating MD5 of the blob content.
MessageDigest md = MessageDigest.getInstance("MD5");
md.reset();
md.update(blobContentBytes);

// Encode the MD5 digest using Base64 encoding.
String base64EncodedMD5content = Base64.encode(md.digest());

// Set blob properties and assign the MD5 content.
// fileInBlob is a CloudBlockBlob object.
fileInBlob.getProperties().setContentMD5(base64EncodedMD5content);
```
Option 2: To make Azure calculate and set the Content-MD5 internally, we tried enabling the StoreBlobContentMD5 and UseTransactionalContentMD5 properties on BlobRequestOptions, which did not work for us.
Approach 1:

```java
BlobRequestOptions b = new BlobRequestOptions();
b.setStoreBlobContentMD5(true);
b.setUseTransactionalContentMD5(true);
// fileInBlob is a CloudBlockBlob object
fileInBlob.uploadBlock(blockIdEncoded, contentInputStream, contentInputStream.available(),
        accessCondition, b, null);
```
Approach 2: with UseTransactionalContentMD5 disabled:

```java
BlobRequestOptions b = new BlobRequestOptions();
b.setStoreBlobContentMD5(true);
b.setUseTransactionalContentMD5(false);
// fileInBlob is a CloudBlockBlob object
fileInBlob.uploadBlock(blockIdEncoded, contentInputStream, contentInputStream.available(),
        accessCondition, b, null);
```
Clarification: How can we make Azure calculate the Content-MD5 internally while uploading a big file as blocks, the same way it does for a single upload? Note: Generating the MD5 for the whole file is fine for us, since we validate the whole file while downloading rather than each block.
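Pulling the thread together: since the service will not compute a whole-file Content-MD5 for chunked uploads, the pattern that resolves this is to hash the stream while uploading the blocks, then set the property before committing the block list. Below is a minimal JDK-only sketch; the Azure v7 calls (uploadBlock, setContentMD5, commitBlockList) appear only as comments mirroring the snippets earlier in this thread, and the block size is shrunk for the demo:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.util.Base64;

public class ChunkedUploadMd5 {
    // Reads the stream block by block (as the upload loop would) while a
    // DigestInputStream accumulates the whole-file MD5 as a side effect.
    static String uploadBlocksAndGetMd5(InputStream source, int blockSize) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(source, md)) {
            byte[] block = new byte[blockSize];
            int read;
            while ((read = in.read(block)) != -1) {
                // fileInBlob.uploadBlock(blockIdEncoded,
                //         new ByteArrayInputStream(block, 0, read), read,
                //         accessCondition, null, null);
                // blockList.add(new BlockEntry(blockIdEncoded));
            }
        }
        // Whole-file MD5, with no second pass over the data; set it before committing:
        // fileInBlob.getProperties().setContentMD5(contentMd5);
        // fileInBlob.commitBlockList(blockList, accessCondition, null, null);
        return Base64.getEncoder().encodeToString(md.digest());
    }

    public static void main(String[] args) throws Exception {
        byte[] file = new byte[10_000]; // stand-in for a large file
        for (int i = 0; i < file.length; i++) file[i] = (byte) (i % 251);

        String streamed = uploadBlocksAndGetMd5(new ByteArrayInputStream(file), 4 * 1024);
        String oneShot = Base64.getEncoder().encodeToString(
                MessageDigest.getInstance("MD5").digest(file));
        System.out.println(streamed.equals(oneShot)); // prints "true"
    }
}
```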
Thanks, Ramesh Kubendran