Open dhoke4tdb opened 2 years ago
Hi @dhoke4tdb, I can confirm there's a bug in the to_base64 implementation. The size of struct _triple_byte is 3 bytes. Casting a pointer to only 1 or 2 bytes of data to a pointer to this struct, and then reading through it, is undefined behavior.
https://github.com/Azure/azure-storage-cpplite/blob/979c59ebe1c247c1126524953c31d75693110ce9/src/base64.cpp#L68
https://github.com/Azure/azure-storage-cpplite/blob/979c59ebe1c247c1126524953c31d75693110ce9/src/base64.cpp#L79
However, cpplite sdk has been deprecated, so I'm afraid we won't fix it. Please use our Track2 SDK instead.
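For anyone who still has to carry the old code, here is a minimal sketch of an encoder that handles the 1-byte and 2-byte tail without overlaying a 3-byte struct on it. This is not the SDK's to_base64 and not the attached patch; the name to_base64_sketch is hypothetical, and the only assumption is the standard base64 alphabet with '=' padding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the SDK): encodes `size` bytes of `input`
// and never reads input[i] for any i >= size.
std::string to_base64_sketch(const unsigned char* input, std::size_t size) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((size + 2) / 3 * 4);

    std::size_t i = 0;
    // Full 3-byte groups: all three reads are inside the buffer.
    for (; i + 3 <= size; i += 3) {
        const unsigned v = (static_cast<unsigned>(input[i]) << 16) |
                           (static_cast<unsigned>(input[i + 1]) << 8) |
                           static_cast<unsigned>(input[i + 2]);
        out += table[(v >> 18) & 0x3F];
        out += table[(v >> 12) & 0x3F];
        out += table[(v >> 6) & 0x3F];
        out += table[v & 0x3F];
    }

    // Tail: read only the bytes that actually exist, then pad with '='.
    const std::size_t rest = size - i;
    if (rest == 1) {
        const unsigned v = static_cast<unsigned>(input[i]) << 16;
        out += table[(v >> 18) & 0x3F];
        out += table[(v >> 12) & 0x3F];
        out += "==";
    } else if (rest == 2) {
        const unsigned v = (static_cast<unsigned>(input[i]) << 16) |
                           (static_cast<unsigned>(input[i + 1]) << 8);
        out += table[(v >> 18) & 0x3F];
        out += table[(v >> 12) & 0x3F];
        out += table[(v >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

The point of the byte-wise tail is that the number of reads is decided by the remaining length, not by the size of a struct, so a 32-byte digest is read as exactly 32 bytes.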
Attached is a zip archive containing the files involved in addressing this issue:
- base64.cpp.patch: a patch generated against v0.3.0 of the SDK (and applicable to several other versions that seem to have the exact same base64.cpp).
- verify.base64.cpp.both: a complete file containing both the broken and the fixed implementation, used to validate that the fix actually avoids the problem, with the _mod() version succeeding followed by the _orig() version failing.
base64-patch-related.zip
Thanks @dhoke4tdb , I believe your patch can fix the bug!
One more file: a CI log from our verification attempt. The backtrace indicates that the _mod routine succeeded before the _orig routine failed (in the same fashion we have been sporadically encountering); _mod executes before _orig in the 'both' code. print-stack-raw.log
SDK release 0.3.0. The problem is general, but it periodically manifests as a SIGSTOP on the Azure Pipelines CI Mac runner. It appears when the data happens to be allocated right up against the end of a mapped area with nothing mapped beyond it.
background... hash.cpp/std::string hash(const std::string &to_sign, const std::vector &key)
defines
elsewhere with
and hash() makes call
with base64.cpp containing
Observe that SHA256_DIGEST_LENGTH, which defines the size of the buffer ('unsigned char digest[SHA256_DIGEST_LENGTH];') passed into to_base64(), is NOT a multiple of 3 (it is 32), but the inner to_base64(), whose for() loop increments by 3, processes the data as if the memory underlying 'input' were a multiple of 3 bytes long.
This leads to falling out of the loop with a length remainder of 2.
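The arithmetic can be spelled out as a small sketch. The names below (split_into_triples, overlay_read_end, kSha256DigestLength) are hypothetical and only illustrate the numbers; 32 is the value of OpenSSL's SHA256_DIGEST_LENGTH.

```cpp
#include <cstddef>

// Hypothetical illustration, not SDK code.
struct triple_split {
    std::size_t full_groups;  // complete 3-byte groups
    std::size_t remainder;    // bytes left over after the last full group
};

constexpr triple_split split_into_triples(std::size_t size) {
    return {size / 3, size % 3};
}

// One-past-the-last byte index touched if a 3-byte struct is overlaid on
// the leftover bytes instead of reading them one at a time.
constexpr std::size_t overlay_read_end(std::size_t size) {
    return size % 3 == 0 ? size : (size / 3) * 3 + 3;
}

constexpr std::size_t kSha256DigestLength = 32;

static_assert(split_into_triples(kSha256DigestLength).full_groups == 10,
              "32 bytes contain 10 full groups");
static_assert(split_into_triples(kSha256DigestLength).remainder == 2,
              "2 bytes are left for the tail");
static_assert(overlay_read_end(kSha256DigestLength) == 33,
              "a 3-byte overlay at offset 30 reaches index 32, one past the end");
```

So the last group starts at offset 30, only offsets 30 and 31 belong to the digest, and a 3-byte read of that group touches offset 32.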
The generated code, on the Mac at least, appears under some circumstances to read a 3rd byte counted from the beginning of that last group. When the data is not allocated right at the end of the segment, this presumably reads memory outside the buffer, which is logically wrong even though it goes unnoticed. If the vector's data happens to be allocated right at the end of a mapped memory area with NO memory mapped beyond that last address, a SIGSTOP is generated. Following are debugger snippets from the SIGSTOP situation.