dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

ToBase64String padding is non deterministic #108646

Closed vitaliy-ostapchuk93 closed 1 month ago

vitaliy-ostapchuk93 commented 1 month ago

Description

The Convert.ToBase64String method uses padding to fill the missing bytes for the Base64 conversion. This is fine as long as we only compare the data itself. The Base64 strings, however, will not always compare equal. My guess is that the padding bytes are not reset.

Reproduction Steps

[Test]
[Repeat(100)]
public void TestPrintBase64()
{
    using Image<Rgba32> image = new(1, 1, new Rgba32(255, 0, 0, 0));
    string base64Img = image.ToBase64String(BmpFormat.Instance);
    Console.WriteLine(base64Img);
}

Note: run this as part of a larger test suite to see the differing results. Run standalone, the test prints the same string every time.

Expected behavior

Every printed Base64 string is identical, i.e. the behaviour is deterministic.

e.g. Note the padding suffix.

data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/wA==
data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/wA==
data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/wA==

...

data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/wA==

Actual behavior

The padding will cause the base64 string to be different sometimes.

e.g. Note the padding suffix.

data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/wA==
data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD//w==
data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/wA==
data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/AA==

...

data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/wA==
data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/AA==

Regression?

Not sure.

Known Workarounds

Extend the byte array to match the padding, or cut off the end of the Base64 string when comparing for equality. This is a bit strange when using it, e.g., to compare Base64-encoded images as shown in the example.

Configuration

.NET 8, Windows 11, x64

Other information

No response

huoyaoyuan commented 1 month ago

All of the strings are valid Base64 and decode to different values. The decoded length is 58 bytes in each case, and the trailing bytes differ. The difference is not in the padding but in the actual payload.
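As a rough way to check this without ImageSharp, the sketch below decodes three of the payloads printed in the issue and prints the decoded length and last byte of each. The variable names (common, payloads, roundTrips) are made up for illustration; only Convert.FromBase64String and Convert.ToBase64String are real API calls.

```csharp
using System;

// Payloads copied from the issue, with the "data:image/bmp;base64," prefix removed.
// They share everything except the last four characters.
const string common = "Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/";
string[] payloads = { common + "wA==", common + "/w==", common + "AA==" };

foreach (string payload in payloads)
{
    byte[] bytes = Convert.FromBase64String(payload);

    // Re-encoding the decoded bytes reproduces the input exactly,
    // so the encoder behaves deterministically for these inputs.
    bool roundTrips = Convert.ToBase64String(bytes) == payload;

    Console.WriteLine($"length: {bytes.Length}, last byte: {bytes[^1]}, round-trips: {roundTrips}");
}
// prints:
// length: 58, last byte: 192, round-trips: True
// length: 58, last byte: 255, round-trips: True
// length: 58, last byte: 0, round-trips: True
```

Since each string round-trips unchanged, the variation has to be in the bytes handed to the encoder rather than in the "==" padding.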

The tested method belongs to ImageSharp: https://github.com/SixLabors/ImageSharp/blob/5c2812901488bc7e97512e97b8a4aa2629c29185/src/ImageSharp/ImageExtensions.cs#L173-L183

Can you test the following snippet:

var mms = new MemoryStream();
image.Save(mms, BmpFormat.Instance);
var array = mms.ToArray();

and see whether the trailing byte of the array changes when you get a different ToBase64String result? If so, it is ImageSharp that is non-deterministic.

vitaliy-ostapchuk93 commented 1 month ago

@huoyaoyuan I ran the snippet you suggested; here is the output:

data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD//w==
last byte: 255
data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/AA==
last byte: 0
data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD//w==
last byte: 255
data:image/bmp;base64,Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/EA==
last byte: 16
...

You are right, the payload changes, so this comes from ImageSharp. If we compare the image buffers after decoding, though, they do match again. Quite confusing.
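One possible explanation for why the decoded buffers still match, assuming the standard BMP layout: each pixel row is padded to a multiple of 4 bytes, so a 1x1 image at 24 bits per pixel stores 3 pixel bytes plus 1 filler byte that decoders ignore. The sketch below reads the header fields from one of the payloads above to check where the varying byte sits. The variable names are illustrative, and nothing here confirms what ImageSharp does internally with that byte.

```csharp
using System;
using System.Buffers.Binary;

// One of the payloads from the issue, without the data-URI prefix.
byte[] bmp = Convert.FromBase64String(
    "Qk06AAAAAAAAADYAAAAoAAAAAQAAAAEAAAABABgAAAAAAAQAAADEDgAAxA4AAAAAAAAAAAAAAAD/wA==");

// BITMAPFILEHEADER: offset of the pixel data is stored at byte 10.
int dataOffset = BinaryPrimitives.ReadInt32LittleEndian(bmp.AsSpan(10));
// BITMAPINFOHEADER: width at byte 18, bits per pixel at byte 28.
int width = BinaryPrimitives.ReadInt32LittleEndian(bmp.AsSpan(18));
int bitsPerPixel = BinaryPrimitives.ReadUInt16LittleEndian(bmp.AsSpan(28));

int usedBytesPerRow = (width * bitsPerPixel + 7) / 8;  // bytes that carry pixel data
int stride = (width * bitsPerPixel + 31) / 32 * 4;     // row size rounded up to 4 bytes
int paddingPerRow = stride - usedBytesPerRow;

Console.WriteLine($"data offset: {dataOffset}, used: {usedBytesPerRow}, stride: {stride}, padding: {paddingPerRow}");
Console.WriteLine($"index of first padding byte: {dataOffset + usedBytesPerRow}, file length: {bmp.Length}");
// prints:
// data offset: 54, used: 3, stride: 4, padding: 1
// index of first padding byte: 57, file length: 58
```

Under that reading, the only byte that differs between runs is the row-padding byte at index 57, which would explain why the encoded files differ while the decoded pixels compare equal. For equality checks, comparing decoded pixel data rather than the encoded files would avoid depending on that byte.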