Wrong Base64 encoding/decoding in certain cases

RovoMe commented 4 years ago

When loading an XML file that has an embedded PDF file as a Base64-encoded string, this plugin might produce wrong results.

Consider this PDF as an example, which contains a simple data table and nothing more. Notepad++ converts the PDF to its Base64 string representation and back without problems, whereas VS Code and/or this plugin produce wrong results. The original PDF is 66 KB, while the PDF after conversion is 110 KB in size.

I assume this is an issue with the Buffer class itself, as a similar approach I tried earlier led to the same problem. It seems that the string is somehow stored as UTF-16 instead of UTF-8, which would explain the almost doubling in size, I guess.
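Another possible explanation for the size increase, offered here as an assumption that has not been checked against the extension's source, is that the decoded bytes pass through a UTF-8 string at some point. Node's `Buffer` replaces every byte sequence that is not valid UTF-8 with U+FFFD, and writing that character back out takes three bytes (EF BF BD). The helper name `roundTripThroughUtf8` below is hypothetical; the sketch only shows that binary data grows under such a round trip while plain ASCII does not.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper: simulates passing decoded binary data through a UTF-8 string.
function roundTripThroughUtf8(bytes: Buffer): Buffer {
  const text = bytes.toString("utf8"); // invalid sequences become U+FFFD
  return Buffer.from(text, "utf8");    // each U+FFFD is written back as EF BF BD
}

// 128 bytes covering 0x80..0xFF in ascending order; none of them form valid UTF-8 here.
const binary = Buffer.from(Array.from({ length: 128 }, (_, i) => 0x80 + i));
// Pure ASCII is valid UTF-8, so it should come back unchanged.
const ascii = Buffer.from("plain ASCII survives unchanged", "utf8");

const binaryAfter = roundTripThroughUtf8(binary);
const asciiAfter = roundTripThroughUtf8(ascii);

console.log("binary:", binary.length, "->", binaryAfter.length, "bytes");
console.log("ascii: ", ascii.length, "->", asciiAfter.length, "bytes");
```

A PDF mixes ASCII structure with compressed binary streams, so a growth factor somewhere between 1x and 3x would be consistent with this explanation.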

paulverbeke commented 1 year ago

I just came across the same issue with a PDF stored as Base64. I cannot decode a Base64 PDF at all; the resulting PDF is corrupted.

rocketrogerdcook commented 5 months ago

I was in text editor mode and copy/pasted in a base64 encoded string that was an encoded JPG file. After decoding it and reopening the result in Hex Editor mode, I expected it to start with FF D8 FF, but instead it started with several instances of EF BF BD, the REPLACEMENT CHARACTER as encoded in UTF-8.

Manual decoding resulted in the expected FF D8 FF at the beginning of the file, so this appears to be a bug in decoding unprintable characters while in text editor mode.

To reproduce:

Copy this text snippet into an editor window in VS Code with the base64 extension installed (it is a 1x1 JPG file):

/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAMCAgICAgMCAgIDAwMDBAYEBAQEBAgGBgUGCQgKCgkICQkKDA8MCgsOCwkJDRENDg8QEBEQCgwSExIQEw8QEBD/2wBDAQMDAwQDBAgEBAgQCwkLEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBD/wAARCAABAAEDAREAAhEBAxEB/8QAFAABAAAAAAAAAAAAAAAAAAAABf/EABQQAQAAAAAAAAAAAAAAAAAAAAD/xAAVAQEBAAAAAAAAAAAAAAAAAAAFBv/EABQRAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhEDEQA/AEwiEf/Z

Select all, and run the Base64 Decode command.
Select all, and run the Base64 Encode command.

Expected result:

The encoded text should be equivalent to the text entered earlier.

Actual result:

The encoded text is different from that which was entered. Decoding the newly-encoded text yields a binary result that begins with four instances of the "REPLACEMENT CHARACTER" UTF-8 sequence.
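The expected and actual results can be reproduced outside the editor with Node's `Buffer`. This is a minimal sketch, not the extension's actual code: `roundTripBytes` and `roundTripText` are hypothetical helpers, and the text-based path is only assumed to resemble what the Decode and Encode commands do when the decoded data is held as editor text. The input is the first 16 characters of the JPG snippet above.

```typescript
import { Buffer } from "node:buffer";

// First 16 characters of the JPG snippet from the report (12 bytes once decoded).
const jpegPrefix = "/9j/4AAQSkZJRgAB";

// Byte-preserving round trip: base64 -> bytes -> base64.
function roundTripBytes(b64: string): string {
  return Buffer.from(b64, "base64").toString("base64");
}

// Text-based round trip (assumption): the decoded bytes are turned into a
// UTF-8 string, as editor text would be, before being encoded again.
function roundTripText(b64: string): string {
  const text = Buffer.from(b64, "base64").toString("utf8");
  return Buffer.from(text, "utf8").toString("base64");
}

const viaBytes = roundTripBytes(jpegPrefix);
const viaText = roundTripText(jpegPrefix);

// First three bytes of each result, as hex.
const bytesHeaderHex = Buffer.from(viaBytes, "base64").subarray(0, 3).toString("hex");
const textHeaderHex = Buffer.from(viaText, "base64").subarray(0, 3).toString("hex");

console.log("bytes path unchanged:", viaBytes === jpegPrefix, "header:", bytesHeaderHex);
console.log("text path unchanged: ", viaText === jpegPrefix, "header:", textHeaderHex);
```

Under these assumptions, only the byte-preserving path returns the original text and keeps the FF D8 FF header; the text path starts with the EF BF BD sequence described above.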

Other notes:

About Visual Studio Code:

Version: 1.89.0 (Universal)
Commit: b58957e67ee1e712cebf466b995adf4c5307b2bd
Date: 2024-05-01T02:10:10.196Z
Electron: 28.2.8
ElectronBuildId: 27744544
Chromium: 120.0.6099.291
Node.js: 18.18.2
V8: 12.0.267.19-electron.0
OS: Darwin x64 23.4.0

Extension version 0.1.0.

adamhartford / vscode-base64