WerWolv / ImHex

🔍 A Hex Editor for Reverse Engineers, Programmers and people who value their retinas when working at 3 AM.
https://imhex.werwolv.net
GNU General Public License v2.0

[Bug] Base64 provider does not work on files with newlines #1838

Open checkraisefold opened 1 month ago

checkraisefold commented 1 month ago

Operating System

Windows

What's the issue you encountered?

The file opens and displays fine. However, copying data out of the hex view and pasting it elsewhere yields nothing but 00 bytes, and running the pattern language on it produces completely nonsensical or invalid results.

How can the issue be reproduced?

Open the file with the Base64 provider via File -> Open Other. Then try to copy data out with right-click Copy, or try to run the pattern language; neither works correctly.

ImHex Version

1.35.4

ImHex Build Type

Installation type

MSI

Additional context?

test9.txt repro file

checkraisefold commented 1 month ago

At some point in this repro file, the Base64 decoder also seems to go completely haywire and just decode incorrect data.

paxcut commented 1 month ago

I am having trouble decoding the contents of the provided file with online Base64 decoders. If you look in the %APPSLOCALDATA%\imhex\tests\patterns\test_data folder, you'll find a file called base64.dat (I'm attaching it here in case it is something I added). If I paste the contents of that file into an online decoder, it decodes to some JSON-looking text. Loading that file with the Base64 provider produces the same text, which can be read in the ASCII column.

base64.txt

PLEASE NOTE THE EDITED POST CONTENT

To further verify the information above, I used an online Base64 encoder and typed in some text. The resulting Base64 loads in ImHex's Base64 provider and shows the same string that was used to create the input file. How did you obtain the sample you provided? Can you verify that it is a valid Base64-encoded file? Please provide the details needed to check this, but as far as I know the Base64 provider is not misbehaving at all. If you want to paste Base64 data directly into ImHex, you can't use the clipboard, because that copies ASCII, not binary values; it is also not clear exactly which operations "copy data out" implies. Your instructions are likewise unclear about what is meant by running pattern language.

I tested running a simple pattern that reads 20 bytes and prints them as a string, and the decoded bytes are printed correctly. I suspect that your sample uses an additional encoding on top of Base64 (like UTF-16). As far as I know, the Base64 provider only supports UTF-8 encoding, but I need to check that.
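
The questions raised above (is the sample valid Base64, and is an extra encoding such as UTF-16 involved?) can be checked outside ImHex. The following is only a rough sketch using Python's standard library; the function name `inspect_base64_sample` and the returned keys are made up for illustration and have nothing to do with ImHex's own code.

```python
import base64
import binascii
import string

# Standard Base64 alphabet from RFC 4648 plus the padding character.
B64_ALPHABET = set((string.ascii_letters + string.digits + "+/=").encode("ascii"))


def inspect_base64_sample(raw: bytes) -> dict:
    """Report a few properties of a candidate Base64 file (hypothetical helper)."""
    # A UTF-16 text file usually starts with one of these byte order marks.
    has_utf16_bom = raw[:2] in (b"\xff\xfe", b"\xfe\xff")

    # Every distinct byte value that is not part of the Base64 alphabet.
    non_alphabet = sorted({b for b in raw if b not in B64_ALPHABET})

    # Drop ASCII whitespace only, then decode strictly.
    stripped = bytes(b for b in raw if b not in b" \t\r\n")
    try:
        base64.b64decode(stripped, validate=True)
        valid_after_stripping = True
    except binascii.Error:
        valid_after_stripping = False

    return {
        "has_utf16_bom": has_utf16_bom,
        "non_alphabet_bytes": non_alphabet,
        "valid_after_stripping": valid_after_stripping,
    }
```

If `non_alphabet_bytes` contains only 10 (LF) and 13 (CR) and `valid_after_stripping` is true, the file is ordinary line-wrapped Base64. NUL bytes (0) or a byte order mark would instead point towards a UTF-16 text file.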

checkraisefold commented 1 month ago

The sample TXT uses newlines. With both LF and CRLF line endings, ImHex either interprets the newlines as Base64 or starts a new Base64 string, which causes random bytes and garbage to be output.

https://datatracker.ietf.org/doc/html/rfc4648#section-3.1
https://datatracker.ietf.org/doc/html/rfc4648#section-3.3

ImHex should most likely either error out when given Base64 input that contains non-alphabet characters, or ignore those characters entirely.
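
The two behaviours suggested here can be sketched as follows. This is a minimal Python illustration using the standard `base64` module, not ImHex's actual implementation; the function names `decode_strict` and `decode_ignoring_whitespace` are hypothetical. The lenient variant skips only ASCII whitespace and still rejects any other non-alphabet character.

```python
import base64
import binascii


def decode_strict(data: bytes) -> bytes:
    """Option 1: reject any character outside the Base64 alphabet.

    Raises binascii.Error on newlines or other non-alphabet input.
    """
    return base64.b64decode(data, validate=True)


def decode_ignoring_whitespace(data: bytes) -> bytes:
    """Option 2: drop line breaks and other ASCII whitespace, then decode strictly."""
    # bytes.split() with no arguments splits on runs of ASCII whitespace,
    # so this removes LF, CRLF, spaces and tabs in one step.
    compact = b"".join(data.split())
    return base64.b64decode(compact, validate=True)


if __name__ == "__main__":
    payload = bytes(range(256))
    wrapped = base64.encodebytes(payload)  # inserts "\n" after every 76 characters

    print(decode_ignoring_whitespace(wrapped) == payload)
    try:
        decode_strict(wrapped)
    except binascii.Error as err:
        print("strict decode rejected the input:", err)
```

Either behaviour avoids silently emitting garbage: the strict variant reports the problem to the user, while the lenient one accepts line-wrapped files with LF or CRLF line endings.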