VirusTotal / yara-x

A rewrite of YARA in Rust.
https://virustotal.github.io/yara-x/
BSD 3-Clause "New" or "Revised" License

Bug: Base64 encoded values in `yara-x` JSON dump #125

Closed: r0ny123 closed this issue 4 months ago

r0ny123 commented 4 months ago

Description: When processing a binary with `yr dump -pe <binary> -o json`, the values of the `sections.name` and `sections.fullName` fields, as well as the `richSignature.rawData` and `richSignature.clearData` fields, are base64-encoded in the JSON output.

Example JSON output:

```json
{
  "pe": {
    "isPe": true,
    "machine": "MACHINE_AMD64",
    "subsystem": "SUBSYSTEM_WINDOWS_GUI",
    "osVersion": { "major": 6, "minor": 0 },
    "subsystemVersion": { "major": 6, "minor": 0 },
    "imageVersion": { "major": 0, "minor": 0 },
    "linkerVersion": { "major": 14, "minor": 0 },
    "opthdrMagic": "IMAGE_NT_OPTIONAL_HDR64_MAGIC",
    "characteristics": 8226,
    "dllCharacteristics": 352,
    "timestamp": 1715333013,
    "imageBase": "6442450944",
    "checksum": 0,
    "baseOfCode": 4096,
    "entryPoint": 12872,
    "entryPointRaw": 12872,
    "dllName": "UpdaterTag.dll",
    "exportTimestamp": 1715333013,
    "sectionAlignment": 4096,
    "fileAlignment": 4096,
    "loaderFlags": 0,
    "sizeOfOptionalHeader": 240,
    "sizeOfCode": 39936,
    "sizeOfInitializedData": 12800,
    "sizeOfUninitializedData": 0,
    "sizeOfImage": 69632,
    "sizeOfHeaders": 1024,
    "sizeOfStackReserve": "1048576",
    "sizeOfStackCommit": "1048576",
    "sizeOfHeapReserve": "1048576",
    "sizeOfHeapCommit": "4096",
    "pointerToSymbolTable": 0,
    "win32VersionValue": 0,
    "numberOfSymbols": 0,
    "numberOfRvaAndSizes": 16,
    "numberOfSections": 5,
    "numberOfImportedFunctions": "5",
    "numberOfDelayedImportedFunctions": "0",
    "numberOfResources": "0",
    "numberOfVersionInfos": "0",
    "numberOfImports": "2",
    "numberOfDelayedImports": "0",
    "numberOfExports": "4",
    "numberOfSignatures": "0",
    "richSignature": {
      "offset": 128,
      "length": 56,
      "key": 3957332653,
      "rawData": "6XuOuK0a4OutGuDrrRrg63DlK+uoGuDrrRrh66ga4OsaROTqoRrg6xpE4OqsGuDrGkTi6qwa4Os=",
      "clearData": "RGFuUwAAAAAAAAAAAAAAAN3/ywAFAAAAAAABAAUAAAC3XgQBDAAAALdeAAEBAAAAt14CAQEAAAA=",
      "tools": [
        { "toolid": 203, "version": 65501, "times": 5 },
        { "toolid": 1, "version": 0, "times": 5 },
        { "toolid": 260, "version": 24247, "times": 12 },
        { "toolid": 256, "version": 24247, "times": 1 },
        { "toolid": 258, "version": 24247, "times": 1 }
      ]
    },
    "sections": [
      { "name": "LnRleHQ=", "fullName": "LnRleHQ=", "characteristics": 1610612768, "rawDataSize": 40960, "rawDataOffset": 4096, "virtualAddress": 4096, "virtualSize": 40960, "pointerToRelocations": 0, "pointerToLineNumbers": 0, "numberOfRelocations": 0, "numberOfLineNumbers": 0 },
      { "name": "LnJkYXRh", "fullName": "LnJkYXRh", "characteristics": 1073741888, "rawDataSize": 4096, "rawDataOffset": 45056, "virtualAddress": 45056, "virtualSize": 4096, "pointerToRelocations": 0, "pointerToLineNumbers": 0, "numberOfRelocations": 0, "numberOfLineNumbers": 0 },
      { "name": "LmRhdGE=", "fullName": "LmRhdGE=", "characteristics": 3221225536, "rawDataSize": 12288, "rawDataOffset": 49152, "virtualAddress": 49152, "virtualSize": 12288, "pointerToRelocations": 0, "pointerToLineNumbers": 0, "numberOfRelocations": 0, "numberOfLineNumbers": 0 },
      { "name": "LnBkYXRh", "fullName": "LnBkYXRh", "characteristics": 1073741888, "rawDataSize": 4096, "rawDataOffset": 61440, "virtualAddress": 61440, "virtualSize": 4096, "pointerToRelocations": 0, "pointerToLineNumbers": 0, "numberOfRelocations": 0, "numberOfLineNumbers": 0 },
      { "name": "LnJlbG9j", "fullName": "LnJlbG9j", "characteristics": 1107296320, "rawDataSize": 4096, "rawDataOffset": 65536, "virtualAddress": 65536, "virtualSize": 4096, "pointerToRelocations": 0, "pointerToLineNumbers": 0, "numberOfRelocations": 0, "numberOfLineNumbers": 0 }
    ],
    "dataDirectories": [
      { "virtualAddress": 46064, "size": 120 },
      { "virtualAddress": 46184, "size": 60 },
      { "virtualAddress": 0, "size": 0 },
      { "virtualAddress": 61440, "size": 1332 },
      { "virtualAddress": 0, "size": 0 },
      { "virtualAddress": 65536, "size": 12 },
      { "virtualAddress": 45200, "size": 28 },
      { "virtualAddress": 0, "size": 0 },
      { "virtualAddress": 0, "size": 0 },
      { "virtualAddress": 0, "size": 0 },
      { "virtualAddress": 0, "size": 0 },
      { "virtualAddress": 0, "size": 0 },
      { "virtualAddress": 45056, "size": 64 },
      { "virtualAddress": 0, "size": 0 },
      { "virtualAddress": 0, "size": 0 },
      { "virtualAddress": 0, "size": 0 }
    ],
    "importDetails": [
      {
        "libraryName": "KERNEL32.dll",
        "numberOfFunctions": "3",
        "functions": [
          { "name": "PeekNamedPipe", "rva": 45056 },
          { "name": "GetLastError", "rva": 45064 },
          { "name": "CreateMutexW", "rva": 45072 }
        ]
      },
      {
        "libraryName": "USER32.dll",
        "numberOfFunctions": "2",
        "functions": [
          { "name": "MessageBeep", "rva": 45088 },
          { "name": "MessageBoxA", "rva": 45096 }
        ]
      }
    ],
    "exportDetails": [
      { "name": "extra", "ordinal": 1, "rva": 12984, "offset": 12984 },
      { "name": "follower", "ordinal": 2, "rva": 12984, "offset": 12984 },
      { "name": "run", "ordinal": 3, "rva": 12984, "offset": 12984 },
      { "name": "scub", "ordinal": 4, "rva": 12984, "offset": 12984 }
    ],
    "isSigned": false,
    "overlay": { "offset": "69632", "size": "4096" }
  }
}
```
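The base64 values in this dump are ordinary encodings of the raw bytes, so a consumer can recover them with a standard base64 decoder. The minimal Python sketch below assumes only the standard library and a trimmed copy of the dump (`DUMP` is an illustrative excerpt, not real `yr` output). It decodes the section names and the Rich header's `clearData`, which starts with the `DanS` marker.

```python
import base64
import json

# Hypothetical trimmed excerpt of the `yr dump` JSON output: only the
# base64-encoded `bytes` fields (plus the Rich header length) are kept.
DUMP = """
{
  "pe": {
    "richSignature": {
      "length": 56,
      "clearData": "RGFuUwAAAAAAAAAAAAAAAN3/ywAFAAAAAAABAAUAAAC3XgQBDAAAALdeAAEBAAAAt14CAQEAAAA="
    },
    "sections": [
      {"name": "LnRleHQ="}, {"name": "LnJkYXRh"}, {"name": "LmRhdGE="},
      {"name": "LnBkYXRh"}, {"name": "LnJlbG9j"}
    ]
  }
}
"""

pe = json.loads(DUMP)["pe"]

# Each section name is a `bytes` field, so decoding yields the raw bytes.
section_names = [base64.b64decode(s["name"]) for s in pe["sections"]]

# The unmasked Rich header data begins with the "DanS" marker.
clear_data = base64.b64decode(pe["richSignature"]["clearData"])

print(section_names)
print(clear_data[:4], len(clear_data))
```

Here the names decode to plain ASCII (`.text`, `.rdata`, and so on), and the decoded `clearData` is exactly `length` bytes long.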
latonis commented 4 months ago

This happens with any field of type `bytes`. JSON only supports UTF-8 per the spec, so these fields are base64-encoded because their contents are not guaranteed to be valid UTF-8. 🙂
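To illustrate why arbitrary bytes cannot go straight into a JSON string, here is a small Python sketch. It uses a hypothetical section name containing a `0xFF` byte, which never appears in valid UTF-8. Strict UTF-8 decoding fails, while base64 turns the bytes into ASCII that fits in any JSON string and decodes back losslessly.

```python
import base64
import json

# Hypothetical section name containing 0xFF, which is never valid in UTF-8.
raw_name = b".te\xffxt"

# A strict UTF-8 decode fails, so the value cannot be a plain JSON string.
try:
    raw_name.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# Base64 maps arbitrary bytes to ASCII, which always fits in a JSON string.
encoded = json.dumps({"name": base64.b64encode(raw_name).decode("ascii")})

# Reversing both steps recovers the exact original bytes.
decoded = base64.b64decode(json.loads(encoded)["name"])

print(is_utf8, decoded == raw_name)
```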

plusvic commented 4 months ago

As @latonis said, this is the intended behaviour for fields of type bytes, because JSON doesn't support strings that are not UTF-8.

There's a separate discussion about whether certain fields (like section names) should be `bytes` or `string`. On one hand, `bytes` can accommodate section names that contain unusual characters and are not valid UTF-8; on the other hand, `string` is easier to work with because it isn't encoded as base64.
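The trade-off above can be sketched in Python with two hypothetical helpers, `encode_as_bytes_field` and `encode_as_string_field`. Neither is part of yara-x. They only mimic what each field type would mean for the JSON output, assuming a `string` field would have to replace invalid UTF-8 with U+FFFD.

```python
import base64


def encode_as_bytes_field(raw: bytes) -> str:
    """Hypothetical `bytes`-style encoding: lossless, but shown as base64."""
    return base64.b64encode(raw).decode("ascii")


def encode_as_string_field(raw: bytes) -> str:
    """Hypothetical `string`-style encoding: readable, but invalid UTF-8
    sequences are replaced with U+FFFD, so the original bytes are lost."""
    return raw.decode("utf-8", errors="replace")


# One valid UTF-8 section name and one hypothetical non-UTF-8 name.
for raw in (b".text", b".te\xffxt"):
    as_b64 = encode_as_bytes_field(raw)
    as_text = encode_as_string_field(raw)
    # Only the base64 form can always be turned back into the original bytes.
    print(raw, as_b64, as_text, as_text.encode("utf-8") == raw)
```

For `.text`, both forms preserve the name. For the name with `0xFF`, the text form no longer round-trips, which is the information `bytes` preserves at the cost of readability.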

r0ny123 commented 4 months ago

Closing this as previously agreed upon internally.