DissectMalware / XLMMacroDeobfuscator

Extract and Deobfuscate XLM macros (a.k.a Excel 4.0 Macros)
Apache License 2.0

Added compare operators #63

Closed stevengoossensB closed 3 years ago

stevengoossensB commented 3 years ago

The comparison operators <= and >= had not yet been added to the lark template. I've added them since I needed them for the analysis of a specific sample.
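
For readers unfamiliar with the operators in question, here is a minimal sketch of how the six Excel-style comparison operators could be evaluated once parsed. It is not the project's lark grammar; `COMPARE_OPS` and `compare` are hypothetical names used only for illustration.

```python
import operator

# Hypothetical lookup table (not taken from the project's lark template).
# It maps Excel-style comparison tokens to Python functions.
COMPARE_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<>": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}


def compare(left, op, right):
    """Evaluate an Excel-style comparison and return a Python bool."""
    return COMPARE_OPS[op](left, right)
```

A tokenizer that matches these by prefix would need to try the two-character tokens (`<=`, `>=`, `<>`) before the one-character ones, otherwise `<=` could be read as `<` followed by `=`.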

DissectMalware commented 3 years ago

Thank you Steven for the PR. Could you also share with me a few hashes of the samples so I can test it with real instances?

stevengoossensB commented 3 years ago

Hi,

Will do. Unfortunately, I found that there are still more functions in the samples which aren't implemented (HLOOKUP, MOD, INT, ROUNDUP). I will continue adding these today as well.

https://www.virustotal.com/gui/file-analysis/NjRhZjAxMzkyMTlkODgwZGI4NzZmOTIxODdkZTNkYWY6MTYwNjI4OTMzOQ==/detection

stevengoossensB commented 3 years ago

Added another set of functions needed for the sample. Still not there though: the HLOOKUP function is still required. Also, the COUNTA function currently counts the total number of cells in a range, while it should only count the cells that contain a value.
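
The COUNTA behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: `cells` is a hypothetical dict mapping addresses to values, and a cell counts as empty when it is missing or holds `None`. The project's real cell storage may look different.

```python
def counta(cells, addresses):
    """Count the addresses in a range whose cell actually holds a value.

    `cells` is a hypothetical mapping such as {"A1": "x"}; an address that is
    missing from it, or mapped to None, is treated as an empty cell.
    """
    return sum(1 for addr in addresses if cells.get(addr) is not None)


# Four addresses in the range, but only two of them hold a value.
cells = {"A1": "x", "A2": None, "A3": 0}
filled = counta(cells, ["A1", "A2", "A3", "A4"])
```

Counting `len(addresses)` instead would give 4 here, which is the over-count described in the comment.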

DissectMalware commented 3 years ago

It is definitely doable, as xlrd2 and pyxlsb both support loading worksheets very well. We need to add functions like load_worksheets to the wrappers and then add a get_macrosheet_cell function to XLMInterpreter. The reason not to extend the get_cell function is performance: it is better not to load all the worksheets for every sample, but only for those samples whose worksheets we actually need data from. I think I may have time to add this part this Sunday, but if you want to take a stab at it, please go ahead.

By the way, thank you very much for your contribution
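
The performance argument above amounts to lazy loading. A minimal sketch of that idea, assuming a hypothetical `LazyWorkbook` wrapper and a hypothetical `get_worksheet_cell` accessor (only the name `load_worksheets` comes from this thread; none of this is the project's actual code):

```python
class LazyWorkbook:
    """Hypothetical wrapper that defers loading worksheets until first use."""

    def __init__(self, loader):
        # `loader` is an assumed callable returning {sheet_name: {address: value}}.
        self._loader = loader
        self._worksheets = None
        self.load_count = 0  # tracks how often the expensive load actually ran

    def load_worksheets(self):
        """Load the worksheets once and reuse the cached result afterwards."""
        if self._worksheets is None:
            self._worksheets = self._loader()
            self.load_count += 1
        return self._worksheets

    def get_worksheet_cell(self, sheet, address):
        """Return a worksheet cell value, or None if the sheet or cell is absent."""
        return self.load_worksheets().get(sheet, {}).get(address)


workbook = LazyWorkbook(lambda: {"Sheet1": {"A1": "hello"}})
```

Samples that never reference a worksheet never pay for the load, and samples that do pay for it only once.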

stevengoossensB commented 3 years ago

I'll see whether I can familiarize myself with the libraries used and give it a try before Sunday. I'll add all code to this PR for your review.

stevengoossensB commented 3 years ago

Thanks for the update. I was trying something along those lines as well. I fixed a bug in the COUNTA method (in some cases, the count is in the macrosheet and the range won't contain the sheet name, so we need to take that into account). Additionally, I've added the HLOOKUP method.

Getting closer: I now see some registry strings and file paths in the execution output, but it's still not there yet. I'll try to continue on this tomorrow.
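
For context, HLOOKUP searches the first row of a table and returns the value from the same column in another row. The sketch below covers exact matching only (Excel's default is an approximate match over a sorted first row, which is omitted here). The `hlookup` helper and its list-of-rows table are assumptions for illustration, not the code added in this PR.

```python
def hlookup(lookup_value, table, row_index):
    """Exact-match HLOOKUP over a list of rows; `row_index` is 1-based.

    Returns the string "#N/A" when the value is not found in the first row.
    """
    header = table[0]
    for col, value in enumerate(header):
        if value == lookup_value:
            return table[row_index - 1][col]
    return "#N/A"


table = [
    ["a", "b", "c"],
    [10, 20, 30],
]
found = hlookup("b", table, 2)
missing = hlookup("z", table, 2)
```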

DissectMalware commented 3 years ago

I think I found the problem

Š is 138 in ASCII (Latin-1) but 352 in Unicode

So some of the characters are in the Latin-1 code page, but ord returns their Unicode code point. This causes a problem when decoding some of the characters.
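
A minimal sketch of the mismatch, assuming the single-byte code page is Windows-1252 (Python codec name `cp1252`), as the correction later in this thread states. The helper names `code_in_codepage` and `char_from_codepage` are hypothetical.

```python
def code_in_codepage(ch, codepage="cp1252"):
    """Return the single-byte code of `ch` in the given code page."""
    return ch.encode(codepage)[0]


def char_from_codepage(code, codepage="cp1252"):
    """Return the character that byte value `code` represents in the code page."""
    return bytes([code]).decode(codepage)


unicode_code = ord("Š")               # Unicode code point of the character
codepage_code = code_in_codepage("Š")  # byte value in the assumed code page
round_trip = char_from_codepage(codepage_code)
```

Python's built-in `ord` and `chr` always work on Unicode code points, so arithmetic that expects the code-page value has to go through an explicit encode or decode step like this.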

DissectMalware commented 3 years ago

latin-codes.xlsx codes.txt

https://en.wikipedia.org/wiki/ISO/IEC_8859-1

the undefined ones are mapped in Excel

DissectMalware commented 3 years ago

I temporarily fixed it in https://github.com/DissectMalware/XLMMacroDeobfuscator/pull/63/commits/d12f7df03c2d28e1f12a084bc723936d6d67cc12

DissectMalware commented 3 years ago

Correction: the code page is Windows-1252, not Latin-1.

https://en.m.wikipedia.org/wiki/Windows-1252
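
The practical difference between the two code pages shows up in the byte range 0x80 to 0x9F. The sketch below uses only Python's standard `latin-1` and `cp1252` codecs. It shows what Python does with the bytes that Windows-1252 leaves undefined and makes no claim about what Excel maps them to.

```python
# Byte 138 (0x8A) decodes differently under the two code pages.
as_latin1 = bytes([138]).decode("latin-1")  # a C1 control character, U+008A
as_cp1252 = bytes([138]).decode("cp1252")   # the letter Š, U+0160

# Collect the bytes in 0x80..0x9F that Python's cp1252 codec refuses to decode.
undefined_in_cp1252 = []
for byte in range(0x80, 0xA0):
    try:
        bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        undefined_in_cp1252.append(byte)
```

Because those few bytes raise `UnicodeDecodeError` under `cp1252`, any decoder built on this codec needs an explicit fallback for them.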

DissectMalware commented 3 years ago

@stevengoossensB I thought it would be better to merge your branch into master. However, if you want to work on the code more, please continue to do so. I will check the changes and merge them as soon as I can.

stevengoossensB commented 3 years ago

I think I might have to for some samples because I get outputs like:

SET.NAME(lbqsnkudzlk,=FORMULA("'"&TEXT(INT(FSIZE(RC)=)+,""),RC))
SET.NAME(lbqsnkudzlk,=FORMULA("'"&TEXT(INT(ISNUMBER(SEARCH("x",RC)))+,""),RC))

Which doesn't look completely right to me.

DissectMalware commented 3 years ago

I have a question regarding https://github.com/DissectMalware/XLMMacroDeobfuscator/pull/63/commits/ab3cbc96f635cd9c3ad99e7699fa64977ef64f6e

What is the purpose of this commit? The concatenation operator was already supported before this commit.