Closed stevengoossensB closed 3 years ago
Thank you Steven for the PR. Could you also share with me a few hashes of the samples so I can test it with real instances?
Hi,
Will do. Unfortunately, I found there are still more functions in the samples that aren't implemented (HLOOKUP, MOD, INT, ROUNDUP). I will continue adding these today as well.
Added another set of functions needed for the sample. Still not there, though: the HLOOKUP function is still required. Also, the COUNTA function currently counts the total number of cells in a range, while it should only count the cells that contain a value.
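To make the COUNTA difference concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the `counta` helper and the list-of-values representation of a range are assumptions for illustration only, with `None` standing in for an empty cell.

```python
def counta(cells):
    """Count only the cells that hold a value.

    Assumption for this sketch: an empty cell is represented as None.
    Anything else (numbers, text, booleans) counts as a value.
    """
    return sum(1 for value in cells if value is not None)


# Hypothetical range with five cells, two of them empty.
cell_range = ["a", None, 0, None, "b"]

total_cells = len(cell_range)     # what the buggy behaviour would report
filled_cells = counta(cell_range) # what COUNTA is expected to report
```

Here `total_cells` is the size of the range, while `filled_cells` skips the two empty cells.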
It is definitely doable, as xlrd2 and pyxlsb both support loading worksheets very well. We need to add a function like load_worksheets to the wrappers and then add a get_macrosheet_cell function to XLMInterpreter. The reason not to extend the get_cell function is performance: it is better not to load all the worksheets for every sample, only for the ones whose worksheets we actually need data from. I think I may have time to add this part this Sunday. But if you want to take a stab at this, please go ahead.
By the way, thank you very much for your contribution.
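The load-on-demand idea described above could look roughly like the sketch below. This is only an illustration, not the project's wrapper API: the `LazyWorkbook` class, its `loader` argument, and the `get_worksheet_cell` method are hypothetical names, and the worksheets are modelled as plain dictionaries.

```python
class LazyWorkbook:
    """Hypothetical wrapper that loads worksheets only on first use."""

    def __init__(self, loader):
        # loader: a callable returning {sheet_name: {cell_address: value}}.
        # In a real wrapper this would go through xlrd2 or pyxlsb.
        self._loader = loader
        self._worksheets = None
        self.load_count = 0  # only here to show how often loading happens

    def load_worksheets(self):
        # Pay the loading cost once, and only if somebody asks for data.
        if self._worksheets is None:
            self._worksheets = self._loader()
            self.load_count += 1
        return self._worksheets

    def get_worksheet_cell(self, sheet_name, address):
        # Returns None when the sheet or the cell does not exist.
        return self.load_worksheets().get(sheet_name, {}).get(address)


def fake_loader():
    # Stand-in for the expensive parsing step.
    return {"Sheet1": {"A1": "hello", "B2": 42}}


workbook = LazyWorkbook(fake_loader)
loads_before = workbook.load_count
first = workbook.get_worksheet_cell("Sheet1", "A1")
second = workbook.get_worksheet_cell("Sheet1", "B2")
loads_after = workbook.load_count
```

Samples that never reference a worksheet never trigger the load, and samples that do trigger it only once.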
I'll see whether I can familiarize myself with the libraries used and give it a try before Sunday. I'll add all code to this PR for your review.
Thanks for the update. I was trying something along those lines as well. I fixed a bug in the COUNTA method: in some cases the count is done on the macrosheet itself and the range won't contain the sheet name, so we need to take that into account. Additionally, I've added the HLOOKUP method.
Getting closer: I now see some registry strings and file paths in the execution, but it's still not there yet. I will try to continue on this tomorrow.
I think I found the problem
Š is 138 in ASCII (Latin-1) but 352 in Unicode
So some of the characters are in the Latin-1 code page, but ord returns their Unicode equivalent. This causes a problem when decoding some of the characters.
https://en.wikipedia.org/wiki/ISO/IEC_8859-1
The undefined ones are mapped in Excel.
Correction: the code page is Windows 1252 not Latin-1
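A short sketch of the mismatch, using only Python's built-in codecs. The `char_code` helper is a hypothetical name for illustration and is not taken from the project; the fallback to `ord` for characters outside the code page is also an assumption.

```python
ch = "\u0160"  # the character Š

unicode_code_point = ord(ch)          # Unicode code point
cp1252_byte = ch.encode("cp1252")[0]  # byte value in Windows-1252


def char_code(character, codepage="cp1252"):
    """Return the single-byte code of a character in the given code page.

    Falls back to the Unicode code point when the character cannot be
    encoded in that code page (an assumption made for this sketch).
    """
    try:
        return character.encode(codepage)[0]
    except UnicodeEncodeError:
        return ord(character)


# Going the other way: byte 138 decodes differently per code page.
from_cp1252 = bytes([138]).decode("cp1252")
from_latin1 = bytes([138]).decode("latin-1")
```

With Windows-1252, byte 138 round-trips to Š, whereas Latin-1 maps the same byte to a C1 control character, which is why the distinction between the two code pages matters here.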
@stevengoossensB I thought it would be better to merge your branch into master. However, if you want to work on the code more, please continue to do so. I will check the changes and merge them as soon as I can.
I think I might have to for some samples because I get outputs like:
SET.NAME(lbqsnkudzlk,=FORMULA("'"&TEXT(INT(FSIZE(RC)=)+,""),RC)) SET.NAME(lbqsnkudzlk,=FORMULA("'"&TEXT(INT(ISNUMBER(SEARCH("x",RC)))+,""),RC))
Which doesn't look completely right to me.
I have a question regarding https://github.com/DissectMalware/XLMMacroDeobfuscator/pull/63/commits/ab3cbc96f635cd9c3ad99e7699fa64977ef64f6e
What is the purpose of this commit? The concatenation operator was already supported before this commit.
The comparison operators <= and >= had not yet been added to the lark template. I've added them since I needed them for the analysis of a specific sample.
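As a rough illustration of why two-character operators need explicit handling, here is a plain-Python sketch. It does not use lark and is not the grammar change from the commit; the `compare` helper and the operator table are assumptions for illustration, and only numeric operands are handled.

```python
import operator
import re

# Two-character operators are listed before "<", ">" and "=" so that
# "<=" is not split into "<" followed by "=".
_OP_PATTERN = re.compile(r"<=|>=|<>|<|>|=")

COMPARE_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<>": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}


def compare(expression):
    """Evaluate a simple 'number OP number' comparison string."""
    match = _OP_PATTERN.search(expression)
    if match is None:
        raise ValueError("no comparison operator found")
    left = float(expression[:match.start()])
    right = float(expression[match.end():])
    return COMPARE_OPS[match.group()](left, right)


results = [compare("3<=5"), compare("5>=7"), compare("2<>2"), compare("4=4")]
```

If the single-character alternatives came first in the pattern, `3<=5` would be split at `<` and the right-hand side `=5` would fail to parse as a number.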