bontchev / pcodedmp

A VBA p-code disassembler
GNU General Public License v3.0

Cross comparison with Open Office #13

Closed: sancarn closed this issue 4 years ago

sancarn commented 4 years ago

@bontchev I saw this last night and was amazed by the work you have already done. I noticed you still have some "known issues". Are these issues also known to affect OpenOffice's VBA implementation?

Perhaps OpenOffice has a better implementation, and its source code (https://github.com/apache/openoffice/tree/273865e5126901b006a2c544dc73456b0510afee/main/oox/source/ole) could help determine the cause of these issues and/or could be ported to Python...

Just a thought :)

decalage2 commented 4 years ago

I think the most recent version of the code is in LibreOffice here: https://github.com/LibreOffice/core/tree/master/oox/source/ole

However, I skimmed through the code and only saw support for the VBA source code. Have you seen anything related to P-code in there?

sancarn commented 4 years ago

@decalage2 Admittedly, I've never used OpenOffice, so I don't know the ins and outs of how it works or what it does. That said, I suspect that if it has any VBA support, it can read and write P-Code in the OXML format, which is largely what this project tries to achieve as well.

But yes, I need to actually download it and see what support it has.

sancarn commented 4 years ago

According to https://wiki.openoffice.org/wiki/VBA_interoperability_in_OpenOffice, it appears that OpenOffice can at least import and run VBA:

From OpenOffice 3.0 to Apache OpenOffice 3.4, there are many improvements in VBA interoperability, such as import mechanism, API support and event supports in userform and controls. We will classify VBA interoperability from three aspects:

**Import**

Now we can import MS Excel 2003 and MS Excel 2007(xlsm/xlsb) with no modification, it includes:

- Import VBA code in Modules, Dialog and Class Modules.
- Attach and Enable to Run all the Workbook and Worksheet events.
- Enable the Whole VBA Runtime Environment.
- Support to import VBA Userform Controls.
- Attach and Enable all the Userform Controls events.

Edit:

However, it looks like it doesn't have a compatible compiler:

Unfortunately, OpenOffice can't support export VBA back to MS Excel files, so Export may be the important and interesting area in VBA interoperability.

I still need to download it and test it myself.

decalage2 commented 4 years ago

Well, VBA is stored in several formats in the VBA stream within an Office file: as compressed VBA source code, and as VBA P-code (a kind of pre-parsed bytecode that speeds up loading and execution in MS Office). Most tools, including LibreOffice/OpenOffice (from what I see in the code on GitHub), only process the VBA source code part, not the P-code. What makes pcodedmp special is that it parses the P-code, not the VBA source code.

sancarn commented 4 years ago

Oh? So you can export the VBA source from the VBAProject.bin without parsing the P-Code?

decalage2 commented 4 years ago

In fact, you normally cannot get the VBA source code from the P-code unless you "decompile" it with a tool such as pcode2code (https://github.com/Big5-sec/pcode2code). The normal way to get the VBA source code is simply to decompress it from the VBA module stream, for example using olevba (https://github.com/decalage2/oletools/wiki/olevba) or oledump. More info: http://www.decalage.info/en/vba_tools
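To make "decompress it from the VBA module stream" a bit more concrete, here is a minimal Python sketch of the decompression scheme described in the [MS-OVBA] specification: a 0x01 signature byte, then chunks with a 2-byte header, where compressed chunks are sequences of a flag byte followed by up to eight literal bytes or 16-bit copy tokens. This is not olevba's or oledump's code; the function name `decompress_vba` is hypothetical and validation is deliberately minimal.

```python
import struct


def decompress_vba(container: bytes) -> bytes:
    """Decompress an MS-OVBA CompressedContainer (sketch, minimal checks)."""
    if not container or container[0] != 0x01:
        raise ValueError("missing CompressedContainer signature byte 0x01")
    out = bytearray()
    pos = 1  # skip the signature byte
    while pos < len(container):
        # 2-byte little-endian chunk header
        header = struct.unpack_from("<H", container, pos)[0]
        chunk_size = (header & 0x0FFF) + 3      # total chunk size incl. header
        is_compressed = bool(header & 0x8000)   # CompressedChunkFlag
        chunk_end = min(len(container), pos + chunk_size)
        pos += 2
        chunk_start = len(out)                  # where this chunk's output begins
        if not is_compressed:
            # Raw chunk: 4096 bytes copied verbatim
            out += container[pos:pos + 4096]
            pos += 4096
            continue
        while pos < chunk_end:
            flags = container[pos]
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    # Flag bit 0: literal byte
                    out.append(container[pos])
                    pos += 1
                    continue
                # Flag bit 1: 16-bit copy token packing (offset, length)
                token = struct.unpack_from("<H", container, pos)[0]
                pos += 2
                # The offset/length split depends on how much of this chunk
                # has been decompressed so far (at least 4 offset bits).
                difference = len(out) - chunk_start
                bit_count = 4
                while (1 << bit_count) < difference:
                    bit_count += 1
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                src = len(out) - offset
                if src < chunk_start:
                    raise ValueError("copy token points before chunk start")
                for i in range(length):
                    out.append(out[src + i])  # byte by byte: copies may overlap
    return bytes(out)
```

For example, the hand-built container `01 05 B0 08 61 62 63 03 20` holds the literals "abc" followed by one copy token (offset 3, length 6) and decompresses to `b"abcabcabc"`.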

sancarn commented 4 years ago

I actually thought the VBA Editor decompiled the P-Code in order to display it to the user, and compiled new P-Code from the edited source in order to run it. At least, that is how VB6 worked according to some old posts on VB Forums, so I thought the bin file was merely the p-code. I had no idea you could decompress the code from the VBA module stream... If you write code to the VBA module stream, do you know whether it will execute in VBA? (I've long wanted the ability to, e.g., write TypeScript and transpile it to VBA, but never had a way to "compile" Excel VBA projects...)

Thanks for all the information, @decalage2! I had no idea that olevba existed.


Edit: After reading about VBA stomping, I now understand that I wasn't totally wrong. VBA uses the P-Code if and only if that P-Code is compatible with the running version of VBA; otherwise it recompiles from the compressed source stream. Very interesting! So if I ever wanted to build a VBA compiler/decompiler, I'd mainly need to worry about compiling to P-Code, but ideally I'd do both... Very cool that these projects exist and, more importantly, are open source!


Edit 2: So ultimately:

Thanks for all the detailed information and for helping me understand!

decalage2 commented 4 years ago

You're right and those three tools are complementary. olevba even uses pcodedmp for some features, and vice-versa. I'm closing the issue.

bontchev commented 4 years ago

There are several different issues here. Basically, OpenOffice is crap that is broken and incompatible in various ways. It's a nice toy but cannot be used as a professional replacement for Microsoft Office.

OpenOffice supports several document formats and several macro languages. Let's concentrate on VBA only.

I was unable to create a VBA macro in OpenOffice. The macro that I did create was saved as some form of script in an XML file in the compound document. There was no p-code, and I couldn't make it auto-run (although I could run it manually).

OpenOffice can read Microsoft Office documents with VBA macros. However, it doesn't seem to auto-run them; they have to be executed manually. Since pcodedmp already handles Microsoft Office documents with VBA macros, this is mostly irrelevant.

If I open a Microsoft Office document containing a VBA macro with OpenOffice and save it as an OpenOffice document, the macros are not saved.

Basically, p-code and OpenOffice do not know each other. Therefore, OpenOffice is outside the scope of things that pcodedmp handles.

You are correct that the VBA editor decompiles the p-code and displays that. The p-code is also what is executed. If the VBA macro was created with a version of Office different from the one that is opening it, the compressed VBA source at the end of the module will be re-compiled into p-code and that fresh p-code will be what is displayed by the VBA Editor and what is executed.

pcodedmp does not care about the compressed VBA source. It looks only at the p-code. Tools like olevba can extract the compressed source. pcode2code takes the output of pcodedmp and reconstructs the VBA source from the p-code instructions. It doesn't do as good a job as the VBA Editor, but it can give you a rough impression of what the original VBA code looked like.

sancarn commented 4 years ago

@bontchev Thanks. I didn't realise OpenOffice's implementation was so incomplete.

Unrelated question: if you fill the P-Code part of an Excel file with 0s, and then add fresh VBA source (compressed) to the file, would this code compile and run when opening the Excel file? Just thinking that this'd be a way of "compiling" into xls/xlsm files :)

DidierStevens commented 4 years ago

Yes, Excel (and other Office applications) will create the P-Code if it's missing, but you don't get there by overwriting it with NULL bytes.

Each Module Stream is composed of a PerformanceCache array of bytes (e.g. P-Code) followed by a CompressedSourceCode array of bytes (compressed VBA code). The boundary between the two is defined by MODULEOFFSET. This offset is stored in the Dir stream for each module (PROJECTMODULES record): https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-ovba/c66b58a6-f8ba-4141-9382-0612abce9926

You can create a Module Stream without PerformanceCache byte array but with a CompressedSourceCode byte array: you achieve this by setting MODULEOFFSET to 0. If you open that file with the Office application, open the VBA editor, and then save the file, you will notice that the PerformanceCache byte array (e.g. P-Code) has been added.
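The layout described above can be sketched in a few lines of Python. This assumes you have already read the module stream bytes and the module's MODULEOFFSET value out of the (itself compressed) Dir stream; that parsing is not shown. The helper names `split_module_stream` and `source_only_module_stream` are hypothetical, and writing the new offset back into the Dir stream is left out.

```python
from typing import Tuple


def split_module_stream(stream: bytes, module_offset: int) -> Tuple[bytes, bytes]:
    """Split a module stream into (PerformanceCache, CompressedSourceCode).

    module_offset is the MODULEOFFSET value for this module, taken from
    its record in the Dir stream (not parsed here).
    """
    if not 0 <= module_offset <= len(stream):
        raise ValueError("MODULEOFFSET lies outside the module stream")
    performance_cache = stream[:module_offset]  # p-code and other cached data
    compressed_source = stream[module_offset:]  # compressed VBA source container
    if not compressed_source or compressed_source[0] != 0x01:
        raise ValueError("compressed source should start with signature byte 0x01")
    return performance_cache, compressed_source


def source_only_module_stream(compressed_source: bytes) -> Tuple[bytes, int]:
    """Build a module stream with no PerformanceCache at all.

    The stream is just the compressed source, so the MODULEOFFSET that
    would have to be stored in the Dir stream becomes 0.
    """
    return bytes(compressed_source), 0
```

Splitting such a source-only stream at offset 0 yields an empty PerformanceCache, which corresponds to the "no p-code, Office rebuilds it on save" case described above.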

bontchev commented 4 years ago

It depends. If you wipe the p-code area with zeroes but leave the compressed source intact and open the document with the same version of VBA as the one that has been used to create it (this usually but not always means the same version of Office), VBA will try to execute p-code consisting of zeroes and will crash.

If, however, you open it with a different version of VBA, the p-code will be re-created from the source code and will be executed.

There was a macro virus which used this as an attack against our scanner (F-PROT, which looks mostly at the p-code). It had the p-code area zapped and the version of VBA that created it set to some insane value (like 0xFFFF). This meant that a scanner looking only at the p-code wouldn't find anything, but the macro would still work: it would never be opened with "the same version of VBA" on a real machine, so the p-code would always be re-created from the source code.
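The "version of VBA that created it" lives, as I read the MS-OVBA spec, in the first bytes of the `_VBA_PROJECT` stream: Reserved1 (2 bytes, 0x61CC), Version (2 bytes), Reserved2 (1 byte, 0x00) and Reserved3 (2 bytes), all little-endian. If I read the spec correctly, it also tells writers to store 0xFFFF in Version, which is exactly the "insane value" that forces the recompile path. The sketch below parses that header and applies a simplified model of the rule bontchev describes (cached p-code is used only on a version match). The helper names and the version value in the example are hypothetical, and real Office behaviour has more cases than this.

```python
import struct
from typing import NamedTuple


class VbaProjectHeader(NamedTuple):
    reserved1: int  # expected to be 0x61CC
    version: int    # VBA version that produced the cached p-code
    reserved2: int  # expected to be 0x00
    reserved3: int  # undefined value, ignored


def parse_vba_project_header(data: bytes) -> VbaProjectHeader:
    """Parse the 7-byte header at the start of the _VBA_PROJECT stream."""
    header = VbaProjectHeader(*struct.unpack_from("<HHBH", data, 0))
    if header.reserved1 != 0x61CC:
        raise ValueError("not a _VBA_PROJECT stream (bad Reserved1)")
    return header


def cached_pcode_is_used(data: bytes, running_vba_version: int) -> bool:
    """Simplified model: cached p-code runs only if the versions match;
    otherwise the compressed source is recompiled into fresh p-code."""
    return parse_vba_project_header(data).version == running_vba_version
```

With a header whose Version is 0xFFFF, `cached_pcode_is_used` returns `False` for any realistic host version, which mirrors why the zapped p-code in that virus was never executed.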