decalage2 / olefile

olefile is a Python package to parse, read and write Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
http://www.decalage.info/olefile

read/write custom property #22

Open decalage2 opened 10 years ago

decalage2 commented 10 years ago

Originally reported by: Anonymous


I would like to know how to read and write custom properties, like the ones Word 2007 and above store in docprops/custom.xml.
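For comparison, here is a minimal sketch of the OOXML side of the question, using only the standard library (olefile itself only parses OLE2 files, not ZIP-based .docx packages). It assumes custom properties live in the conventional part `docProps/custom.xml` as `<property name="...">` elements, each wrapping one typed value such as `<vt:lpwstr>`. Strictly, the package relationships in `_rels/.rels` say where that part is. The function name `read_ooxml_custom_properties` is hypothetical, and values are returned as raw text without type conversion.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OOXML custom-properties part (assumption, see lead-in)
NS_CUSTOM = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
# Conventional part name; the package .rels file is authoritative
PART_NAME = "docProps/custom.xml"


def read_ooxml_custom_properties(source) -> dict:
    """Return {name: text} for the custom properties of a .docx/.xlsx/.pptx.

    `source` may be a path, a binary file object, or the raw package bytes.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        if PART_NAME not in zf.namelist():
            return {}  # this package has no custom properties part
        root = ET.fromstring(zf.read(PART_NAME))
    props = {}
    for prop in root.findall(f"{{{NS_CUSTOM}}}property"):
        # each <property> holds one typed child, e.g. <vt:lpwstr> or <vt:bool>
        value_elem = next(iter(prop), None)
        props[prop.get("name")] = None if value_elem is None else value_elem.text
    return props
```

Usage: `read_ooxml_custom_properties("report.docx")` (hypothetical file) returns a plain dict such as `{"Client": "ACME"}`. Writing would mean regenerating that XML part and repacking the ZIP, which this sketch does not attempt.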


decalage2 commented 10 years ago

Original comment by GomathiNayagam Subramanian (Bitbucket: gomesnayagam, GitHub: gomesnayagam):


Thanks for your research, Philippe. It is a non-trivial implementation, and I too have little time to play around with this feature.

decalage2 commented 10 years ago

Original comment by Philippe Lagadec (Bitbucket: decalage, GitHub: decalage2):


I just made some quick tests with MS Word 2007 and 2010: when custom properties are added to a document saved in Word 97-2003 format (.doc), their names and values are stored in the standard property stream "\x05DocumentSummaryInformation". You can see them easily with the tool SSView.
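A minimal sketch, based on my reading of the MS-OLEPS property set layout, of how to tell whether that stream actually holds custom properties: "\x05DocumentSummaryInformation" can contain a second property set with FMTID {D5CDD505-2E9C-101B-9397-08002B2CF9AE} (user-defined properties) next to the standard one, {D5CDD502-2E9C-101B-9397-08002B2CF9AE}. The helper names `list_property_sets` and `has_custom_properties` are hypothetical, and only the stream header is parsed, not the properties themselves.

```python
import struct
import uuid

# FMTIDs as given in MS-OLEPS (assumption, see lead-in)
FMTID_DOC_SUMMARY = uuid.UUID("D5CDD502-2E9C-101B-9397-08002B2CF9AE")
FMTID_USER_DEFINED = uuid.UUID("D5CDD505-2E9C-101B-9397-08002B2CF9AE")


def list_property_sets(data: bytes) -> list:
    """Return [(FMTID, offset), ...] for each property set in a property set stream."""
    byte_order, _version, _system_id = struct.unpack_from("<HHI", data, 0)
    if byte_order != 0xFFFE:
        raise ValueError("not a property set stream (bad byte order mark)")
    # bytes 8..23 hold a CLSID (usually zeros); the property set count follows
    (num_sets,) = struct.unpack_from("<I", data, 24)
    sets = []
    for i in range(num_sets):
        base = 28 + i * 20  # each entry: 16-byte FMTID + 4-byte offset
        fmtid = uuid.UUID(bytes_le=data[base:base + 16])  # GUIDs are stored little-endian
        (offset,) = struct.unpack_from("<I", data, base + 16)
        sets.append((fmtid, offset))
    return sets


def has_custom_properties(data: bytes) -> bool:
    """True if the stream contains a user-defined (custom) property set."""
    return any(fmtid == FMTID_USER_DEFINED for fmtid, _ in list_property_sets(data))
```

With olefile, the raw bytes to pass in can be obtained with `ole.openstream("\x05DocumentSummaryInformation").read()`.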

However, OleFileIO_PL 0.31 does not seem to parse them when calling getproperties(), so for now I think it is not possible to read custom properties.

Implementing it would require quite a bit of work to extend the getproperties code to support the missing property types according to the Microsoft specifications. I will not have time to work on this now, but contributions are welcome if anybody volunteers to do it. :-)
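To give an idea of what supporting more property types involves, here is a hedged sketch of decoding a single TypedPropertyValue as I understand the MS-OLEPS layout: a 2-byte type code, 2 bytes of padding, then a type-specific value. It covers only five types (VT_I4, VT_BOOL, VT_LPSTR, VT_LPWSTR, VT_FILETIME). The function name `decode_typed_value` and the default `cp1252` code page are assumptions; a real parser would take the code page from the property set's code page property (ID 1).

```python
import datetime
import struct

# Variant type codes handled by this sketch
VT_I4, VT_BOOL, VT_LPSTR, VT_LPWSTR, VT_FILETIME = 0x03, 0x0B, 0x1E, 0x1F, 0x40
_EPOCH_1601 = datetime.datetime(1601, 1, 1)  # FILETIME epoch


def decode_typed_value(data: bytes, offset: int = 0, codepage: str = "cp1252"):
    """Decode one TypedPropertyValue starting at `offset` in `data`."""
    (vt,) = struct.unpack_from("<H", data, offset)
    pos = offset + 4  # skip the 2-byte type and 2 bytes of padding
    if vt == VT_I4:
        return struct.unpack_from("<i", data, pos)[0]
    if vt == VT_BOOL:
        # VARIANT_BOOL: 0xFFFF means true, 0 means false
        return struct.unpack_from("<H", data, pos)[0] != 0
    if vt == VT_LPSTR:
        (size,) = struct.unpack_from("<I", data, pos)  # byte count, incl. NUL
        raw = data[pos + 4:pos + 4 + size]
        return raw.split(b"\x00", 1)[0].decode(codepage)
    if vt == VT_LPWSTR:
        (length,) = struct.unpack_from("<I", data, pos)  # UTF-16 char count, incl. NUL
        raw = data[pos + 4:pos + 4 + 2 * length]
        return raw.decode("utf-16-le").split("\x00", 1)[0]
    if vt == VT_FILETIME:
        (ticks,) = struct.unpack_from("<Q", data, pos)  # 100 ns units since 1601
        return _EPOCH_1601 + datetime.timedelta(microseconds=ticks // 10)
    raise NotImplementedError(f"property type 0x{vt:04X} is not handled in this sketch")
```

The dispatch-on-type structure makes adding another type a matter of one more branch, which is roughly the kind of extension the comment above describes.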

decalage2 commented 10 years ago

Original comment by Philippe Lagadec (Bitbucket: decalage, GitHub: decalage2):


If you want to read a custom property from any property stream in an OLE file, use the OleFileIO.getproperties() method, as described in the section "Parse a property stream" of the documentation (for now in the overview page).

For example, the code below gets property number 0x12 from the stream "\x05SummaryInformation", which is usually the name of the application that created the file:

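```python
import olefile

with olefile.OleFileIO("example.doc") as ole:  # hypothetical input file
    props = ole.getproperties("\x05SummaryInformation")
    # get application name (property 0x12), or 'unknown' if not present:
    appname = props.get(0x12, 'unknown')
```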
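Since `getproperties()` returns a dict keyed by numeric property ID, a small lookup table makes the result easier to read. The IDs below are the standard SummaryInformation identifiers (PIDSI_*, with 0x12 being the application name used above), while the readable names and the `name_summary_properties` helper are illustrative choices, not part of olefile's API (olefile's own `get_metadata()` already covers the common fields).

```python
# Standard SummaryInformation property IDs; the names are illustrative
SUMMARY_INFO_NAMES = {
    0x01: "codepage", 0x02: "title", 0x03: "subject", 0x04: "author",
    0x05: "keywords", 0x06: "comments", 0x07: "template",
    0x08: "last_saved_by", 0x09: "revision_number", 0x0A: "total_edit_time",
    0x0B: "last_printed", 0x0C: "create_time", 0x0D: "last_saved_time",
    0x0E: "num_pages", 0x0F: "num_words", 0x10: "num_chars",
    0x11: "thumbnail", 0x12: "creating_application", 0x13: "security",
}


def name_summary_properties(props: dict) -> dict:
    """Re-key a getproperties() result by readable name; unknown IDs become hex strings."""
    return {SUMMARY_INFO_NAMES.get(pid, f"0x{pid:02X}"): value for pid, value in props.items()}
```

Usage: `name_summary_properties(props)` on the dict from the example above; any ID missing from the table is kept under its hex form, so nothing is silently dropped.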

For now (v0.32), OleFileIO_PL cannot write properties to property streams.

However, I am not sure whether Word 2007+ stores custom properties in an OLE property stream when saving a document in the Word 97-2003 format (.doc). Do you have a sample file I could look at?

h4knet commented 4 years ago

Hello, I've noticed this issue is from 2014. Is there any news on the progress of this feature?

decalage2 commented 4 years ago

Actually, the reading part has been implemented in PR #114: see get_userdefined_properties. For now, this is only available in the development version if you install it from GitHub; I haven't published a new release yet.
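Because the method only exists in newer code, a caller may want to check for it at runtime rather than assume a given olefile version. A minimal sketch, assuming `get_userdefined_properties` takes the property stream name as its first argument, like `getproperties()` does; the wrapper name `read_custom_properties` is hypothetical, and the return value is passed through unchanged rather than assuming its exact structure.

```python
def read_custom_properties(ole, stream="\x05DocumentSummaryInformation"):
    """Return user-defined properties via olefile, or None if the stream is absent.

    `ole` is an olefile.OleFileIO object (any object with the same methods works).
    """
    if not ole.exists(stream):
        return None  # the file has no DocumentSummaryInformation stream
    getter = getattr(ole, "get_userdefined_properties", None)
    if getter is None:
        # the installed olefile predates the feature (e.g. an older release)
        raise RuntimeError("olefile too old: get_userdefined_properties is missing")
    return getter(stream)
```

Usage: inside `with olefile.OleFileIO("example.doc") as ole:` (hypothetical file), call `read_custom_properties(ole)`; a `RuntimeError` signals that the installed release does not include PR #114 yet.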

jhhcs commented 2 years ago

Hey @decalage2, would you consider making a release that includes this feature? I depend on olefile in my malware triage toolkit and it would be a huge help if I was able to rely on this to extract document properties.