Sicos1977 / IFilterTextReader

A reader that gets text from different file formats through the IFilter interface

Demo app fails on docx, xlsx, pptx #2

Closed SimonKravis closed 10 years ago

SimonKravis commented 10 years ago

I'm running the IFilterTextView demo app from VS2013 on Windows 8.1, building for Any CPU. It fails to extract text when I use it to open Office Open XML and .msg files, with the following messages:

docx - There is no IFilter installed for the file 'Audit proposal.docx'
xlsx - Exception from HRESULT: 0x8004170C
pptx - Exception from HRESULT: 0x8004170C
msg - There is no IFilter installed for the file 'FW Emailing The Autism of Knowledge Management - Copy.msg'
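For reference, the HRESULT reported for xlsx and pptx can be split into its standard fields. This is a minimal sketch, not part of IFilterTextReader; the helper name `decode_hresult` is made up for illustration. The facility comes out as 4 (FACILITY_ITF, an interface-specific error), and the full value appears to match FILTER_E_UNKNOWNFORMAT from the Windows filter error header, though that mapping is worth checking against the SDK.

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility and code fields."""
    hr &= 0xFFFFFFFF  # normalise negative (signed) values to unsigned
    return {
        "failure": bool(hr >> 31),        # bit 31: 1 means failure
        "facility": (hr >> 16) & 0x7FF,   # bits 16-26: facility
        "code": hr & 0xFFFF,              # bits 0-15: error code
    }


# The value reported by the demo app for xlsx and pptx files.
info = decode_hresult(0x8004170C)
print(info)
```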

I've tried building for x64 and x86, with the same results.

It works OK with .doc, .pdf and .xls files.

SearchFilterView shows the installed IFilters listed below, and Windows Search (which uses IFilters) finds content in the Office Open XML files and in the .msg files, so I think the IFilters are there. Any ideas?

msgfilt.dll | Office Outlook MSG IFilter | Microsoft Message IFilter
nlhtml.dll | HTML filter | HTML filter
nlhtml.dll | HTML filter | HTML filter
odffilt.dll | Open Document Format ODT Filter | Microsoft Filter for Open Document Format
odffilt.dll | Open Document Format ODS Filter | Microsoft Filter for Open Document Format
odffilt.dll | Open Document Format ODP Filter | Microsoft Filter for Open Document Format
OffFilt.dll | Microsoft Office Filter | OFFICE Filter
offfiltx.dll | Zip Filter | Microsoft Office Open XML Format Filter
offfiltx.dll | Office Open XML Format Excel Filter | Microsoft Office Open XML Format Filter
offfiltx.dll | Office Open XML Format PowerPoint Filter | Microsoft Office Open XML Format Filter
offfiltx.dll | Office Open XML Format Excel Filter | Microsoft Office Open XML Format Filter
offfiltx.dll | Office Open XML Format Word Filter | Microsoft Office Open XML Format Filter

SimonKravis commented 10 years ago

The problem for docx files is that no PersistentHandler is defined for .docx under HKLM\Software\Classes\.docx. Running the demo on a different machine (Windows 7, Office 2010), text is extracted from all the problem extensions. It looks like my registry has been corrupted somehow, which may explain some other odd behaviour. I have Office 2007 on the Win 8.1 machine and have had problems with it expecting Office 2010 to be installed.
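For anyone hitting the same thing, here is a rough Python sketch for checking whether an extension has a PersistentHandler registered. It is not how IFilterTextReader does its lookup; the function names are hypothetical, and it only inspects the direct per-extension key under HKLM (a handler can also be reached through the extension's ProgID, which this does not follow). The registry read uses the standard `winreg` module and only runs on Windows.

```python
import sys

# IID of the IFilter interface, used as a subkey name under the handler's CLSID.
IID_IFILTER = "{89BCB740-6119-101A-BCB7-00DD010655AF}"


def persistent_handler_key(extension):
    """Key whose default value holds the PersistentHandler GUID for an extension."""
    ext = extension if extension.startswith(".") else "." + extension
    return "Software\\Classes\\" + ext + "\\PersistentHandler"


def handler_filter_key(handler_guid):
    """Key whose default value holds the CLSID of the IFilter for a handler."""
    return ("Software\\Classes\\CLSID\\" + handler_guid
            + "\\PersistentAddinsRegistered\\" + IID_IFILTER)


def read_default(subkey):
    """Return the default value of HKLM\\<subkey>, or None if it is missing."""
    import winreg  # Windows-only, so imported lazily
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
            value, _type = winreg.QueryValueEx(key, "")
            return value
    except OSError:
        return None


if sys.platform == "win32":
    for ext in (".docx", ".xlsx", ".pptx", ".msg"):
        handler = read_default(persistent_handler_key(ext))
        filt = read_default(handler_filter_key(handler)) if handler else None
        print(ext, "handler:", handler, "filter CLSID:", filt)
```

An extension that prints `handler: None` is in the state described above: the filter DLL may be installed, but nothing links the extension to it.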