andreas-eriksson closed this issue 7 years ago
I saw that you fixed it yourself 👍 ... can you tell me what the problem was?
Greetings, Kees van Spelde
I published a new version (1.5.3) to nuget.
I'm not sure why the error occurs. Some suggestions seem to indicate that the installed filters could be corrupt, but it happens on my test machine as well.
I am hoping that the fix will make the code work for a few more legacy formats.
Thanks :)
Is it possible to send me the old xls file so that I can investigate it some more? If so, please send it to sicos2002@hotmail.com
Also, if you want to do really advanced things with extracting data from files, have a look at Tika (https://tika.apache.org/). There is also a .NET port that is generated with IKVM (https://github.com/KevM/tikaondotnet). It's not that iFilters aren't any good, but Tika supports a wider range of file types. I have to do everything myself for the iFilters, while there is an Apache team with more developers behind Tika. It's just a time management problem :-)
Mail sent.
Thanks for the info, I will definitely investigate Tika.
Also, just to satisfy my own curiosity... what are you using my library for?
It's used to extract text from documents and then make them searchable with Lucene.
One more thing: you can also use the Java version of Tika. It has a web interface that can be called from .NET. It's just a matter of what you prefer. I myself prefer .NET over Java.
Me too.
Tika sure looks interesting, especially since it doesn't seem to have any other dependencies. Would be nice if users didn't have to install Office.
You also don't have to install Office for my iFilter library. There is an iFilter package for it. You can find it over here --> https://www.microsoft.com/en-us/download/details.aspx?id=17062
I also made an MSGReader library to extract information from MSG files. It has no iFilter support since that is kind of difficult to build in .NET, but with some coding you can probably make it work. You can find it over here --> https://github.com/Sicos1977/MSGReader. Other "extracting" libraries can be found over here --> https://github.com/Sicos1977/OfficeExtractor and https://github.com/Sicos1977/VCardReader.
OfficeExtractor extracts embedded OLE objects from Office files... like an Excel attachment inside a Word document.
Hi, I get the following error when I try to read text from an old Excel file (.xls).
at IFilterTextReader.NativeMethods.IPersistStream.Load(IStream pStm)
at IFilterTextReader.FilterLoader.LoadAndInitIFilter(Stream stream, String extension, Boolean disableEmbeddedContent, String fileName, Boolean readIntoMemory) in C:\Git\IFilterTextReader\IFilterTextReader\FilterLoader.cs:line 160
at IFilterTextReader.FilterReader..ctor(String fileName, String extension, Boolean disableEmbeddedContent, Boolean includeProperties, Boolean readIntoMemory, FilterReaderTimeout filterReaderTimeout, Int32 timeout) in C:\Git\IFilterTextReader\IFilterTextReader\FilterReader.cs:line 201
at IFilterTextViewer.MainForm.SelectButton_Click(Object sender, EventArgs e) in C:\Git\IFilterTextReader\IFilterTextViewer\MainForm.cs:line 139
Exception from HRESULT: 0x8004170C
Is there anything I can do to make it work?
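For anyone landing here with the same trace, the HRESULT can be decoded by hand. The sketch below is written in Python only for brevity; `describe_hresult` and `FILTER_HRESULTS` are hypothetical names, not part of IFilterTextReader, and the table is a partial list of the IFilter error constants from the Windows SDK header filterr.h. Under that mapping, 0x8004170C is FILTER_E_UNKNOWNFORMAT, which suggests the filter that was loaded did not recognise the file's format. It does not by itself explain why that happens for a given .xls file.

```python
# Partial table of IFilter HRESULTs, names as defined in filterr.h (Windows SDK).
FILTER_HRESULTS = {
    0x80041700: "FILTER_E_END_OF_CHUNKS",
    0x80041701: "FILTER_E_NO_MORE_TEXT",
    0x80041702: "FILTER_E_NO_MORE_VALUES",
    0x80041703: "FILTER_E_ACCESS",
    0x80041705: "FILTER_E_NO_TEXT",
    0x80041706: "FILTER_E_NO_VALUES",
    0x8004170B: "FILTER_E_PASSWORD",
    0x8004170C: "FILTER_E_UNKNOWNFORMAT",
}


def describe_hresult(hresult: int) -> str:
    """Return the hex form of an HRESULT plus its symbolic name, if known.

    Hypothetical helper for illustration. Accepts either the unsigned value
    (0x8004170C) or the signed 32-bit value that .NET reports in
    Exception.HResult (-2147215604).
    """
    code = hresult & 0xFFFFFFFF  # normalise signed 32-bit values to unsigned
    name = FILTER_HRESULTS.get(code, "unknown (not in this partial table)")
    return f"0x{code:08X} {name}"


def hresult_parts(hresult: int) -> tuple:
    """Split an HRESULT into (failed, facility, code), like the winerror.h macros."""
    code = hresult & 0xFFFFFFFF
    failed = bool(code & 0x80000000)    # severity bit: set means failure
    facility = (code >> 16) & 0x1FFF    # same mask as HRESULT_FACILITY
    return failed, facility, code & 0xFFFF


if __name__ == "__main__":
    print(describe_hresult(0x8004170C))  # 0x8004170C FILTER_E_UNKNOWNFORMAT
    print(hresult_parts(0x8004170C))     # (True, 4, 5900)
```

Facility 4 is FACILITY_ITF, meaning the code is defined by the interface that returned it (here, the filter), so its meaning comes from filterr.h rather than from the general system error table.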