Sicos1977 / IFilterTextReader

A reader that gets text from different file formats through the IFilter interface

Problem with PDF class loading #18

Closed Terrious closed 7 years ago

Terrious commented 7 years ago

I am using your great IFilterTextReader in a small project that runs as a console app on Server 2012. It worked well for about three months, until I needed to make some changes to my code. After I deployed the update, it throws the following error when trying to read a PDF file:

Unhandled Exception: System.Exception: DLL name: 'C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\PDFFilter.dll' Class: {E8978DA6-047F-4E3D-9C78-CDBE46041603}' ---> System.Runtime.InteropServices.COMException: Error HRESULT E_FAIL has been returned from a call to a COM component.
   at IFilterTextReader.NativeMethods.IClassFactory.CreateInstance(Object pUnkOuter, Guid& refiid, Object& ppunk)
   at IFilterTextReader.FilterLoader.LoadFilterFromDll(String dllName, String filterPersistClass) in D:\Projects\IFilterTextReader-master\IFilterTextReader\FilterLoader.cs:line 207
   --- End of inner exception stack trace ---
   at IFilterTextReader.FilterLoader.LoadFilterFromDll(String dllName, String filterPersistClass) in D:\Projects\IFilterTextReader-master\IFilterTextReader\FilterLoader.cs:line 214
   at IFilterTextReader.FilterLoader.LoadAndInitIFilter(Stream stream, String extension, Boolean disableEmbeddedContent, String fileName, Boolean readIntoMemory) in D:\Projects\IFilterTextReader-master\IFilterTextReader\FilterLoader.cs:line 121
   at IFilterTextReader.FilterReader..ctor(String fileName, String extension, Boolean disableEmbeddedContent, Boolean includeProperties, Boolean readIntoMemory, FilterReaderTimeout filterReaderTimeout, Int32 timeout) in D:\Projects\IFilterTextReader-master\IFilterTextReader\FilterReader.cs:line 201

The strangest part is this: to check whether the PDF IFilter works at all, I ran your IFilterTextViewer, and your app works fine, whereas my app throws this exception.

My code calling your DLL is the following:

var filterReader = new FilterReader(documentFile, documentExtension, false, false, false, FilterReaderTimeout.NoTimeout, -1);
string textContent = filterReader.ReadToEnd();

Can you advise me how to resolve it? Thank you in advance.

Sicos1977 commented 7 years ago

Did you use the Job class? You need it to make the Adobe IFilter work. See the demo app in the project for how to use it.

Sicos1977 commented 7 years ago

See the comment that I added to the Job class :-)

/// <summary>
/// Use this class to sandbox Adobe IFilter 11 or higher when you want to use this code on Windows 2012 or higher
/// </summary>

/// <summary>
/// Make a job object to sandbox the IFilter code
/// </summary>
private readonly Job _job = new Job();

// Add the current process to the sandbox
_job.AddProcess(Process.GetCurrentProcess().Handle);
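
For readers wiring this into their own console app, the two snippets in this thread can be combined roughly as follows. This is only a sketch under assumptions, not the library's documented usage: it assumes that Job lives in the IFilterTextReader namespace, that FilterReader derives from TextReader (and is therefore disposable), and that the file name and extension arrive as command-line arguments, which is a hypothetical choice made purely for illustration. The demo app in the project remains the authoritative reference.

```csharp
using System;
using System.Diagnostics;
using IFilterTextReader;

internal static class Program
{
    // Job object used to sandbox the IFilter code.
    // Assumption: Job is in the IFilterTextReader namespace.
    private static readonly Job SandboxJob = new Job();

    private static void Main(string[] args)
    {
        // Hypothetical inputs: the document path and its extension, passed in
        // whatever form the existing calling code already uses.
        var documentFile = args[0];
        var documentExtension = args[1];

        // Add the current process to the sandbox BEFORE any filter is loaded.
        SandboxJob.AddProcess(Process.GetCurrentProcess().Handle);

        // Same constructor arguments as in the original question.
        // Assumption: FilterReader is a TextReader, so it can be disposed.
        using (var filterReader = new FilterReader(
                   documentFile, documentExtension, false, false, false,
                   FilterReaderTimeout.NoTimeout, -1))
        {
            string textContent = filterReader.ReadToEnd();
            Console.WriteLine(textContent);
        }
    }
}
```

The point of the ordering is that the process joins the job object once, at startup, before the first FilterReader is constructed.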

Terrious commented 7 years ago

You are right, I forgot to use it. I will try it right now; I just need to rework my code.

Terrious commented 7 years ago

Kees, it works great! Thank you. Please accept my donation as a compliment.

Sicos1977 commented 7 years ago

You are welcome. Just out of curiosity: what are you using the IFilterTextReader for?

Terrious commented 7 years ago

I am extracting text from documents (they are stored in encrypted form, and I decrypt them each time I need to index them) and using Lucene.Net for search, with reference criteria of various types (mostly IDs of numerous parameters).

Sicos1977 commented 7 years ago

If my project falls short for you, take a look at Tika (https://tika.apache.org/). It's a Java library, but there is also a .NET port that is generated with IKVM. You can find it here: https://github.com/KevM/tikaondotnet

Terrious commented 7 years ago

I will take a look at that for sure. The way my app works is OK for the moment, but who knows, maybe I will need more powerful tools later. Thank you for your work, you made a very useful library.