KevM / tikaondotnet

Use the Java Tika text extraction library on the .NET platform
http://kevm.github.io/tikaondotnet/
Apache License 2.0

Error extracting text from file #137

Closed Jojov11 closed 4 years ago

Jojov11 commented 4 years ago

Hi,

After installing the TikaOnDotNet and TikaOnDotNet.TextExtractor NuGet packages, I tried the example code and got this error:

TikaOnDotNet.TextExtraction.TextExtractionException: Extraction of text from the file 'D:**\file.pdf' failed. ---> TikaOnDotNet.TextExtraction.TextExtractionException: Extraction failed. ---> javax.xml.transform.TransformerFactoryConfigurationError: Provider com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl not found
   at javax.xml.transform.FactoryFinder.newInstance(Class , String , ClassLoader , Boolean , Boolean )
   at javax.xml.transform.FactoryFinder.find(Class , String )
   at javax.xml.transform.TransformerFactory.newInstance()
   at TikaOnDotNet.TextExtraction.Stream.StreamTextExtractor.GetTransformerHandler(Stream outputStream) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\Stream\StreamTextExtractor.cs:line 48
   at TikaOnDotNet.TextExtraction.Stream.StreamTextExtractor.Extract(Func2 streamFactory, Stream outputStream) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\Stream\StreamTextExtractor.cs:line 20
   --- End of inner exception stack trace ---
   at TikaOnDotNet.TextExtraction.Stream.StreamTextExtractor.Extract(Func2 streamFactory, Stream outputStream) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\Stream\StreamTextExtractor.cs:line 42
   at TikaOnDotNet.TextExtraction.TextExtractor.Extract[TExtractionResult](Func2 streamFactory, Func3 extractionResultAssembler) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\TextExtractor.cs:line 85
   at TikaOnDotNet.TextExtraction.TextExtractor.Extract[TExtractionResult](String filePath, Func3 extractionResultAssembler) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\TextExtractor.cs:line 0
   --- End of inner exception stack trace ---
   at TikaOnDotNet.TextExtraction.TextExtractor.Extract[TExtractionResult](String filePath, Func3 extractionResultAssembler) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\TextExtractor.cs:line 31

Would you help me fix this problem?

Jojov11 commented 4 years ago

Okay, I found the solution to this problem. If a solution contains more than one project, the TikaOnDotNet package must be installed in all of the projects.
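
For anyone hitting the same error, here is a rough sketch of a check for which projects in a solution folder are missing the package. It is not part of TikaOnDotNet; the function names are made up for illustration, and it assumes each project declares its NuGet packages either in the .csproj itself (PackageReference) or in a packages.config file next to it.

```python
import re
from pathlib import Path

PACKAGE_ID = "TikaOnDotNet"  # package name taken from the thread


def references_package(project_file, package_id=PACKAGE_ID):
    """Return True if the project lists package_id in its .csproj or packages.config."""
    # Matches <PackageReference Include="..."> as well as <package id="...">.
    pattern = re.compile(
        r'(?:Include|id)\s*=\s*"' + re.escape(package_id) + r'"', re.IGNORECASE
    )
    project_file = Path(project_file)
    candidates = [project_file, project_file.parent / "packages.config"]
    return any(
        c.is_file() and pattern.search(c.read_text(encoding="utf-8", errors="ignore"))
        for c in candidates
    )


def projects_missing_package(solution_dir, package_id=PACKAGE_ID):
    """List every .csproj under solution_dir that does not reference package_id."""
    return sorted(
        p
        for p in Path(solution_dir).rglob("*.csproj")
        if not references_package(p, package_id)
    )
```

Calling projects_missing_package on the solution folder returns the project files that still need the package installed. An empty list means every project already references it.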