KevM / tikaondotnet

Use the Java Tika text extraction library on the .NET platform
http://kevm.github.io/tikaondotnet/
Apache License 2.0

Errors trying to extract text #124

Closed by andystroz 5 years ago

andystroz commented 6 years ago

Using this code

const string url = "http://download.microsoft.com/download/E/7/B/E7B25440-1569-40B5-989E-3951FC178214/Microsoft_Press_eBook_Introducing_HDInsight_PDF.pdf";
var textExtractionResult = new TextExtractor().Extract(new Uri(url));

I am getting the following errors:

TypeInitializationException: The type initializer for 'java.lang.Props' threw an exception.

and

FileNotFoundException: Could not load file or assembly 'System.Configuration.ConfigurationManager, Version=0.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'. The system cannot find the file specified.

I am using .NET Core 2.1 in C#.

KevM commented 6 years ago

Hmm. This is likely because that assembly needs to be an explicit reference in .NET Core. Try adding the NuGet package to your project.

https://www.nuget.org/packages/System.Configuration.ConfigurationManager/
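
A minimal sketch of what the explicit reference suggested above might look like, assuming an SDK-style project file. The version number is an illustrative assumption and does not come from this thread; pick whichever version matches your target framework.

```xml
<!-- Hypothetical excerpt from an SDK-style .csproj. -->
<!-- The Version value is an assumption for illustration only. -->
<ItemGroup>
  <PackageReference Include="System.Configuration.ConfigurationManager" Version="4.5.0" />
</ItemGroup>
```

Running `dotnet add package System.Configuration.ConfigurationManager` from the project directory should add an equivalent entry. Note that a later comment in this thread reports that this reference alone did not resolve the 'java.lang.Props' failure on .NET Core 3.1.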

KevM commented 6 years ago

Do let me know if you get this working on .NET Core. I'd love to see if IKVM works for your application.

waytoobusy commented 4 years ago

Hello Kevin, I really wish this worked on .NET Core!

I have an Azure Functions project on .NET Core 3.1 and hit the same issue: the inner exception below is thrown when calling any of the Extract() methods. And yes, I tried them all :)

The type initializer for 'java.lang.Props' threw an exception.

at java.lang.System.get_props()
at java.lang.System.getProperty(String key)
at org.apache.tika.config.TikaConfig..ctor()
at org.apache.tika.config.TikaConfig.getDefaultConfig()
at org.apache.tika.parser.AutoDetectParser..ctor()
at TikaOnDotNet.TextExtraction.Stream.StreamTextExtractor.Extract(Func`2 streamFactory, Stream outputStream)

I tried adding System.Configuration.ConfigurationManager as you advised, but it made no difference.

waytoobusy commented 4 years ago

It's a great project, and as a Java developer of 20 years with 5 recent years in C#, I can see its benefit. Tika is great and comprehensive.

waytoobusy commented 4 years ago

Anything I can do to help?

Happy to contribute to this wonderful project and keep it alive.

Andy