KevM / tikaondotnet

Use the Java Tika text extraction library on the .NET platform
http://kevm.github.io/tikaondotnet/
Apache License 2.0

exception when extracting text from pdf-file #123

Open FroggieFrog opened 6 years ago

FroggieFrog commented 6 years ago

I tried to use TikaOnDotNet, but it fails even in a very simple test project (see attachment). Is there anything I can do to make it work?

Some info: OS: Windows 10 1803; locale: de-DE

The error message:

TikaOnDotNet.TextExtraction.TextExtractionException: Extraction of text from the file '...tikadotnet.pdf' failed. ---> TikaOnDotNet.TextExtraction.TextExtractionException: Extraction failed. ---> System.TypeInitializationException: The type initializer for "org.apache.tika.metadata.Metadata" threw an exception. ---> System.InvalidCastException: Unable to cast object of type "java.util.PropertyResourceBundle" to type "sun.util.resources.OpenListResourceBundle".
at sun.util.resources.LocaleData.getCurrencyNames(Locale locale)
at sun.util.locale.provider.LocaleResources.getCurrencyName(String key)
at sun.util.locale.provider.CurrencyNameProviderImpl.getString(String , Locale )
at sun.util.locale.provider.CurrencyNameProviderImpl.getSymbol(String currencyCode, Locale locale)
at java.util.Currency.CurrencyNameGetter.getObject(CurrencyNameProvider , Locale , String , Object[] )
at java.util.Currency.CurrencyNameGetter.getObject(LocaleServiceProvider , Locale , String , Object[] )
at sun.util.locale.provider.LocaleServiceProviderPool.getLocalizedObjectImpl(LocalizedObjectGetter , Locale , Boolean , String , Object[] )
at sun.util.locale.provider.LocaleServiceProviderPool.getLocalizedObject(LocalizedObjectGetter getter, Locale locale, String key, Object[] params)
at java.util.Currency.getSymbol(Locale locale)
at java.text.DecimalFormatSymbols.initialize(Locale )
at java.text.DecimalFormatSymbols..ctor(Locale locale)
at sun.util.locale.provider.DecimalFormatSymbolsProviderImpl.getInstance(Locale locale)
at java.text.DecimalFormatSymbols.getInstance(Locale locale)
at sun.util.locale.provider.NumberFormatProviderImpl.getInstance(Locale , Int32 )
at sun.util.locale.provider.NumberFormatProviderImpl.getIntegerInstance(Locale locale)
at java.text.NumberFormat.getInstance(LocaleProviderAdapter , Locale , Int32 )
at java.text.NumberFormat.getInstance(Locale , Int32 )
at java.text.NumberFormat.getIntegerInstance(Locale inLocale)
at java.text.SimpleDateFormat.initialize(Locale )
at java.text.SimpleDateFormat..ctor(String pattern, DateFormatSymbols formatSymbols)
at org.apache.tika.utils.DateUtils.createDateFormat(String , TimeZone )
at org.apache.tika.utils.DateUtils.loadDateFormats()
at org.apache.tika.utils.DateUtils..ctor()
at org.apache.tika.metadata.Metadata..cctor()
--- End of inner exception stack trace ---
at org.apache.tika.metadata.Metadata..ctor()
at TikaOnDotNet.TextExtraction.Stream.StreamTextExtractor.Extract(Func`2 streamFactory, Stream outputStream) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\Stream\StreamTextExtractor.cs:line 19
--- End of inner exception stack trace ---
at TikaOnDotNet.TextExtraction.Stream.StreamTextExtractor.Extract(Func`2 streamFactory, Stream outputStream) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\Stream\StreamTextExtractor.cs:line 42
at TikaOnDotNet.TextExtraction.TextExtractor.Extract[TExtractionResult](Func`2 streamFactory, Func`3 extractionResultAssembler) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\TextExtractor.cs:line 85
at TikaOnDotNet.TextExtraction.TextExtractor.Extract[TExtractionResult](String filePath, Func`3 extractionResultAssembler) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\TextExtractor.cs:line 27
--- End of inner exception stack trace ---
at TikaOnDotNet.TextExtraction.TextExtractor.Extract[TExtractionResult](String filePath, Func`3 extractionResultAssembler) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\TextExtractor.cs:line 31
at TikaOnDotNet.TextExtraction.TextExtractor.Extract(String filePath) in C:\projects\tikaondotnet\src\TikaOnDotnet.TextExtractor\TextExtractor.cs:line 17
at Test.Tika.Class1.Extract(String filePath) in Test.Tika\Test.Tika\Class1.cs:line 16
at WindowsFormsApp1.Form1.button1_Click(Object sender, EventArgs e) in Test.Tika\WindowsFormsApp1\Form1.cs:line 32

Test.Tika.zip

KevM commented 6 years ago

I think this is a duplicate of #118. @chrisoverton91 Did you find a fix?