I started playing with TikaOnDotNet today and created a simple test case for PDF text extraction.
Unfortunately, calling the TextExtractor.Extract() method fails with both overloads I tried (the one taking a byte[] and the one taking a string path).
The exception is:
TextExtractionException: Extraction failed.
TypeInitializationException: The type initializer for 'java.nio.charset.StandardCharsets' threw an exception.
TypeLoadException: Could not load type 'System.Reflection.Emit.MethodToken' from assembly 'mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'.
The code to reproduce is very simple. I only do:
var tikaResult_path = new TextExtractor().Extract(pathToPdf);
//(..)
// .. get file stream and initialize StreamReader instance
var bytes = await streamReader.ReadToEndAsync();
var tikaResult_bytes = new TextExtractor().Extract(bytes);
Both calls fail with the same exceptions.
Installed version of TikaOnDotNet.TextExtraction: 1.17.1 (published Tuesday, April 3, 2018).
I saw this comment in another issue: https://github.com/KevM/tikaondotnet/issues/118#issuecomment-551432052 and checked whether the DLLs mentioned there get copied to the output folder. They do get copied, including IKVM.OpenJDK.Cldrdata.dll.