I don't think that blog post is accurate anymore: Tika has gained a lot of new code since it was written, and that code needs more dependencies. I'm not sure exactly which ones are required. You could watch the assembly loader or profile the app to see which dependencies are actually loaded for your workloads.
OK, thanks for the quick response. I'll see what I can find out with a profiler.
Hey Kevin, I'm using TikaOnDotNet.TextExtractor (v1.16.0) to extract text from a PDF file.
I called TextExtractor.Extract(string path) and saw that only 17 assemblies were loaded (listed below), out of the 28 in the package. I then removed all but these 17, and the extraction results were the same.
I did the same with an Excel (.xlsx) file and a Word (.docx) file, and it worked as usual with these 17.
IKVM.OpenJDK.Beans.dll
IKVM.OpenJDK.Charsets.dll
IKVM.OpenJDK.Core.dll
IKVM.OpenJDK.Localedata.dll
IKVM.OpenJDK.Media.dll
IKVM.OpenJDK.Security.dll
IKVM.OpenJDK.SwingAWT.dll
IKVM.OpenJDK.Text.dll
IKVM.OpenJDK.Util.dll
IKVM.OpenJDK.XML.API.dll
IKVM.OpenJDK.XML.Bind.dll
IKVM.OpenJDK.XML.Parse.dll
IKVM.OpenJDK.XML.Transform.dll
IKVM.OpenJDK.XML.XPath.dll
IKVM.Runtime.dll
TikaOnDotNet.dll
TikaOnDotNet.TextExtraction.dll
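If you want to repeat this comparison against your own deployment, here is a minimal Python sketch. It assumes you can list the DLL files in your bin folder, and it prints the ones that are not among the 17 assemblies reported above. The script name prune_candidates.py and the function removal_candidates are hypothetical helpers, not part of TikaOnDotNet. The keep-list only reflects PDF, .xlsx and .docx extraction, so other document types may need assemblies this list leaves out.

```python
from pathlib import Path
import sys

# Assemblies observed loading during TextExtractor.Extract() on PDF, .xlsx and
# .docx files with TikaOnDotNet.TextExtractor 1.16.0 (the list above).
# Assumption: other document types may load additional assemblies.
OBSERVED_LOADED = [
    "IKVM.OpenJDK.Beans.dll",
    "IKVM.OpenJDK.Charsets.dll",
    "IKVM.OpenJDK.Core.dll",
    "IKVM.OpenJDK.Localedata.dll",
    "IKVM.OpenJDK.Media.dll",
    "IKVM.OpenJDK.Security.dll",
    "IKVM.OpenJDK.SwingAWT.dll",
    "IKVM.OpenJDK.Text.dll",
    "IKVM.OpenJDK.Util.dll",
    "IKVM.OpenJDK.XML.API.dll",
    "IKVM.OpenJDK.XML.Bind.dll",
    "IKVM.OpenJDK.XML.Parse.dll",
    "IKVM.OpenJDK.XML.Transform.dll",
    "IKVM.OpenJDK.XML.XPath.dll",
    "IKVM.Runtime.dll",
    "TikaOnDotNet.dll",
    "TikaOnDotNet.TextExtraction.dll",
]


def removal_candidates(package_dlls, loaded=OBSERVED_LOADED):
    """Hypothetical helper: return package DLLs never seen loading, sorted.

    Names are compared case-insensitively because Windows file names are
    case-insensitive.
    """
    keep = {name.lower() for name in loaded}
    return sorted(name for name in package_dlls if name.lower() not in keep)


if __name__ == "__main__":
    # Usage (hypothetical script name): python prune_candidates.py path\to\bin
    bin_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name in removal_candidates(p.name for p in bin_dir.glob("*.dll")):
        print(name)
```

The script only lists candidates and deletes nothing; re-run your own extraction tests after removing the listed files, as was done above for the PDF, Excel and Word files.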
Thanks for letting us know. This may help other users deploying TikaOnDotNet.
I don't think there is anything we can do in this project to reduce the set of IKVM assemblies that come in from the upstream IKVM package, since NuGet manages that.
Keep it up, Kevin. This is a very cool project.
Newb here, using TikaOnDotNet to parse PDF files.
I took the blog post to imply that I could delete all the IKVM assemblies except the 5 listed.
I did so, but now TextExtractor.Extract() fails. I also noticed that the blog post started with 16 assemblies, whereas I started with 28.
Any suggestions for reducing the deployment size?