KevM / tikaondotnet

Use the Java Tika text extraction library on the .NET platform
http://kevm.github.io/tikaondotnet/
Apache License 2.0

Proof of concept .NET Core version #151

Closed dylanlangston closed 2 years ago

dylanlangston commented 2 years ago

This is a proof-of-concept TikaOnDotnet build that multitargets .NET 4.6.2 and .NET Core 3.1. There are some major changes to the build process in this fork: it uses IKVM-Revived version 8.2.1 and its IkvmReference tag instead of the old build script.

This isn't ready to merge yet. Mainly submitting to get the ball rolling on issue #113.

Thank You :)

KevM commented 2 years ago

Thank you so much for doing this. I'll take a look as soon as I can. I do think this is a good opportunity to get Tika working on .NET Core and update the deployment to be "modern" on GitHub Actions. If anyone wants to step up and collaborate on this, do chime in. I do not have a lot of bandwidth for this project but do want to keep the lights on if possible, now that IKVM-Revived has been released.

KevM commented 2 years ago

I have it building now, but the transitive dependency created by the IKVM compile step is not being added to the .nupkg for the respective targets. Specifically, tika.core.dll and tika.framework.dll are not present. In fact, these missing IKVM-created assemblies seem to be what we want to package, rather than the wrapper projects (.Core and .Framework) which reference them.

It seems we need to do the following for the TikaOnDotnet package (most of which you already have working with your MSBuild-based solution):

  1. Download the tika .jar file.
  2. Build a Framework and a Core version of the Tika assembly via IkvmReference, as I cannot get ikvmc.exe to work standalone. The assemblies should have the file name TikaOnDotNet.dll.
  3. Create a NuGet package Tika.OnDotNet which has the IKVM-created assemblies in the proper lib folders.
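
As a rough illustration of step 2, a minimal project file along these lines might work. This is only a sketch under assumptions, not the configuration used in this PR: the jar path `lib\tika-app-2.4.1.jar` is hypothetical and stands for wherever step 1 puts the downloaded file, and the `IKVM` package version simply mirrors the 8.2.1 mentioned above. It does not address step 3; getting the generated assembly into the right lib folders of the .nupkg is the open problem described in this comment.

```xml
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <!-- Multitarget the two frameworks named in this PR. -->
    <TargetFrameworks>net462;netcoreapp3.1</TargetFrameworks>
  </PropertyGroup>

  <ItemGroup>
    <!-- IKVM build tooling; the version is taken from the PR description. -->
    <PackageReference Include="IKVM" Version="8.2.1" />
  </ItemGroup>

  <ItemGroup>
    <!-- ASSUMPTION: hypothetical local path to the jar downloaded in step 1. -->
    <IkvmReference Include="lib\tika-app-2.4.1.jar">
      <!-- Produces TikaOnDotNet.dll for each target framework. -->
      <AssemblyName>TikaOnDotNet</AssemblyName>
    </IkvmReference>
  </ItemGroup>

</Project>
```

With a layout like this, each target framework would get its own compiled TikaOnDotNet.dll in its build output, which is the per-target assembly the package in step 3 would need to include.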

Separately, the TikaOnDotNet.TextExtractor package needs to be updated to work with version 2.4.1 of Tika. There seem to be breaking API changes.

KevM commented 2 years ago

Closing this to move work to #152 because I wanted to continue this work on its own branch. I did pull in the POC and will be using it as a basis for the work. Thank you!