Memory leak when extracting text from files

We noticed that every time we extract text with TikaOnDotNet, memory leaks: usage stays elevated after the text has been extracted.

The code is as simple as the one in your samples:

new TextExtractor().Extract(filePhysicalPath);

We are already using the latest DLLs:

TikaOnDotNet.dll (version 1.17.1.0)

TikaOnDotNet.TextExtraction.dll (version 1.17)

IIS version we are testing on: 10

Target framework: .NET Framework 4.7.2

Memory leak detection with ANTS Profiler:

The first snapshot is the baseline, taken before any extraction started; the second was taken after the extraction had completed.

The second snapshot confirms that memory usage increased and stayed high even after the extraction had completed.

The screenshots above show that live "LinkedHashMap+Entry" objects from "java.util" are still in memory even after the extraction has completed.

Here is the PDF we used, so you can reproduce the test (200 MB): https://drive.google.com/file/d/1DWdWfkHebS9aLpqiLAbaRwwiSamGw8Ym/view?usp=sharing

EDIT:

If I run the following code before and after the Tika extraction, memory returns to normal levels:

// Force GC to handle memory leak via Tika
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
GC.WaitForPendingFinalizers();
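The workaround above can be wrapped in a small helper so that every extraction is bracketed by the same forced collection. This is a minimal sketch, not part of TikaOnDotNet: `ForcedCollection`, `RunThenCollect` and `ManagedBytes` are hypothetical names. It also adds a second `GC.Collect` after `GC.WaitForPendingFinalizers()`, a common idiom so that objects released by finalizers are reclaimed in the same pass; we have not measured whether that second pass matters for Tika.

```csharp
using System;

// Hypothetical helper (not part of TikaOnDotNet): brackets a unit of work
// with forced, blocking full collections, mirroring the workaround above.
public static class ForcedCollection
{
    // Runs `work` with a forced collection before it and after it
    // (the "after" runs even if `work` throws).
    public static T RunThenCollect<T>(Func<T> work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        Collect();
        try
        {
            return work();
        }
        finally
        {
            Collect();
        }
    }

    public static void Collect()
    {
        // Blocking collection of every generation.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        // Wait for the finalizers queued by that collection to run...
        GC.WaitForPendingFinalizers();
        // ...then collect again so objects released by those finalizers are freed too.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
    }

    // Managed heap size in bytes after a full collection, for rough before/after comparisons.
    public static long ManagedBytes() => GC.GetTotalMemory(forceFullCollection: true);
}
```

Usage, assuming the extraction result exposes the extracted text through its `Text` property: `var text = ForcedCollection.RunThenCollect(() => new TextExtractor().Extract(filePhysicalPath).Text);`. Logging `ForcedCollection.ManagedBytes()` before and after gives a rough, profiler-free version of the two ANTS snapshots. Forcing full collections on every request is expensive, so this is only a stopgap until the retained `java.util` objects are explained.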

KevM / tikaondotnet, issue #138