RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Memory usage comparison between 6.2.1 and 6.0.0 #217

Closed nicolastoira closed 1 month ago

nicolastoira commented 10 months ago

I'm using the rmlmapper to map JSON data to RDF in Turtle format. I don't pass any options other than the mapping file, the output file, and the serialization format.

I recently moved from version 6.0.0 to version 6.2.1 and noticed that the newer version has a much larger memory footprint than the older one. I tried it with a 1.2GB JSON file, and with 6.2.1 the process fails quite quickly with a "Killed" message because it runs out of memory.

As far as you know, are there any new features or code paths that consume more memory in 6.2.1 than in 6.0.0? Do you have any recommendations for a maximum input data size? It seems that in 6.2.1 the ingested data is loaded into memory multiple times. Is there anything I can manually disable for my specific use case of JSON to RDF conversion?

Let me know if you have any recommendations regarding this issue. Thank you.

DylanVanAssche commented 10 months ago

We upgraded several libraries between these versions; that is the most significant change.

A path forward would be to analyze the memory usage with a profiler, using the data you have.
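Before reaching for a full profiler, it may help to check how much heap the JVM is allowed to claim on the machine where the process gets killed. The following is a minimal sketch, not part of the RMLMapper: the class name `HeapReport` and its helper methods are hypothetical, and it only uses the standard `java.lang.management` API to print the heap figures of the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of rmlmapper-java: reports JVM heap figures.
public class HeapReport {

    // Convert a byte count to whole mebibytes (rounded down).
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Format a heap snapshot; MemoryUsage.getMax() is -1 when the maximum is undefined.
    public static String describe(MemoryUsage heap) {
        String max = heap.getMax() < 0 ? "undefined" : toMiB(heap.getMax()) + " MiB";
        return "used=" + toMiB(heap.getUsed()) + " MiB"
                + ", committed=" + toMiB(heap.getCommitted()) + " MiB"
                + ", max=" + max;
    }

    public static void main(String[] args) {
        // Snapshot of the heap of the JVM running this program.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe(memory.getHeapMemoryUsage()));
    }
}
```

Running this with the same Java installation and the same `-Xmx` setting (if any) used for the mapper shows the maximum heap the JVM will try to use. A bare "Killed" message typically comes from the operating system rather than from a Java `OutOfMemoryError`, so if the reported maximum is close to or above the memory actually available, setting a lower `-Xmx` for both versions should make the comparison in a profiler easier to interpret.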

DylanVanAssche commented 1 month ago

Since there was no response in the last 9 months on this issue, I will close it. Please re-open or comment on this issue if it needs to be re-opened.