Informatievlaanderen / VSDS-Linked-Data-Interactions

https://informatievlaanderen.github.io/VSDS-Linked-Data-Interactions/
European Union Public License 1.2

performance degradation LDES Client + Repository materialiser #660

Open KVerduyn opened 6 days ago

KVerduyn commented 6 days ago

Performance of the LDIO Docker container degrades over time, with excessive memory consumption (10 GB+), until the container drops out due to heap space issues. The failure occurs after roughly 5 hours. Two pipelines are set up, one from geomobility and one from telraam. The last log file is attached.

The system runs on a Hetzner server with 16 GB of memory and a shared CPU, on Ubuntu Linux.

LDES Client.zip

_LDES_Client_Telraam_logs-2.txt

rorlic commented 5 days ago

The issue can be reproduced with the attached configuration (gh-issue-660.zip). The problem lies in the repository materializer (Ldio:RepositoryMaterialiser): the issue does not occur when using a no-op output (Ldio:NoopOut).

The left part of the graph shows the memory usage with the no-op output. On the right is the heap usage when having both pipelines output to a graph DB using the repository materializer:

[image: heap usage graph]

The behaviour is most likely due to the repository materializer components using the same graph DB connection. This results in one of the pipelines not being able to send its output to the graph DB. The following queries can be used to check the number of received version objects:

select (Count(?S) as ?telling) FROM <http://geomobility.eu/> where { ?S a <https://implementatie.data.vlaanderen.be/ns/vsds-verkeersmetingen#Verkeerstelling> . }
select (Count(?S) as ?telling) FROM <http://telraam.net/> where { ?S a <https://implementatie.data.vlaanderen.be/ns/vsds-verkeersmetingen#Verkeerstelling> . }
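
To repeat this check over time, the two queries above could be wrapped in a small script. This is only a sketch: the endpoint URL is hypothetical and must be replaced with the SPARQL endpoint of the graph DB repository in use, and the helper names (`build_count_query`, `parse_count`, `count_members`) are made up for illustration. It assumes the endpoint accepts form-encoded POST queries per the SPARQL 1.1 Protocol and returns the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Member type used in both queries above.
MEMBER_TYPE = "https://implementatie.data.vlaanderen.be/ns/vsds-verkeersmetingen#Verkeerstelling"


def build_count_query(graph_uri, member_type=MEMBER_TYPE):
    """Build the count query for one named graph."""
    return (
        f"SELECT (COUNT(?s) AS ?telling) FROM <{graph_uri}> "
        f"WHERE {{ ?s a <{member_type}> . }}"
    )


def parse_count(results):
    """Extract ?telling from a SPARQL JSON results document (0 if no rows)."""
    bindings = results["results"]["bindings"]
    return int(bindings[0]["telling"]["value"]) if bindings else 0


def count_members(endpoint, graph_uri):
    """POST the query to the endpoint and return the member count."""
    data = urllib.parse.urlencode({"query": build_count_query(graph_uri)}).encode()
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_count(json.load(response))


if __name__ == "__main__":
    # Hypothetical endpoint: replace with the actual repository URL.
    endpoint = "http://localhost:7200/repositories/example"
    for graph in ("http://geomobility.eu/", "http://telraam.net/"):
        print(graph, count_members(endpoint, graph))
```

If one of the two counts stops increasing while the other keeps growing, that would be consistent with one pipeline no longer reaching the graph DB.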

We noticed another (smaller) issue while investigating the above: the output counters are incremented before the members are actually received by the graph DB. This results in a mismatch between the above counts and the Prometheus ldio_data_out_total counters. See GitHub issue #661.
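
The ordering problem with the counter can be illustrated with a minimal sketch. This is not LDIO code: `CountingSink` and its `write` callable are hypothetical stand-ins for an output component and its graph DB write. The point is only that the counter is incremented after the write has returned without error, so a failed write leaves the count unchanged.

```python
class CountingSink:
    """Hypothetical output wrapper that counts only confirmed writes."""

    def __init__(self, write):
        # `write` is any callable that raises an exception on failure.
        self._write = write
        self.data_out_total = 0

    def send(self, member):
        # Write first; if this raises, the counter below is never reached.
        self._write(member)
        self.data_out_total += 1
```

With this ordering, a counter such as ldio_data_out_total would stay in step with what the graph DB has actually accepted, at the cost of not counting members that are still in flight.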