Informatievlaanderen / VSDS-Linked-Data-Interactions

https://informatievlaanderen.github.io/VSDS-Linked-Data-Interactions/
European Union Public License 1.2

LDESClient has database persistence strategies but they are not very performant #302

Closed. Tomvbe closed this issue 5 months ago

Tomvbe commented 12 months ago

This issue results from the following performance test: Performance_test.xlsx

When using a database persistence strategy in the LDESClient, such as SQLITE or POSTGRES, performance degrades very quickly.

A quick win here could be adding indices.

Tomvbe commented 5 months ago

The main issue turned out to be the persistence context slowing down, because the client runs as an infinite loop on a single thread. Normally a persistence context is used from different threads and in separate calls, and it gets cleaned up in between. In our case it never was. To fix this, we now clear it manually at strategic moments. This is implemented in https://github.com/Informatievlaanderen/VSDS-Linked-Data-Interactions/pull/548
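For readers unfamiliar with the pattern, here is a minimal sketch of clearing a persistence context periodically inside a long-running single-threaded loop. It is not the code from the pull request: `SketchPersistenceContext`, `MemberLoop` and the `CLEAR_EVERY` batch size are hypothetical names and values chosen for illustration, and the context is a plain in-memory stand-in for a real one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a persistence context (first-level cache).
// It only models "managed entities pile up until the context is cleared".
class SketchPersistenceContext {
    private final Map<String, String> managed = new HashMap<>();
    private int flushed = 0;

    void persist(String id, String body) {
        managed.put(id, body);
    }

    // Writes pending state to the (imaginary) database, then detaches everything.
    void flushAndClear() {
        flushed += managed.size();
        managed.clear();
    }

    int managedCount() {
        return managed.size();
    }

    int flushedCount() {
        return flushed;
    }
}

class MemberLoop {
    // Assumed batch size; the issue does not say how often the real client clears.
    static final int CLEAR_EVERY = 1_000;

    // Processes `total` members on one thread and returns the largest number
    // of entities the context held at any point.
    static int process(SketchPersistenceContext ctx, int total) {
        int peak = 0;
        for (int i = 1; i <= total; i++) {
            ctx.persist("member-" + i, "payload");
            peak = Math.max(peak, ctx.managedCount());
            if (i % CLEAR_EVERY == 0) {
                ctx.flushAndClear(); // the "manual clear at a strategic moment"
            }
        }
        ctx.flushAndClear(); // flush whatever is left after the last full batch
        return peak;
    }
}
```

With this shape the context never holds more than `CLEAR_EVERY` entities, regardless of how long the loop runs. If the persistence context is a JPA `EntityManager`, the corresponding calls would be `flush()` followed by `clear()`.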

The impact when processing 100k members with Postgres is shown below. The graph shows the number of seconds needed to process each batch of 10k members. Red is before, yellow is after.

[Graph: seconds_per_10k]

We also extracted the logic that prevents duplicates into a separate filter in https://github.com/Informatievlaanderen/VSDS-Linked-Data-Interactions/pull/532.
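As a rough illustration of what a standalone duplicate filter can look like, the sketch below lets a member id through only the first time it is seen. The class name `SeenOnceFilter` and the choice to key on a string id are assumptions for the example. The actual filter in the pull request may work differently.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical filter: accepts a member id only on its first occurrence.
class SeenOnceFilter implements Predicate<String> {
    private final Set<String> seen = new HashSet<>();

    @Override
    public boolean test(String memberId) {
        // Set.add returns false when the id was already present.
        return seen.add(memberId);
    }
}
```

Because it implements `Predicate<String>`, such a filter can be placed in front of any processing step, for example with `stream.filter(new SeenOnceFilter())` on a sequential stream. It is not thread-safe as written.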

Now only fragments, and the members of the mutable fragments we are following, are persisted.