Informatievlaanderen / VSDS-Linked-Data-Interactions

https://informatievlaanderen.github.io/VSDS-Linked-Data-Interactions/
European Union Public License 1.2

LDESClient performance slows down when a lot of fragments have been processed #301

Closed Tomvbe closed 5 months ago

Tomvbe commented 12 months ago

This issue is the result of the following performance testing: Performance_test.xlsx

We would expect the performance of the client to be constant. However, the time it takes to process a fragment increases with the number of fragments that have already been processed. This makes it "impossible" to follow a server over a long period of time (assuming larger datasets).

sandervd commented 8 months ago

Should the state of which fragments have been visited be 'outsourced' to the database, and not kept in memory?

Tomvbe commented 7 months ago

We currently keep track of:

Perhaps we should look into a solution where we only care about the open fragments and the processed members of those open fragments?

In this setup we would keep track of the open fragments and 'forget' the fragments that we processed earlier. This will only work if we guarantee every stream to be one-directional.

This would ensure that storage use does not keep increasing with memberCount. There would still be a smaller increase as the structural tree grows. For example, with time-based fragmentation by day, you would have one extra fragment to keep track of per day.
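A minimal sketch of that idea, to make the bookkeeping concrete. The class and method names (`OpenFragmentTracker`, `markProcessed`, `forget`) are hypothetical and do not come from the LDES client code base; the sketch keeps everything in an in-memory map purely for illustration, whereas the real state could just as well live in a database.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: remembers processed member ids only for fragments
// that are still open (mutable). Not the actual LDES client implementation.
class OpenFragmentTracker {
    // open fragment id -> ids of members already processed from that fragment
    private final Map<String, Set<String>> processedMembersByOpenFragment = new HashMap<>();

    // Returns true if the member has not been seen before in this open fragment,
    // i.e. it should be processed now; returns false for a duplicate.
    boolean markProcessed(String fragmentId, String memberId) {
        return processedMembersByOpenFragment
                .computeIfAbsent(fragmentId, id -> new HashSet<>())
                .add(memberId);
    }

    // Call once a fragment has become immutable and is fully processed:
    // all state for it, including its member ids, is dropped.
    void forget(String fragmentId) {
        processedMembersByOpenFragment.remove(fragmentId);
    }

    // Number of fragments that still have state; grows with the structural tree only.
    int trackedFragmentCount() {
        return processedMembersByOpenFragment.size();
    }

    // Total number of member ids still remembered across all open fragments.
    int trackedMemberCount() {
        return processedMembersByOpenFragment.values().stream()
                .mapToInt(Set::size)
                .sum();
    }
}
```

Because `forget` throws away the member ids of a closed fragment, the tracker cannot detect duplicates if the client ever returns to that fragment. That is why the approach relies on the stream being one-directional.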

Types of open fragment:

Tomvbe commented 5 months ago

The main issue was actually the persistence context slowing down, because the client runs as an infinite loop on a single thread. In normal situations, the persistence context is used from different threads and different calls, and it gets cleaned up along the way, but in our case it wasn't. To fix this, we now do manual clears at strategic moments. This is implemented in https://github.com/Informatievlaanderen/VSDS-Linked-Data-Interactions/pull/548
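As a rough illustration of what a manual clear in a long-running loop can look like, here is a sketch that flushes and clears after every N processed members. It is not the code from the PR: the `PersistenceContext` interface is a hypothetical stand-in for the `flush()` and `clear()` operations that JPA's `EntityManager` offers, and both `BatchClearingProcessor` and the fixed `clearEvery` interval are assumptions. The actual "strategic moments" chosen in the PR may differ.

```java
// Hypothetical stand-in for the two EntityManager operations used below,
// so the sketch compiles without a JPA dependency.
interface PersistenceContext {
    void flush(); // write pending changes to the database
    void clear(); // detach all managed entities so the context stops growing
}

// Sketch: clears the persistence context every `clearEvery` processed members.
class BatchClearingProcessor {
    private final PersistenceContext context;
    private final int clearEvery;
    private int processedSinceLastClear = 0;

    BatchClearingProcessor(PersistenceContext context, int clearEvery) {
        if (clearEvery <= 0) {
            throw new IllegalArgumentException("clearEvery must be positive");
        }
        this.context = context;
        this.clearEvery = clearEvery;
    }

    // Call after each member has been processed inside the client's loop.
    void onMemberProcessed() {
        processedSinceLastClear++;
        if (processedSinceLastClear >= clearEvery) {
            context.flush(); // flush first so no pending change is lost
            context.clear();
            processedSinceLastClear = 0;
        }
    }
}
```

Flushing before clearing matters: clearing detaches entities, and changes to them that have not been flushed yet would otherwise be discarded.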

The impact when processing 100k members with a Postgres database is shown below. The graph shows the number of seconds needed to process 10k members. Red is before, yellow is after.

[Graph: seconds_per_10k]

We also extracted the logic that prevents duplicates to a separate filter in https://github.com/Informatievlaanderen/VSDS-Linked-Data-Interactions/pull/532.

Now the only things persisted are the fragments and the members of the mutable fragments that we are following.