fluree / core

Fluree releases and public bug reports

Transaction Processing Against DBs with Indexed Data is Significantly Non-Performant #41

Open aaj3f opened 8 months ago

aaj3f commented 8 months ago

Description

When testing database creation and loading with larger data (i.e. 10 MB+), transaction processing is fine if the data is sent as one large initial transaction. If the data is instead broken up into multiple transactions, such that the first one is committed to disk (and written to index) and subsequent transactions require index lookups against existing db state in order to process, the txn processing takes a very long time (15 minutes or more, if the transaction is even accepted at all).

The thought (coming from this thread between @zonotope and me) is that these subsequent txns need to do an index lookup to evaluate whether incoming IRIs describe entities already known to the db. Those index lookups and evaluations appear to be critically slow.

There has been some discussion about solving this with deterministic IRI -> hash/sid methods, so that the step of actually querying the sids for a particular IRI can be removed.
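
To make the idea concrete, here is a minimal sketch in Python of the two approaches discussed above. It is not Fluree's implementation: `LookupResolver`, `deterministic_sid`, the in-memory dict standing in for the on-disk index, and the choice of SHA-256 truncated to 64 bits are all assumptions made for illustration.

```python
import hashlib


class LookupResolver:
    """Hypothetical lookup-based resolver: every IRI costs one index read.

    The dict stands in for an on-disk index; in a real db each read
    could involve traversing index nodes.
    """

    def __init__(self):
        self.index = {}
        self.next_sid = 1
        self.lookups = 0  # number of simulated index reads

    def resolve(self, iri: str) -> int:
        self.lookups += 1  # one index read per incoming IRI
        sid = self.index.get(iri)
        if sid is None:
            # IRI not yet known to the db: allocate a new sid
            sid = self.next_sid
            self.next_sid += 1
            self.index[iri] = sid
        return sid


def deterministic_sid(iri: str) -> int:
    """Hypothetical deterministic mapping: derive the sid from the IRI alone.

    No index read is needed, because the same IRI always hashes to the
    same value. Collisions are possible and are not handled here.
    """
    digest = hashlib.sha256(iri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # first 8 bytes as an unsigned int


iris = [
    "http://example.org/alice",
    "http://example.org/bob",
    "http://example.org/alice",
]

resolver = LookupResolver()
lookup_sids = [resolver.resolve(iri) for iri in iris]
hashed_sids = [deterministic_sid(iri) for iri in iris]
```

In this sketch the lookup-based resolver performs one simulated index read per IRI in the transaction, while the hash-based version performs none. Whether removing that step is enough to fix the slowdown reported here is a separate question.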

This is a fairly critical ticket, as anyone looking to add a reasonably normal amount of data to a db will effectively hit a wall while trying to add it.

aaj3f commented 6 months ago

Can confirm that this is still an issue with server (Laurent's and my hackathon projects have demonstrated this). Admittedly, we may need/want to wait for @zonotope's deterministic sid work to complete before testing this again (and/or selecting this item for work), but I'll leave it in Backlog (as opposed to Icebox) to keep track of the priority of this item.

bplatz commented 6 months ago

I think the sid work will generally make many things faster (and perhaps a small few things slower) - but I doubt it will address this problem. Something isn't right - and I suspect we have a situation where we are holding large sequences of data in memory unnecessarily, or perhaps even the indexes are not being used.