Open lvca opened 11 months ago
This branch is already in good shape. The records are fully recycled, and I optimized the record management on the page. We have some statements in the documentation that we never recycle RIDs, so we should prepare the docs for this change in 24.1.1.
Still missing one point to make the recycle fully working and the last as optional in this phase to speed up things, but not necessary for the 1st release.
@lvca, as I see you're started considering space economy, a very "most-wanted" missing feature of orientdb was to be able to compact files.. Can this be a case in ArcadeDb anytime soon?
@tolgaulas Correct, ArcadeDB will be able to reuse 100% of the space over time. This first issue will be able to reclaim all the space used and deleted. Indexes already supported compaction since the beginning. What is still missing (after this issue) is:
(Updated the main issue with these new sub-tasks)
@tolgaulas You're right about OrientDB, some structures didn't allow to reuse the whole allocated space, especially with indexes and ridbags.
This PR is in good shape. The missing issue to start thinking about a possible merge (still to test how well works with old legacy databases) is how to manage deleted RIDs in index.
Our LSMTree is highly optimized for ArcadeDB. Any LSMTree has 2 parts (1) the last page which is mutable, and (2) all the other pages are immutable. In ArcadeDB the records that have been deleted have an entry on a more recent page with a negative RID to mark the record as deleted.
Since now record ids are reused, this causes a conflict if a record has been inserted, removed and inserted back.
After further testing the index is fine, it already supports multiple entries with the same RID. The problem is with the test case ConsoleAsyncInsertTest.testOrderByAfterDeleteInsert
which uses the async API. For some reason, the new records inserted after a delete are not updating one index.
Some news. It's nothing related to Asynch, but seems more about how the index manages recycled RIDs when they end up on multiple pages. Still digging into the specific issue. This could be the last issue before doing some porting tests with old databases.
Discussed in https://github.com/ArcadeData/arcadedb/discussions/1203
Phase 1 (scheduled for 24.1.1):
Phase 2 (it will be separated into a new issue):