ARPA-SIMC / arkimet

A set of tools to organize, archive and distribute data files.

Check/repack optimization ideas #251

Closed spanezz closed 1 year ago

spanezz commented 3 years ago

Related to #242/#245, there might be ways to further optimize RAM usage and speed.

The general idea is that we don't need to keep the whole metadata in RAM for checking/repacking.

For checking, we only need to store the (offset, size) span of each metadata. We query the spans from the database in the order we expect them to have in the segment, so they are enough to detect gaps and out-of-order data.
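
A minimal sketch of that check, in Python rather than arkimet's own code. The function name `check_spans`, the issue labels and the final comparison against the segment size are assumptions made for illustration; the only input is the list of (offset, size) pairs in the order the index returns them.

```python
from typing import Iterable, List, Tuple


def check_spans(spans: Iterable[Tuple[int, int]],
                segment_size: int) -> List[Tuple[str, int]]:
    """Scan (offset, size) spans in index order and report anomalies.

    Only one integer of state is kept: the offset where the next span
    is expected to start. No metadata is held in memory.
    """
    issues = []
    expected = 0
    for offset, size in spans:
        if offset > expected:
            # Bytes between `expected` and `offset` are not indexed
            issues.append(("gap", expected))
        elif offset < expected:
            # Span starts before the end of data already seen:
            # either out of order or overlapping
            issues.append(("out_of_order", offset))
        expected = max(expected, offset + size)
    if expected != segment_size:
        # Hypothetical extra check: the indexed data should end
        # exactly at the end of the segment file
        issues.append(("size_mismatch", expected))
    return issues
```

Since `spans` can be any iterable, it could be fed directly from a database cursor, so memory use stays constant regardless of segment size.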

For repacking, we need the original (offset, size) span of each metadata and the newly computed one. Then we can write the new segment simply by copying data over from the old one (could this be optimized with splice?). To update the index, it should be enough to tell the database to update the existing entries, changing their offsets.
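
A rough sketch of the repack idea, again in Python and not taken from arkimet. It assumes each index row has a unique id alongside its span, and that the index lives in an SQLite table; the table name `md`, its columns, and the helpers `repack` and `update_index` are all hypothetical. Rows are keyed by id rather than by old offset, because a new offset may coincide with another row's old offset.

```python
import sqlite3
from typing import BinaryIO, Iterable, List, Tuple


def repack(src: BinaryIO, dst: BinaryIO,
           rows: Iterable[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Copy each (id, offset, size) span from src to dst, back to back.

    Rows must arrive in the desired final order.
    Returns a list of (id, new_offset) pairs for the index update.
    """
    moves = []
    new_offset = 0
    for row_id, old_offset, size in rows:
        src.seek(old_offset)
        remaining = size
        while remaining:
            # Copy in bounded chunks so large data is never fully in memory
            chunk = src.read(min(remaining, 1 << 16))
            if not chunk:
                raise EOFError(f"segment truncated inside span at {old_offset}")
            dst.write(chunk)
            remaining -= len(chunk)
        moves.append((row_id, new_offset))
        new_offset += size
    return moves


def update_index(db: sqlite3.Connection,
                 moves: List[Tuple[int, int]]) -> None:
    """Rewrite offsets in the (hypothetical) `md` table in one transaction."""
    with db:
        db.executemany(
            "UPDATE md SET offset = ? WHERE id = ?",
            [(new_offset, row_id) for row_id, new_offset in moves],
        )
```

The read/write loop is the portable version. On Linux, an in-kernel copy such as `os.copy_file_range` (Python 3.8+) could replace it when both ends are real files, which is in the same spirit as the splice question above.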

For reindexing, since we don't do it while reordering, we can do it in a streaming way, without keeping metadata in RAM at all.

edigiacomo commented 1 year ago

This issue was closed due to inactivity. If the topic is still relevant, please reopen it with a rationale that takes into account the changes made to the project over the years.