ARPA-SIMC / arkimet

A set of tools to organize, archive and distribute data files.

strange warning with arki-check #305

Closed lidiabressan closed 1 year ago

lidiabressan commented 1 year ago

Hi,

I got this message with arki-check, but I'm sure that I didn't delete any data:

cosmo_5M_itr:2023/05-22.grib: possibly deleted data found not tracked by index: 423857604b would be freed by a repack

Could this be caused by duplicated data?

brancomat commented 1 year ago

It might be duplicated data. Duplicated data is not freed until a repack occurs (@spanezz, correct me if I'm wrong).

The documentation is ambiguous, since it states "replace: when yes, importing duplicate data will replace the existing version." (https://arpa-simc.github.io/arkimet/datasets.html) What actually happens is that the most recent duplicate import replaces the existing version in the index, while the actual data (in this case, GRIB messages) is kept until a repack is run.

If that's the case I see two possible improvements:

spanezz commented 1 year ago

@brancomat is correct: that message means that there is data in the segment that is not tracked by the index, which normally happens when data is deleted or replaced (a replace is a delete of the old entry plus an append of the new one).
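As a rough illustration of why a replace leaves untracked bytes behind, here is a toy Python model of an append-only segment with an index. This is only a sketch under simplifying assumptions, not arkimet's actual code: `ToySegment` and its methods are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToySegment:
    """Hypothetical model of an append-only segment plus its index."""

    data: bytearray = field(default_factory=bytearray)
    # key -> (offset, size) of the version the index currently tracks
    index: dict = field(default_factory=dict)

    def append(self, key: str, payload: bytes) -> None:
        # New data always goes at the end of the segment.
        self.index[key] = (len(self.data), len(payload))
        self.data += payload

    def replace(self, key: str, payload: bytes) -> None:
        # A replace is a delete of the old index entry plus an append;
        # the old bytes stay in the segment, no longer tracked.
        self.index.pop(key, None)
        self.append(key, payload)

    def untracked_bytes(self) -> int:
        # Bytes present in the segment but not referenced by the index.
        tracked = sum(size for _, size in self.index.values())
        return len(self.data) - tracked

    def repack(self) -> int:
        # Rewrite the segment keeping only indexed data; return bytes freed.
        freed = self.untracked_bytes()
        new_data = bytearray()
        new_index = {}
        by_offset = sorted(self.index.items(), key=lambda kv: kv[1][0])
        for key, (offset, size) in by_offset:
            new_index[key] = (len(new_data), size)
            new_data += self.data[offset:offset + size]
        self.data, self.index = new_data, new_index
        return freed


seg = ToySegment()
seg.append("t2m", b"A" * 100)
seg.append("u10", b"B" * 50)
seg.replace("t2m", b"C" * 100)   # duplicate import with replace enabled

before = seg.untracked_bytes()   # old "t2m" bytes are still in the segment
freed = seg.repack()             # a repack drops them
after = seg.untracked_bytes()
print(before, freed, after)
```

In this toy model nothing was explicitly deleted, yet the check still finds bytes that the index does not track, which matches the situation reported above.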

I'll now update the message and the documentation as you suggested.