aiidateam / disk-objectstore

An implementation of an efficient "object store" (actually, a key-value store) writing files on disk and not requiring a running server
https://disk-objectstore.readthedocs.io
MIT License

Refactoring the code to enable efficient access to packed compressed objects #142

Closed: giovannipizzi closed this 1 year ago

giovannipizzi commented 1 year ago

For reading a packed file as-is, there is no need to restrict the whence parameter of the seek method to 0 or 1. The main goal of this PR is to enable whence=2, i.e. seeking relative to the end of a file, which some formats/libraries need.
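To illustrate why whence=2 is cheap for an uncompressed packed object, here is a minimal sketch of a read-only stream over a slice of a pack file. It is not the library's actual reader class: PackedSliceReader is a hypothetical name, and the sketch assumes the object is stored uncompressed at a known offset with a known length. Because the length is known up front, seeking relative to the end is just arithmetic.

```python
import io
import os


class PackedSliceReader(io.RawIOBase):
    """Hypothetical read-only view over bytes [offset, offset + length) of a pack file.

    Positions are relative to the start of the object, so seek(0, os.SEEK_END)
    lands at the end of the object, not at the end of the pack file.
    """

    def __init__(self, fhandle, offset, length):
        self._fhandle = fhandle
        self._offset = offset
        self._length = length
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, target, whence=os.SEEK_SET):
        if whence == os.SEEK_SET:
            new_pos = target
        elif whence == os.SEEK_CUR:
            new_pos = self._pos + target
        elif whence == os.SEEK_END:
            # Cheap for uncompressed data: the object length is already known.
            new_pos = self._length + target
        else:
            raise ValueError(f"invalid whence ({whence})")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer):
        # Never read past the end of this object into the next one in the pack.
        remaining = max(0, self._length - self._pos)
        size = min(len(buffer), remaining)
        if size == 0:
            return 0
        self._fhandle.seek(self._offset + self._pos)
        data = self._fhandle.read(size)
        buffer[: len(data)] = data
        self._pos += len(data)
        return len(data)
```

For example, with a pack holding b"HEADERhello worldTRAILER" and the object at offset 6 with length 11, seeking with (-5, os.SEEK_END) and then reading returns b"world".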

Compressed files are trickier, since it is not possible to seek freely to the end (at least not cheaply). Instead, the entire file is decompressed back into a loose file, which is then opened for reading.

If such a file already exists, it is reused, so the same object is never decompressed twice. These "cache" files are deleted during routine maintenance operations (e.g. clean_storage).
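A minimal sketch of the decompress-once cache, assuming the packed object is zlib-compressed and that the caller has already chosen a cache path. get_or_create_cache_file and its parameters are hypothetical, not the disk-objectstore API. Writing to a temporary file and then renaming it with os.replace means a concurrent reader sees either no cache file or a complete one, never a partially written one.

```python
import os
import tempfile
import zlib


def get_or_create_cache_file(compressed_stream, cache_path, chunk_size=64 * 1024):
    """Hypothetical helper: return a path to a decompressed copy, creating it only once."""
    if os.path.exists(cache_path):
        # Already decompressed earlier: reuse it instead of decompressing twice.
        return cache_path
    cache_dir = os.path.dirname(cache_path) or "."
    os.makedirs(cache_dir, exist_ok=True)
    decompressor = zlib.decompressobj()
    # Write into a temporary file in the same directory, then rename it into place.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            while True:
                chunk = compressed_stream.read(chunk_size)
                if not chunk:
                    break
                tmp.write(decompressor.decompress(chunk))
            tmp.write(decompressor.flush())
        os.replace(tmp_path, cache_path)
    except BaseException:
        # Do not leave half-written temporary files behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return cache_path
```

If two readers race, both may decompress, but each rename installs identical content, so the result is still correct; the cost is only some duplicated work.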

In the current PR, this decompression happens only under certain (now well-defined) conditions, i.e. when seeking with the following conditions:

To achieve this robustly, we define a LazyLooseStream class that records which loose file to open but delays actually opening it until later; this makes it possible to write code that always closes any file it opened.
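The lazy-opening idea can be sketched with a small wrapper. This is a hypothetical LazyFile class, not the actual LazyLooseStream from the PR: it stores the path at construction time, opens the file only on the first read, seek or tell, and closes it when leaving a with block, whether or not it was ever opened.

```python
import io


class LazyFile:
    """Hypothetical sketch of a lazily opened, read-only file wrapper."""

    def __init__(self, path):
        self._path = path  # which file to open is decided now...
        self._fhandle = None  # ...but it is only opened on first use

    @property
    def is_open(self):
        return self._fhandle is not None

    def _ensure_open(self):
        if self._fhandle is None:
            self._fhandle = open(self._path, "rb")
        return self._fhandle

    def read(self, size=-1):
        return self._ensure_open().read(size)

    def seek(self, target, whence=io.SEEK_SET):
        return self._ensure_open().seek(target, whence)

    def tell(self):
        return self._ensure_open().tell()

    def close(self):
        if self._fhandle is not None:
            self._fhandle.close()
            self._fhandle = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always runs, so any handle opened inside the with block gets closed.
        self.close()
```

The benefit of deferring the open is that the decision "which file" and the act of opening it are separated, so the code that opens the file can always sit inside a context that guarantees it is closed.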

I also added code to avoid race conditions if clean_storage is running at the same time.

Furthermore, I cleaned up the code a bit and added various tests to increase coverage, which had dropped over time. It is not back to 100%, but it is close (for the core library files).

Finally, I took the opportunity to add a new validate CLI command, which also uses tqdm (if installed) to show progress.
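The "tqdm if installed" pattern can be sketched as an optional import with a pass-through fallback. The validation logic below is a hypothetical stand-in (it assumes objects are keyed by the SHA-256 hex digest of their content) and does not reproduce what the actual validate command checks.

```python
import hashlib

try:
    from tqdm import tqdm
except ImportError:
    tqdm = None  # tqdm is optional: fall back to no progress bar


def progress(iterable, total=None, desc=None):
    """Wrap iterable in a tqdm bar if tqdm is installed, else return it unchanged."""
    if tqdm is None:
        return iterable
    return tqdm(iterable, total=total, desc=desc)


def validate_objects(objects):
    """Hypothetical check: every (hashkey, content) pair must hash to its key."""
    invalid = []
    for hashkey, content in progress(objects, total=len(objects), desc="Validating"):
        if hashlib.sha256(content).hexdigest() != hashkey:
            invalid.append(hashkey)
    return invalid
```

Keeping the fallback in one helper means the rest of the code iterates the same way whether or not tqdm is available.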

This PR fixes #136. It also replaces, and therefore closes, #140, and closes #141.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 98.86% and project coverage change: +0.03%

Comparison is base (f1809d4) 99.52% compared to head (ca1c1cb) 99.55%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           develop     #142      +/-   ##
===========================================
+ Coverage    99.52%   99.55%   +0.03%
===========================================
  Files            8        8
  Lines         1676     1795     +119
===========================================
+ Hits          1668     1787     +119
  Misses           8        8
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| disk_objectstore/container.py | `99.88% <96.00%> (+0.48%)` | ↑ |
| disk_objectstore/utils.py | `98.79% <99.08%> (-0.81%)` | ↓ |
| disk_objectstore/cli.py | `100.00% <100.00%> (+1.44%)` | ↑ |
| disk_objectstore/examples/example_objectstore.py | `100.00% <100.00%> (ø)` | |
