aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

Large disk space usage when archiving entire profile using disk-objectstore with compression #6645

Open zhubonan opened 7 hours ago

zhubonan commented 7 hours ago

Describe the bug

I was trying to export my profile to an archive. My profile's disk-objectstore contains compressed packed files (verdi storage maintain --compress). During the process, I found that my free disk space quickly shrinks because the disk-objectstore loosens all objects.

I think this should not happen, as it increases the disk space needed to create the archive by several fold, especially when the repository contains very compressible files (~20% compression ratio). In my case, the repository contains lots of VASP XML files.

Steps to reproduce

Steps to reproduce the behavior:

  1. Run verdi storage maintain --compress with loose files present.
  2. Run verdi archive create -a to create an archive.
  3. Go to the disk-objectstore container folder to verify that the loose files reappear (the sketch below shows one way to check).
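
For reference, one way to check the counts programmatically (a sketch going through the disk_objectstore.Container API directly; the container path is illustrative and depends on where your profile's repository lives):

```python
from disk_objectstore import Container

# Illustrative path: the container sits inside the profile's repository folder.
container = Container('/path/to/.aiida/repository/container')

# Shows how many objects are loose vs. packed; the loose count jumps up
# after `verdi archive create -a` even though everything was packed before.
print(container.count_objects())
```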

Expected behavior

No extra storage should be used: the archive writer should be able to read the stream sequentially from the storage and write it to the archive.

Your environment

Other relevant software versions, e.g. Postgres & RabbitMQ

Additional context

I have pinned down the cause: compressed objects in a pack can only be read sequentially as a stream, so to support arbitrary seek operations, disk-objectstore opts to loosen the object to disk on demand (https://github.com/aiidateam/disk-objectstore/pull/142).

When writing the archive, the writer uses seek(0, 2) to find the size of the stream, which triggers the loosening of the object to disk.

https://github.com/aiidateam/aiida-core/blob/9baf3ca96caa5577ec8ed6cef69512f430bb5675/src/aiida/tools/archive/implementations/sqlite_zip/writer.py#L170-L178
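
In essence the writer does something like this to obtain the size (an illustrative sketch, not the actual aiida-core code):

```python
import io

def stream_size(handle) -> int:
    """Return a stream's size by seeking to its end, then restoring the
    position. For a compressed packed object, the end-seek can only be
    satisfied by decompressing the whole object, hence the loosening."""
    position = handle.tell()
    size = handle.seek(0, io.SEEK_END)  # equivalent to seek(0, 2)
    handle.seek(position)
    return size
```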

I'm wondering if there is any other way to obtain the size of the object? There appears to be no repository API for doing so, although the information is certainly available in the SQLite database of disk-objectstore (for packed objects) or through the file system (for loose files).
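
For example, at the disk-objectstore level something like this might already work (an untested sketch; I'm assuming Container.get_object_stream_and_meta exposes the uncompressed size in its metadata without touching the stream):

```python
from disk_objectstore import Container

container = Container('/path/to/container')  # illustrative path
hashkey = '...'  # hash key of the object to inspect (placeholder)

# If I read the disk-objectstore code right, the metadata already carries
# the uncompressed size, so no seek on the stream itself is needed.
with container.get_object_stream_and_meta(hashkey) as (stream, meta):
    size = meta['size']
```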

Alternatively, we could always set force_zip64, so there is no need to seek and tell at all. This also speeds up the archiving process, at the cost of a slightly larger archive due to the extra ZIP64 headers.
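
With force_zip64 the member can be streamed without knowing its size up front, along these lines (a sketch using the standard-library zipfile, not the actual writer code):

```python
import shutil
import zipfile

def write_member(archive: zipfile.ZipFile, name: str, stream) -> None:
    """Stream data into a zip member with no prior seek/tell on `stream`.
    force_zip64=True always writes the ZIP64 extra field, costing a few
    bytes per member but removing the need to know the size in advance."""
    with archive.open(zipfile.ZipInfo(name), mode='w', force_zip64=True) as handle:
        shutil.copyfileobj(stream, handle)
```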

unkcpz commented 2 hours ago

Hi @zhubonan, thanks for the report. If I understand correctly, the issue only happens when the file repository is compressed for loose files without packing, because if it is packed, force_zip64 is used directly?

> When writing the archive, the writer uses seek(0, 2) to find the size of the stream, which triggers the loosening of the object to disk.

This I don't understand; I don't think seek will decompress the file. Am I missing something?