fxstein / GoProX

The missing GoPro data and workflow manager for macOS
MIT License

Feature: Delta patch compress optional files #26

Open fxstein opened 2 years ago

fxstein commented 2 years ago

The storage requirements for goprox are very significant when all layers of the library are kept for future re-processing.

There are basically three copies of the data: archive, imported, and processed.

Both archive and imported are optional lineage copies that allow a user to go back to the originals coming off a camera and re-process the entire workflow. With early versions of goprox this is desirable as logic changes and even bugfixes can easily be applied to the original media.

The delta between imported and processed media is metadata only, because we are not resampling or recompressing the media files. Delta patches could therefore replace the imported media files with significantly smaller delta files, while still allowing the original files to be restored later on.

Initial testing has resulted in a 99% reduction in storage required to hold imported delta patch files, compared to the original media files.

Example:

xdelta3 -S djw -s /Users/oratzes/goprox-test-moved/imported/2022/20221011/20221011151401_GoPro_Hero10_2442_G1541642.JPG /Users/oratzes/goprox-test-moved/processed/JPEG/2022/20221011/P_20221011151401_GoPro_Hero10_2442_G1541642.jpg P_20221011151401_GoPro_Hero10_2442_G1541642.xdelta

This produces a 1.4 kB delta file, compared to the original 4.7 MB image. It would allow users to keep the entire imported media path while consuming 99% less storage than a full copy of the data.

goprox needs to be able to generate the delta patch files, replace the original files in imported with them, and restore the originals on demand for future processing.
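
The generate, replace, and restore cycle described above could look roughly like the following sketch. It assumes xdelta3 is installed and on the PATH. The function names `delta_replace` and `delta_restore`, and the choice to store the delta next to the imported file with an added `.xdelta` suffix, are hypothetical and not existing goprox commands. Because the imported file has to be rebuilt from the processed one, the processed file is passed as the source (`-s`) and the imported file is the input being encoded.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not part of goprox.

# delta_replace PROCESSED IMPORTED
# Encodes IMPORTED as a delta against PROCESSED, verifies the round trip,
# and only then deletes IMPORTED, leaving IMPORTED.xdelta in its place.
delta_replace() {
  local processed="$1" imported="$2"
  local delta="${imported}.xdelta" check
  check="$(mktemp)" || return 1

  # -e encode, -d decode, -f overwrite existing output,
  # -S djw secondary compression, -s source the delta is computed against.
  if xdelta3 -e -f -S djw -s "$processed" "$imported" "$delta" &&
     xdelta3 -d -f -s "$processed" "$delta" "$check" &&
     cmp -s "$imported" "$check"; then
    rm -f "$check" "$imported"
  else
    # Keep the original untouched if anything went wrong.
    rm -f "$check" "$delta"
    return 1
  fi
}

# delta_restore PROCESSED DELTA
# Rebuilds the imported file next to DELTA (same name without .xdelta)
# and removes the delta once the file is back.
delta_restore() {
  local processed="$1" delta="$2"
  local imported="${delta%.xdelta}"
  xdelta3 -d -f -s "$processed" "$delta" "$imported" && rm -f "$delta"
}
```

One design consequence is that a delta is only usable together with the exact processed file it was encoded against. The imported files would therefore need to be restored before the processed files are regenerated or modified.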

fxstein commented 2 years ago

This concept could even be applied to the archive path of the library: uncompress the archives, delta patch the individual media files against the final processed P_* files, and write new archives that contain only those deltas.
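
A rough sketch of that repacking step follows, under several assumptions that the issue does not confirm: the archive is a gzip-compressed tar file, xdelta3 is on the PATH, and a tab-separated mapping file lists each archive member together with its processed P_* counterpart. The function name `repack_archive_as_deltas` and the mapping file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch for illustration only; not part of goprox.

# repack_archive_as_deltas ARCHIVE MAP OUT
# ARCHIVE: existing .tar.gz archive (assumed format)
# MAP:     assumed TSV file, one line per member:
#          <member path inside archive><TAB><processed P_* file>
# OUT:     new .tar.gz that holds <member>.xdelta files instead of media
repack_archive_as_deltas() {
  local archive="$1" map="$2" out="$3"
  local work member processed
  work="$(mktemp -d)" || return 1
  mkdir "$work/in" "$work/out" || return 1

  # Unpack the original archive into a scratch directory.
  tar -xzf "$archive" -C "$work/in" || { rm -rf "$work"; return 1; }

  # Encode every mapped member as a delta against its processed file.
  while IFS="$(printf '\t')" read -r member processed; do
    mkdir -p "$work/out/$(dirname "$member")"
    xdelta3 -e -f -S djw -s "$processed" \
      "$work/in/$member" "$work/out/$member.xdelta" ||
      { rm -rf "$work"; return 1; }
  done < "$map"

  # Write the delta-only archive; the original archive is left untouched.
  tar -czf "$out" -C "$work/out" . && rm -rf "$work"
}
```

The sketch deliberately leaves the original archive in place and does not carry over members that are missing from the mapping file. A real implementation would need to verify each delta round trip before removing any original archive.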