Surprisingly, the tarfile standard library module already does some deduplication (ref. #28): if more than one path added to the archive is a hard link to the same file in the file system, the standard library stores the later paths as hard links in the archive:
$ ls -la base
total 8
drwxr-xr-x 1 rolf users 32 May 25 15:33 .
drwxr-xr-x 1 rolf users 30 May 25 15:34 ..
-rw------- 2 rolf users 385 May 25 13:16 rnd1.dat
-rw------- 2 rolf users 385 May 25 13:16 rnd2.dat
$ archive-tool create archive.tar base
$ tar tvf archive.tar
-r--r--r-- rolf/users 683 2019-05-25 15:34 base/.manifest.yaml
drwxr-xr-x rolf/users 0 2019-05-25 15:33 base/
-rw------- rolf/users 385 2019-05-25 13:16 base/rnd1.dat
hrw------- rolf/users 0 2019-05-25 13:16 base/rnd2.dat link to base/rnd1.dat
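The same behaviour can be reproduced with tarfile alone, independent of archive-tool. The following is a minimal sketch, assuming a file system that supports hard links; the helper name `archive_hard_links` and the temporary directory layout are made up for illustration and are not part of archive-tool:

```python
import os
import tarfile
import tempfile


def archive_hard_links():
    """Archive two hard-linked paths and report how tarfile stored them.

    Returns a list of (name, is_hard_link, linkname) tuples.
    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "base")
        os.mkdir(base)
        first = os.path.join(base, "rnd1.dat")
        second = os.path.join(base, "rnd2.dat")
        with open(first, "wb") as f:
            f.write(os.urandom(385))
        # Second path pointing to the same inode as the first.
        os.link(first, second)

        archive = os.path.join(tmp, "archive.tar")
        with tarfile.open(archive, "w") as tar:
            tar.add(first, arcname="base/rnd1.dat")
            tar.add(second, arcname="base/rnd2.dat")

        with tarfile.open(archive, "r") as tar:
            return [(m.name, m.islnk(), m.linkname) for m in tar.getmembers()]


members = archive_hard_links()
for name, is_link, target in members:
    if is_link:
        print(f"{name}: hard link to {target}")
    else:
        print(f"{name}: regular file")
```

The second member carries no file content of its own, only a reference to the first, so any code that reads the archive needs to check `TarInfo.islnk()` before treating a member as a regular file.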
Unfortunately, archive-tool verify chokes on such an archive: