RKrahl / archive-tools

Tools for managing archives
Apache License 2.0

inconsistent result from archive-tool diff with option --skip-dir-content #56

Closed RKrahl closed 3 years ago

RKrahl commented 3 years ago

The command-line tool archive-tool diff has an option --skip-dir-content that, according to the documentation, is supposed to: "in the case of a subdirectory missing from one archive, only report the directory, but skip its content." In some situations, the result is inconsistent. Consider two archives:

$ archive-tool ls archive-a.tar
drwxr-xr-x  rolf/rk    0  2021-05-14 15:47  base
drwxr-x---  rolf/rk    0  2021-05-14 15:47  base/data
drwxr-xr-x  rolf/rk    0  2021-05-14 14:54  base/data/aa
-rw-r--r--  rolf/rk  347  2021-05-14 14:54  base/data/aa/rnda.dat
-rw-------  rolf/rk  385  2021-05-14 14:45  base/data/rnd.dat
-rw-r--r--  rolf/rk  487  2021-04-18 23:11  base/data/rnd2.dat
drwxr-xr-x  rolf/rk    0  2021-05-14 14:45  base/empty
-rw-r--r--  rolf/rk    7  2021-05-14 14:45  base/msg.txt
-rw-------  rolf/rk  385  2021-05-14 14:45  base/rnd.dat
lrwxrwxrwx  rolf/rk    0  2021-05-14 14:45  base/s.dat -> data/rnd.dat

$ archive-tool ls archive-b.tar
drwxr-xr-x  rolf/rk    0  2021-05-14 15:47  base
drwxr-xr-x  rolf/rk    0  2021-05-14 14:54  base/data/aa
-rw-r--r--  rolf/rk  347  2021-05-14 14:54  base/data/aa/rnda.dat
-rw-r--r--  rolf/rk   42  2021-05-14 15:49  base/data/rnd2.dat
drwxr-xr-x  rolf/rk    0  2021-05-14 15:49  base/data/zz
-rw-r--r--  rolf/rk  347  2021-05-14 15:49  base/data/zz/rndz.dat
drwxr-xr-x  rolf/rk    0  2021-05-14 14:45  base/empty
-rw-r--r--  rolf/rk    7  2021-05-14 14:45  base/msg.txt
-rw-------  rolf/rk  385  2021-05-14 14:45  base/rnd.dat
lrwxrwxrwx  rolf/rk    0  2021-05-14 14:45  base/s.dat -> data/rnd.dat

Note that archive-b.tar contains some content below base/data, but not that directory itself.
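
To make that observation concrete, here is a minimal sketch that finds entries whose parent directory is not itself an entry of the archive. It works on a plain list of paths copied from the listing above; the function name `entries_without_parent` is hypothetical and not part of archive-tools.

```python
from pathlib import PurePosixPath

# Paths copied from the `archive-tool ls archive-b.tar` listing above.
ARCHIVE_B = [
    "base",
    "base/data/aa",
    "base/data/aa/rnda.dat",
    "base/data/rnd2.dat",
    "base/data/zz",
    "base/data/zz/rndz.dat",
    "base/empty",
    "base/msg.txt",
    "base/rnd.dat",
    "base/s.dat",
]


def entries_without_parent(paths):
    """Return the entries whose parent directory is not listed itself.

    Hypothetical helper for illustration only, not archive-tools API.
    Top-level entries (parent ".") are never reported.
    """
    present = {PurePosixPath(p) for p in paths}
    top = PurePosixPath(".")
    return sorted(
        str(p) for p in present
        if p.parent != top and p.parent not in present
    )


print(entries_without_parent(ARCHIVE_B))
# → ['base/data/aa', 'base/data/rnd2.dat', 'base/data/zz']
```

For a real tar file, the list of paths could presumably be taken from `tarfile.open(...).getnames()` in the standard library.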

The output of archive-tool diff without the --skip-dir-content option is correct:

$ archive-tool diff archive-a.tar archive-b.tar
Only in archive-a.tar: base/data
Only in archive-a.tar: base/data/rnd.dat
Files archive-a.tar:base/data/rnd2.dat and archive-b.tar:base/data/rnd2.dat differ
Only in archive-b.tar: base/data/zz
Only in archive-b.tar: base/data/zz/rndz.dat

With that option, the output in this case is misleading or even plain wrong:

$ archive-tool diff --skip-dir-content archive-a.tar archive-b.tar
Only in archive-a.tar: base/data
Only in archive-b.tar: base/data/aa
Only in archive-b.tar: base/data/rnd2.dat
Only in archive-b.tar: base/data/zz

Note that archive-tool diff reports base/data/aa to be missing in archive-a.tar, which is false.

RKrahl commented 3 years ago

The core of the issue is that this feature was implemented under the assumption that if an archive does not contain some directory, it will not contain any content below that directory either.
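
One way such an assumption can produce the output shown above is sketched below. This is a hypothetical model, not the actual archive-tools code: it walks both sorted listings in parallel and, when a directory is found on one side only, skips the entries below it on that side only. Each archive is modelled as a dict mapping a path to a `(kind, size)` pair, with the size standing in for the file content; all names here are made up for illustration.

```python
# (kind, size) per path, copied from the two listings in this issue.
A = {
    "base": ("d", 0), "base/data": ("d", 0), "base/data/aa": ("d", 0),
    "base/data/aa/rnda.dat": ("f", 347), "base/data/rnd.dat": ("f", 385),
    "base/data/rnd2.dat": ("f", 487), "base/empty": ("d", 0),
    "base/msg.txt": ("f", 7), "base/rnd.dat": ("f", 385),
    "base/s.dat": ("l", 0),
}
B = {
    "base": ("d", 0), "base/data/aa": ("d", 0),
    "base/data/aa/rnda.dat": ("f", 347), "base/data/rnd2.dat": ("f", 42),
    "base/data/zz": ("d", 0), "base/data/zz/rndz.dat": ("f", 347),
    "base/empty": ("d", 0), "base/msg.txt": ("f", 7),
    "base/rnd.dat": ("f", 385), "base/s.dat": ("l", 0),
}


def naive_skip_diff(a, b, name_a="archive-a.tar", name_b="archive-b.tar"):
    """Merge-style diff that skips directory content on one side only."""
    pa, pb = sorted(a), sorted(b)
    lines = []

    def only_in(name, paths, entries, k):
        # Report paths[k]; if it is a directory, skip what lies below it
        # in the same archive. The other archive is not touched.
        path = paths[k]
        lines.append(f"Only in {name}: {path}")
        k += 1
        if entries[path][0] == "d":
            while k < len(paths) and paths[k].startswith(path + "/"):
                k += 1
        return k

    i = j = 0
    while i < len(pa) and j < len(pb):
        if pa[i] < pb[j]:
            i = only_in(name_a, pa, a, i)
        elif pb[j] < pa[i]:
            j = only_in(name_b, pb, b, j)
        else:
            if a[pa[i]] != b[pb[j]]:
                lines.append(f"Files {name_a}:{pa[i]} and {name_b}:{pb[j]} differ")
            i, j = i + 1, j + 1
    while i < len(pa):
        i = only_in(name_a, pa, a, i)
    while j < len(pb):
        j = only_in(name_b, pb, b, j)
    return lines


print("\n".join(naive_skip_diff(A, B)))
```

Under this model, the entries of archive-b.tar below base/data are still pending after base/data has been skipped in archive-a.tar, so they end up being compared against base/empty and reported as present in archive-b.tar only.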

The problem is that it is not even clear what the correct output should be in this case. What should it mean to skip the directory content: skip reporting any difference below that directory, or only skip reporting content missing in archive-b.tar?

I tend toward the former: skip any reporting below that level, because the --skip-dir-content option only makes sense if the assumption above holds anyway.
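
The former reading could look roughly like the sketch below. It is only an illustration under assumptions, not the fix that went into archive-tools: archives are modelled as dicts mapping a path to a `(kind, size)` pair, the size stands in for the file content, and the name `diff_skip_dir_content` is hypothetical. Once a directory is found in one archive only, nothing below it is reported for either archive.

```python
from pathlib import PurePosixPath

# (kind, size) per path, copied from the two listings in this issue.
A = {
    "base": ("d", 0), "base/data": ("d", 0), "base/data/aa": ("d", 0),
    "base/data/aa/rnda.dat": ("f", 347), "base/data/rnd.dat": ("f", 385),
    "base/data/rnd2.dat": ("f", 487), "base/empty": ("d", 0),
    "base/msg.txt": ("f", 7), "base/rnd.dat": ("f", 385),
    "base/s.dat": ("l", 0),
}
B = {
    "base": ("d", 0), "base/data/aa": ("d", 0),
    "base/data/aa/rnda.dat": ("f", 347), "base/data/rnd2.dat": ("f", 42),
    "base/data/zz": ("d", 0), "base/data/zz/rndz.dat": ("f", 347),
    "base/empty": ("d", 0), "base/msg.txt": ("f", 7),
    "base/rnd.dat": ("f", 385), "base/s.dat": ("l", 0),
}


def diff_skip_dir_content(a, b, name_a="archive-a.tar", name_b="archive-b.tar"):
    """Diff that reports nothing below a directory missing from one side."""
    skipped = set()  # directories present in only one archive
    lines = []
    # Sorting by path components guarantees a directory precedes its content.
    for path in sorted(a.keys() | b.keys(), key=lambda s: PurePosixPath(s).parts):
        p = PurePosixPath(path)
        if any(parent in skipped for parent in p.parents):
            continue  # below a skipped directory: suppress for both archives
        if path in a and path in b:
            if a[path] != b[path]:
                lines.append(f"Files {name_a}:{path} and {name_b}:{path} differ")
            continue
        name, entries = (name_a, a) if path in a else (name_b, b)
        lines.append(f"Only in {name}: {path}")
        if entries[path][0] == "d":
            skipped.add(p)
    return lines


print("\n".join(diff_skip_dir_content(A, B)))
# → Only in archive-a.tar: base/data
```

The trade-off of this reading is that the differing base/data/rnd2.dat and the extra base/data/zz in archive-b.tar go unreported, which seems acceptable given that the option is meant for the case where a missing directory implies missing content.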