AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images

tar2sqfs fails "File exists" #120

Closed puigru closed 3 months ago

puigru commented 11 months ago

I was having some issues with my system install, so I decided to take the opportunity to do a fresh install. Before that, to save time, I backed up my entire system partition to a tar.gz file. However, I've realized it's a pain to browse the tar.gz file and cherry-pick stuff from it, so I want to convert it to SquashFS so that I can simply mount it.

Unfortunately, after a while tar2sqfs encounters an issue, deletes the sqfs file it was making, and exits. It seems to be trying to process the "mnt" folder inside the tar file but fails. It's not clear to me what "File exists" refers to in this situation. Did it find multiple "mnt" entries inside the tar file?

Any suggestions on what to try?

richardweinberger commented 11 months ago

IMHO this is a bug (or rather a missing feature) in tar2sqfs. It looks like your tarball contains multiple entries for the same file. While this is perfectly legal for tar, tar2sqfs seems to get confused by it.
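One way to check whether a tarball really contains repeated entries is to count member names. The following is a minimal sketch using Python's standard `tarfile` module; the helper names `normalize` and `duplicate_entries` are made up for illustration, and the path normalization is an assumption that may not match what tar2sqfs does internally.

```python
import tarfile
from collections import Counter


def normalize(name):
    """Make "./mnt/" and "mnt" compare equal (an assumed normalization)."""
    name = name.rstrip("/")
    while name.startswith("./"):
        name = name[2:]
    return name


def duplicate_entries(tar):
    """Return {path: count} for every path stored more than once in an open TarFile."""
    counts = Counter(normalize(member.name) for member in tar)
    return {path: n for path, n in counts.items() if n > 1}
```

It could be called as `duplicate_entries(tarfile.open("backup.tar.gz"))`, where `backup.tar.gz` is a placeholder path. A non-empty result would confirm that the archive stores some paths more than once.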

AgentD commented 10 months ago

Yes, this is perfectly valid for tar, which was originally designed for mainframe tape drives. Entries can be overwritten by appending to the tape, with later entries overriding previous ones. Small entries could be deleted by overwriting them with a block of zero bytes (two consecutive 512-byte blocks of zero bytes mark the end of the archive).
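The "later entries override previous ones" rule can be applied ahead of time by copying only the last occurrence of each path into a new archive. This is a rough sketch with Python's `tarfile` module, not something tar2sqfs provides; `normalize` and `repack_last_wins` are hypothetical names, the normalization is an assumption, and sparse files and order-sensitive entries such as hard links are not handled.

```python
import tarfile


def normalize(name):
    """Make "./mnt/" and "mnt" compare equal (an assumed normalization)."""
    name = name.rstrip("/")
    while name.startswith("./"):
        name = name[2:]
    return name


def repack_last_wins(src, dst):
    """Copy members from TarFile src to TarFile dst, keeping only the
    last occurrence of each path (tape semantics: later entries win)."""
    members = src.getmembers()
    # Index of the final occurrence of every normalized path.
    last = {normalize(m.name): i for i, m in enumerate(members)}
    for i, member in enumerate(members):
        if last[normalize(member.name)] != i:
            continue  # superseded by a later entry with the same path
        # Only regular files carry data; everything else is header-only.
        data = src.extractfile(member) if member.isreg() else None
        dst.addfile(member, data)
```

Because the source is read twice (once for the index, once for the data), it has to be seekable, and on a compressed archive the second pass may be slow.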

This behavior is only partially implemented in tar2sqfs. As tar2sqfs reads the tarball, file data is re-packed and an in-memory filesystem tree is built, which is serialized once the end of the archive is reached. If an entry already exists in the tree, the addition fails with EEXIST.

The only case implemented is implicit directories: e.g. when "foo/bar" is created but "foo" has not been seen yet, "foo" is created as a directory with default ownership & permissions. It is flagged as implicit, and if "foo" is seen later on, the existing values are overwritten and the flag is cleared (causing an error if it appears a 3rd time). This makes the result independent of the packing order of the archive.
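The tree behavior described above can be modeled in a few lines. This is an illustrative Python sketch only, not the actual tar2sqfs code (which is written in C); the `Node` and `Tree` classes, the `add` method, and the default mode of 0o755 are all assumptions made for the example.

```python
import errno

DEFAULT_DIR_MODE = 0o755  # assumed default; the real value may differ


class Node:
    def __init__(self, mode, is_dir, implicit=False):
        self.mode = mode
        self.is_dir = is_dir
        self.implicit = implicit  # True if created only as a missing parent
        self.children = {}


class Tree:
    def __init__(self):
        self.root = Node(DEFAULT_DIR_MODE, is_dir=True)

    def add(self, path, mode, is_dir):
        parts = [p for p in path.split("/") if p not in ("", ".")]
        if not parts:
            raise ValueError("empty path")
        node = self.root
        # Create missing parent directories and flag them as implicit.
        for part in parts[:-1]:
            child = node.children.get(part)
            if child is None:
                child = Node(DEFAULT_DIR_MODE, is_dir=True, implicit=True)
                node.children[part] = child
            elif not child.is_dir:
                raise NotADirectoryError(errno.ENOTDIR, "Not a directory", path)
            node = child
        existing = node.children.get(parts[-1])
        if existing is None:
            node.children[parts[-1]] = Node(mode, is_dir)
        elif existing.implicit and is_dir:
            # First explicit entry for an implicit directory: overwrite
            # the default values and clear the flag.
            existing.mode = mode
            existing.implicit = False
        else:
            # Any other repeated entry is rejected.
            raise FileExistsError(errno.EEXIST, "File exists", path)
```

In this model, adding "foo/bar" and then "foo" succeeds, while a second explicit "foo" raises `FileExistsError`, which corresponds to the "File exists" message in the report.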

The assumption here was that, nowadays, tarballs are create-once files on disk and wouldn't require the full set of tape semantics. It is good to know that some backup programs still make use of this.

It would theoretically be possible to always overwrite the entries, but a real headache is what to do with already written file data, if it were to be erased (particularly fragment blocks).

richardweinberger commented 10 months ago

> It would theoretically be possible to always overwrite the entries, but a real headache is what to do with already written file data, if it were to be erased (particularly fragment blocks).

Having given this a second thought, adding support for this is not worth the hassle. But improving the error message emitted by tar2sqfs would help users understand why tar2sqfs aborts.