AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images

rdsquashfs feature suggestion: hardlink duplicate files on extract #73

Open · Zaxim opened this issue 3 years ago

Zaxim commented 3 years ago

tl;dr: I have a squashfs file with millions of duplicated files in it. It would be awesome to be able to extract the image and hardlink (or reflink) the duplicated files.

My specific use case is an abuse of the intended functionality of squashfs: I have been using squashfs as a directory archival tool to consolidate dozens of Apple Time Machine backup folders [1]. Time Machine uses directory hardlinks to snapshot the entire filesystem while preserving space, but I have Time Machine backups from different drives and systems that don't share those hardlinks yet contain very similar files. mksquashfs has been the only tool that scales to the number of files and hardlinks I'm dealing with and properly deduplicates as I append directories to my single squashfs file.

I can always mount the squashfs image and browse to the specific files/folders I want to retrieve, but I was thinking it would be cool to be able to extract the image and use the deduplication table to create the duplicated files on disk as hardlinks, or as reflinks on CoW filesystems such as Btrfs. I'm not sure how hard this would be to implement in rdsquashfs (a rough sketch of the hardlink case follows below).

[1] There are pitfalls with using mksquashfs on Apple Time Machine folders. Namely, squashfs does not support all the crazy xattr stuff that macOS applies to files, so some things don't restore completely, but as a file archive, it works fine.
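For the plain hardlink case, the core operation on extraction would be something like the sketch below. This is only a minimal, hypothetical illustration, not rdsquashfs code: the helper name is made up, and it assumes the extractor already knows from the image's deduplication information that `dup_path` has the same data as an already-extracted `first_copy`.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: instead of unpacking the duplicate's data again,
 * create a hardlink pointing at the copy that was already extracted. */
static int extract_as_hardlink(const char *first_copy, const char *dup_path)
{
	if (link(first_copy, dup_path) != 0) {
		perror("link");
		return -1;
	}
	return 0;
}
```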

AgentD commented 3 years ago

Only unpacking duplicated files once and creating copy-on-write reflinks sounds like a very interesting idea.

On Linux this would be done with the FICLONE, FICLONERANGE or FIDEDUPERANGE ioctls. On macOS and the BSDs I have not found an explicit way to do this yet; I think it can be done implicitly through the fcopyfile() function on macOS.
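For reference, a minimal sketch of the Linux reflink path using the FICLONE ioctl. The helper name and error handling are illustrative and not part of the library; the call only succeeds on CoW filesystems such as Btrfs or XFS with reflink support.

```c
#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: make dst share src's data extents via FICLONE.
 * On filesystems without reflink support the ioctl fails with
 * EOPNOTSUPP or EXDEV. */
static int reflink_file(const char *src, const char *dst)
{
	int dfd, ret = -1;
	int sfd = open(src, O_RDONLY);

	if (sfd < 0) {
		perror(src);
		return -1;
	}

	dfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (dfd < 0) {
		perror(dst);
		close(sfd);
		return -1;
	}

	if (ioctl(dfd, FICLONE, sfd) == 0)
		ret = 0;
	else
		perror("FICLONE");

	close(dfd);
	close(sfd);
	return ret;
}
```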