Gather data on Extended Attributes and Alternate Data Streams/File Forks

Most operating systems (Linux, macOS, and even Windows) support attaching key-value pairs to files and folders via extended attributes. The data in these is not included in the file hash, and depending on the OS and file system a single value can hold anywhere from something like 4KB up to the max size of a regular file, so in theory basically an entire file's worth of content. We should try to capture identifying information about the extended attributes that are present, and in some cases capture the values themselves to help identify a file (web browsers often add the URL a file was downloaded from as an extended attribute).
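As a rough sketch of what capturing this could look like on Linux, the snippet below records the name, size, and SHA-256 of each extended attribute on a file. The function name `summarize_xattrs` is made up for illustration and is not part of Surfactant. It assumes Python's `os.listxattr`/`os.getxattr`, which are only available on Linux, so on other platforms it returns an empty result rather than failing.

```python
import hashlib
import os


def summarize_xattrs(path):
    """Return {attr_name: {"size": int, "sha256": str}} for a file.

    Hypothetical helper: relies on os.listxattr/os.getxattr, which Python
    only provides on Linux. Elsewhere an empty dict is returned.
    """
    if not hasattr(os, "listxattr"):
        return {}
    try:
        names = os.listxattr(path, follow_symlinks=False)
    except OSError:
        # File system without xattr support, permission problems, etc.
        return {}
    summary = {}
    for name in names:
        try:
            value = os.getxattr(path, name, follow_symlinks=False)
        except OSError:
            # Attribute vanished or is not readable; skip it.
            continue
        summary[name] = {
            "size": len(value),
            "sha256": hashlib.sha256(value).hexdigest(),
        }
    return summary
```

Attributes such as `user.xdg.origin.url` (a freedesktop.org convention some download tools follow) would be candidates for storing the value itself rather than just a hash.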

Some operating systems also support alternate data streams (Windows) or file/resource forks (macOS to some extent, certain file systems like ZFS on BSD/Linux, and potentially Solaris). These can be entirely separate "hidden" files attached to a file, often with no limit on the size of the data, and the file hash we capture doesn't include any of it. We should check files for the presence of these alternate data streams and capture hashes of them. The contents may also be interesting (e.g. Windows web browsers storing the URL a downloaded file came from).
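For the Windows download-URL case, a minimal sketch follows. It assumes the stream of interest is the `Zone.Identifier` stream that browsers typically write, and that NTFS streams can be opened with the `path:stream` naming syntax. The helper names `read_stream` and `parse_zone_identifier` are hypothetical. This does not enumerate arbitrary streams, which would need the Win32 stream enumeration APIs and is not shown here.

```python
import hashlib


def read_stream(path, stream="Zone.Identifier"):
    """Return the raw bytes of a named stream, or None if it can't be read.

    Assumes NTFS "path:stream" syntax; on other platforms or file systems
    the open() call simply fails and None is returned.
    """
    try:
        with open(f"{path}:{stream}", "rb") as handle:
            return handle.read()
    except OSError:
        return None


def parse_zone_identifier(text):
    """Parse the INI-like Zone.Identifier content into a dict of fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "[ZoneTransfer]" section header.
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def describe_zone_identifier(path):
    """Return hash plus parsed fields for a file's Zone.Identifier stream."""
    data = read_stream(path)
    if data is None:
        return None
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "fields": parse_zone_identifier(data.decode("utf-8", errors="replace")),
    }
```

Fields like `HostUrl` and `ReferrerUrl`, when present, are the pieces that would help identify where a file came from.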

The trickiest bit is that this information often is not preserved when moving between file systems/OSes. Some detection logic to see if e.g. a tar file stores extended attributes would likely be useful, so we can warn a user creating an SBOM that they might be missing out on capturing some information. Testing different archive/fs formats (tar, squashfs, 7z, zip, etc.) to see which can preserve these things in the archive, and on which OSes the information can be extracted, would be a good idea.
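For tar specifically, one possible detection approach is sketched below. It assumes the archive is in pax format and that extended attributes were stored as `SCHILY.xattr.*` or `LIBARCHIVE.xattr.*` pax header records, which is how GNU tar and libarchive-based tools commonly record them. Archives written in other tar formats, or without xattr options enabled, will simply show nothing. The function name `tar_xattr_report` is hypothetical.

```python
import tarfile

# Pax header key prefixes commonly used to carry extended attributes.
# Assumption: other tools may use different vendor prefixes.
XATTR_PREFIXES = ("SCHILY.xattr.", "LIBARCHIVE.xattr.")


def tar_xattr_report(tar):
    """Map member name -> sorted xattr names found in its pax headers.

    `tar` is an open tarfile.TarFile. Members without xattr records
    are left out of the result.
    """
    report = {}
    for member in tar:
        names = sorted(
            {
                key.split(".xattr.", 1)[1]
                for key in member.pax_headers
                if key.startswith(XATTR_PREFIXES)
            }
        )
        if names:
            report[member.name] = names
    return report
```

An empty report does not prove the original files had no extended attributes, only that none were preserved in the archive, which is the case worth warning the user about.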

On the subject of "hidden" files, at Black Hat Asia 2024, there was this interesting talk: https://www.blackhat.com/asia-24/briefings/schedule/#magicdot-a-hackers-magic-show-of-disappearing-dots-and-spaces-36561

Essentially, Windows converts paths such as C:\Windows\etc to an NtPath of the form \??\C:\Windows\etc for file operations. However, to maintain backwards compatibility it has some interesting behavior: it removes trailing dots from file names and removes trailing spaces from the end of the last path element. This leads to weird things like deleting "a.txt." actually deleting a file named "a.txt", and by using short names (another backwards compatibility feature) it is possible to target files with a completely different name. Hidden files can also be created in zip files using this same file naming trick on Windows. Coupled with new-ish support for symlinks, things can get interesting.
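A simple check for names affected by this normalization could look like the sketch below. It only implements the two rules described above (a trailing dot on a path element, a trailing space on the last path element) and does not cover short names or symlinks. The function name `suspicious_components` is hypothetical, and the same check could be run over entry names from `zipfile.ZipFile.namelist()`.

```python
import re


def suspicious_components(name):
    """Return path elements that Windows path normalization would alter.

    Flags elements ending in "." and a final element ending in a space.
    The special elements "." and ".." are ignored.
    """
    parts = [p for p in re.split(r"[\\/]+", name) if p not in ("", ".", "..")]
    flagged = []
    for index, part in enumerate(parts):
        is_last = index == len(parts) - 1
        if part.endswith(".") or (is_last and part.endswith(" ")):
            flagged.append(part)
    return flagged
```

For example, `suspicious_components("dir/a.txt.")` flags `"a.txt."`, while an ordinary path such as `C:\Windows\etc` flags nothing.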

LLNL / Surfactant
