BurntSushi / walkdir

Rust library for walking directories recursively.
The Unlicense

Follows macOS/APFS "firmlinks" even with `.follow_links(false)` #169

Open hippietrail opened 1 year ago

hippietrail commented 1 year ago

macOS with APFS has a feature called "firmlinks", which are sometimes described as being between hardlinks and symlinks. They're used to make two system partitions appear like the old single-partition scheme. Certain directories that live in /System/Volumes/Data/xyz are firmlinked to /xyz.

Swift's Foundation library (FileManager) is aware of these and its directory-walking functionality does not follow them. Rust's walkdir is not aware of them and does follow them. (Note that there are no command-line switches for macOS's ls that reveal them.)

I wrote similar code for both Swift and Rust. It's probably not the best, as I'm just learning both languages. The first argument is the path where the walk begins; the second is a substring to match in the name of a directory to cause it to be printed out.

Rust: cargo run / LLVM

/Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
/System/Volumes/Data/Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
/System/Volumes/Data/Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
/System/Volumes/Data/Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
/System/Volumes/Data/Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
/Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
/Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
/Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
done

Swift: dirwalker / LLVM

Library/Frameworks/Xamarin.iOS.framework/Versions/15.10.0.5/LLVM
Users/hippietrail/.vscode-insiders/extensions/ms-vscode.cpptools-1.5.1/LLVM
Users/hippietrail/.vscode/extensions/ms-vscode.cpptools-1.12.4-darwin-arm64/LLVM
Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns/DTLLVMBinaryAnalysisPlugin.xrplugin
done

Rust code:

use std::env;
use walkdir::WalkDir;

fn main() {
    let args: Vec<String> = env::args().collect();

    // Check the argument count before indexing, so a missing argument
    // prints the usage message instead of panicking.
    if args.len() != 3 {
        eprintln!("** usage: first arg is start directory, second is substring to look for in directory names");
        std::process::exit(1);
    }

    let path: &str = args[1].as_str();
    let text: &str = args[2].as_str();

    for entry in WalkDir::new(path)
        .follow_links(false)
        .into_iter()
        .filter_map(|e| e.ok())
        .filter(|e| e.file_type().is_dir())
        // to_string_lossy avoids a panic on names that are not valid UTF-8.
        .filter(|e| e.file_name().to_string_lossy().contains(text))
    {
        println!("{}", entry.path().display());
    }
    println!("done");
}

Swift code:

import Foundation
import AppKit

let fileManager = FileManager.default

let resKeys : [URLResourceKey] = [.isDirectoryKey, .fileSizeKey, .isSymbolicLinkKey]

let startURL: URL = URL(string: fileManager.currentDirectoryPath)!

guard CommandLine.arguments.count == 3 else {
    print("** usage: dirwalker path string")
    exit(1);
}

let pathArg = CommandLine.arguments[1]
let matchArg = CommandLine.arguments[2]

if let path = pathArg.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed) {
    if let url = URL(string: path) {
        let en = fileManager.enumerator(at: url,
                                        includingPropertiesForKeys: resKeys,
                                        options: [.producesRelativePathURLs],
                                        errorHandler: { (url, error) -> Bool in
            return true }
        )!

        mainloop: for case let fileURL as URL in en {
            do {
                let rv = try fileURL.resourceValues(forKeys: Set(resKeys))
                if let d = rv.isDirectory, d {

                    let filename: String = fileURL.lastPathComponent;

                    if filename.contains(matchArg) {
                        print(fileURL.relativePath)
                    }
                }
            } catch {
                print("** error 2:", error)
            }
        }
    }
}

print("done")

BurntSushi commented 1 year ago

I'm not sure what the expectation is here. If macOS doesn't report them as symlinks, then that's what macOS has decided: they shouldn't be regarded as symlinks.

hippietrail commented 1 year ago

Well, the programmer normally has an expectation that walking a directory only visits each directory once. Perhaps my chosen wording naturally lends itself to a literalist interpretation, but even then the option itself talks about "links", not "symlinks".
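
A caller who needs the "visit each directory once" guarantee can enforce it without any link detection. What follows is only a minimal std-only sketch, not walkdir's behavior: `walk_dirs_once` is a hypothetical helper that remembers each directory's (device, inode) pair and skips repeats. It assumes that a firmlinked path and its /System/Volumes/Data counterpart report the same pair, which has not been verified in this thread.

```rust
use std::collections::HashSet;
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns every directory under `root` (including
/// `root` itself), visiting each (device, inode) pair at most once.
/// Symlinks are never followed. Unreadable entries are silently skipped.
fn walk_dirs_once(root: &Path) -> Vec<PathBuf> {
    let mut seen: HashSet<(u64, u64)> = HashSet::new();
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()];

    while let Some(path) = stack.pop() {
        // symlink_metadata describes the entry itself, not a link target.
        let meta = match fs::symlink_metadata(&path) {
            Ok(m) => m,
            Err(_) => continue,
        };
        if !meta.is_dir() {
            continue;
        }
        // insert() returns false when this directory was already visited
        // under another name.
        if !seen.insert((meta.dev(), meta.ino())) {
            continue;
        }
        if let Ok(entries) = fs::read_dir(&path) {
            for entry in entries.flatten() {
                stack.push(entry.path());
            }
        }
        found.push(path);
    }
    found
}
```

Which of the two names gets reported depends on traversal order, so a real tool would probably want an explicit preference for the short path.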

Perhaps a discussion would include alternatives such as:

BurntSushi commented 1 year ago

With respect to naming: follow_links is short for following symlinks. It doesn't change anything about how hardlinks are handled, for example. Note that the first three words of the docs for follow_links are:

Follow symbolic links.

So the docs are already clear that it's just about symlinks.
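
To illustrate the distinction with the standard library only (this is a sketch, not walkdir's internals): the OS reports a symlink as its own file type, while a hard link is just another name for the same file and is visible only through the link count. The helper names `is_symlink` and `hard_link_count` are made up for this example.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// True only if `path` itself is a symbolic link, as reported by the OS.
fn is_symlink(path: &Path) -> io::Result<bool> {
    // symlink_metadata does not follow the link, so the link's own
    // file type is inspected.
    Ok(fs::symlink_metadata(path)?.file_type().is_symlink())
}

/// Number of names (hard links) that refer to the same file as `path`.
fn hard_link_count(path: &Path) -> io::Result<u64> {
    Ok(fs::symlink_metadata(path)?.nlink())
}
```

A hard-linked file returns false from `is_symlink` under every one of its names, which is why an option about following symlinks has nothing to act on there.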

I'm definitely not going to rename it. And renaming it just because macOS decided to introduce some new weird version of links also seems like a bad way to prioritize things.

I'm inclined to:

BurntSushi commented 1 year ago

And also:

Well the programmer normally has an expectation that walking a directory only visits each directory once.

If that's true, then why doesn't macOS report firm links as symlinks? Like, why pin the responsibility on me here and not on macOS? They introduced firmlinks and they decided not to report them as symlinks.

zeroflaw commented 7 months ago

Out of curiosity I wanted to know how you would go about detecting a 'firmlink'; it seems possible using libc. You would probably want to follow the 'firmlink' (the short version of the path) and ignore the directory that has a 'firmlink' pointing at it. It's super confusing, but it is doable.

#[cfg(test)]
mod tests {

    #[test]
    fn test_libc_detect_firmlink() {
        let app_path_system = std::ffi::CString::new("/System/Volumes/Data/Applications").unwrap();
        let app_path_root = std::ffi::CString::new("/Applications").unwrap();

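        // O_DIRECTORY belongs in the flags argument; the third argument to
        // open() is the mode, which is only used when creating a file.
        let fd_system = unsafe {
            libc::open(
                app_path_system.as_ptr(),
                libc::O_RDONLY | libc::O_NONBLOCK | libc::O_DIRECTORY,
            )
        };
        // Open the root path here, not the system path a second time.
        let fd_root = unsafe {
            libc::open(
                app_path_root.as_ptr(),
                libc::O_RDONLY | libc::O_NONBLOCK | libc::O_DIRECTORY,
            )
        };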
        assert!(fd_system != -1);
        assert!(fd_root != -1);

        let mut buffer = vec![0; libc::PATH_MAX as usize];
        let r = unsafe { libc::fcntl(fd_system, libc::F_GETPATH, buffer.as_mut_ptr()) };
        assert!(r == 0);
        let get_path_system = std::ffi::CStr::from_bytes_until_nul(&buffer).unwrap();
        println!("(F_GETPATH) -- fd_system: {:?}", get_path_system);

        let mut buffer = vec![0; libc::PATH_MAX as usize];
        let r = unsafe { libc::fcntl(fd_root, libc::F_GETPATH, buffer.as_mut_ptr()) };
        assert!(r == 0);
        let get_path_root = std::ffi::CStr::from_bytes_until_nul(&buffer).unwrap();
        println!("(F_GETPATH) -- fd_root: {:?}", get_path_root);

        let mut buffer = vec![0; libc::PATH_MAX as usize];
        let r = unsafe { libc::fcntl(fd_system, libc::F_GETPATH_NOFIRMLINK, buffer.as_mut_ptr()) };
        assert!(r == 0);
        let get_path_nofirmlink_system = std::ffi::CStr::from_bytes_until_nul(&buffer).unwrap();
        println!(
            "(F_GETPATH_NOFIRMLINK) -- fd_system: {:?}",
            get_path_nofirmlink_system
        );

        let mut buffer = vec![0; libc::PATH_MAX as usize];
        let r = unsafe { libc::fcntl(fd_root, libc::F_GETPATH_NOFIRMLINK, buffer.as_mut_ptr()) };
        assert!(r == 0);
        let get_path_nofirmlink_root = std::ffi::CStr::from_bytes_until_nul(&buffer).unwrap();
        println!(
            "(F_GETPATH_NOFIRMLINK) -- fd_root: {:?}",
            get_path_nofirmlink_root
        );

        println!(
            "path: {:?} is a firmlink: {}",
            app_path_root,
            app_path_root.as_c_str() != get_path_nofirmlink_root
        );
        // Compare the system path against its own no-firmlink path.
        println!(
            "path: {:?} is a firmlink: {}",
            app_path_system,
            app_path_system.as_c_str() != get_path_nofirmlink_system
        );
    }
}

output:

(F_GETPATH) -- fd_system: "/Applications"
(F_GETPATH) -- fd_root: "/Applications"
(F_GETPATH_NOFIRMLINK) -- fd_system: "/System/Volumes/Data/Applications"
(F_GETPATH_NOFIRMLINK) -- fd_root: "/System/Volumes/Data/Applications"
path: "/Applications" is a firmlink: true
path: "/System/Volumes/Data/Applications" is a firmlink: false

hippietrail commented 1 week ago

In case anyone is looking for info on detecting firmlinks in Darwin/macOS, apparently the only official way to do it is to use getattrlistbulk() to get ATTR_CMN_FLAGS and check if SF_FIRMLINK is set.
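
For anyone wiring that up from Rust, here is a minimal sketch of only the final step, the bit test. Fetching the flags with getattrlistbulk() is not shown. The constant value is what I recall from Darwin's <sys/stat.h> and should be treated as an assumption to verify against your SDK headers; `has_firmlink_flag` is a made-up helper name.

```rust
/// Assumed value of SF_FIRMLINK from Darwin's <sys/stat.h>.
/// Verify against the SDK headers before relying on it.
const SF_FIRMLINK: u32 = 0x0080_0000;

/// Hypothetical helper: true if the file flags (for example the value
/// returned for ATTR_CMN_FLAGS) have the SF_FIRMLINK bit set.
fn has_firmlink_flag(flags: u32) -> bool {
    flags & SF_FIRMLINK != 0
}
```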