bootandy / dust

A more intuitive version of du in rust
Apache License 2.0
8.91k stars 189 forks source link

`sort_by_inode` violates `Ord` and does not guarantee a total ordering leading to panic #407

Closed d0nutptr closed 4 months ago

d0nutptr commented 4 months ago

Hello!

I happened to run into an interesting edge-case while cleaning up a few projects. While running dust in a folder containing both files and symlinks I encountered this odd panic.

image

thread 'main' panicked at library/core/src/slice/sort/shared/smallsort.rs:848:5:
Ord violation
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I debugged the issue and discovered the issue is caused by sort_by_inode in dir_walker.rs. The problem is that the implementation provided does not guarantee a total ordering if both files and symlinks are present. For example:

Assume the folder contains the following nodes with names and inode_devices:

/a - Some(1)
/b - None
/c - Some(0)

There does not exist a total order as there exists no way to order these three files in a way where the following comparisons are satisfied:

* a < b
    * based on name as `b` has no inode device
* a > c
    * based on inode device
* b < c
    * based on name as `b` has no inode device 

I tried to make as small of a testcase as possible, but it was actually really hard to trigger the behavior. This is the smallest test I could create that exhibited the crash:

    #[test]
    fn test_total_ordering_of_sort_by_inode() {
        let mut children = Vec::new();

        children.push(Node {
            name: PathBuf::from_str("a").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19050027, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("b").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19050112, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("c").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19050113, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("d").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19047065, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("e").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19047066, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("f").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19048323, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("g").unwrap(),
            size: 0,
            children: vec![],
            inode_device: None,
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("h").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19048308, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("i").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19048309, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("j").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19048302, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("k").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19048303, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("l").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19049996, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("m").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19048314, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("n").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19048315, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("o").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19048318, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("p").unwrap(),
            size: 0,
            children: vec![],
            inode_device: None,
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("q").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19049674, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("r").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19046994, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("s").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19046995, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("t").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19047666, 66310)),
            depth: 0,
        });

        children.push(Node {
            name: PathBuf::from_str("u").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((19050145, 66310)),
            depth: 0,
        });

        children.sort_by(sort_by_inode);
    }
bootandy commented 4 months ago

That's interesting. I'll look into this.

Just trying that test and set it to run in a loop, I haven't seen it fail yet. I'm using ubuntu, which OS did you use ? & Does it take several runs to fail ?

d0nutptr commented 4 months ago

Hey! Yea it can take a few goes in practice to crash.

It is exceedingly rare to find a situation where this happens in practice.

I fuzzed millions of configurations of files to see if i could make a smaller poc to send and came up with nothing.

If you run the above provided test case, though, it should be 100% reproducible.

d0nutptr commented 4 months ago

Oh, and yes I use ubuntu :)

bootandy commented 4 months ago

I see how this could happen. So this is genuine. (I just can't yet reproduce it). Going to have to think about this.

We could catch the panic and sort by name if that happens.

bootandy commented 4 months ago

This is helping me think about it: I codifed your description in the issue:

    #[test]
    fn test_total_ordering_of_sort_by_inode() {
        let a = Node {
            name: PathBuf::from_str("a").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((3, 66310)),
            depth: 0,
        };

        let b = Node {
            name: PathBuf::from_str("b").unwrap(),
            size: 0,
            children: vec![],
            inode_device: None,
            depth: 0,
        };

        let c = Node {
            name: PathBuf::from_str("c").unwrap(),
            size: 0,
            children: vec![],
            inode_device: Some((1, 66310)),
            depth: 0,
        };

        assert_eq!(sort_by_inode(&a, &b), Ordering::Less); //a < b
        assert_eq!(sort_by_inode(&a, &c), Ordering::Greater); // a > c
        assert_eq!(sort_by_inode(&c, &b), Ordering::Greater); // c > b
        // Transitivity fail

        assert_eq!(sort_by_inode(&c, &a), Ordering::Less);
        assert_eq!(sort_by_inode(&a, &c), Ordering::Greater);
        assert_eq!(sort_by_inode(&c, &a), Ordering::Less);
    }
bootandy commented 4 months ago

https://github.com/bootandy/dust/pull/408

does this look like it would fix it ?

d0nutptr commented 4 months ago

I think it would!

I wrote a simple hardness to formally verify the Ord property so let me try this new implementation to see if it can find a violation.

d0nutptr commented 4 months ago

I assume I wrote a relatively reasonable definition to check total order here, but it looks like it verified! :ship: it

image