benkay86 / nom-tutorial

Tutorial for parsing with nom 5.

updates for WSL support #5

Open boneskull opened 4 years ago

boneskull commented 4 years ago

Hi,

I found this tutorial while looking for nom tutorials. And Rust tutorials. This one seems to be the most up-to-date and well-written, and is essentially my first try at Rust. So thank you!

I tried the tutorial on my WSL2 install (Ubuntu) on my Windows machine, but the parser fails at runtime because of this odd entry in /proc/mounts (pasted verbatim):

C:\134 /mnt/c 9p rw,dirsync,noatime,aname=drvfs;path=C:\;uid=1000;gid=1000;symlinkroot=/mnt/,mmap,access=client,msize=65536,trans=fd,rfd=8,wfd=8 0 0

You may understand why this fails at first glance, but I didn't, so I needed to figure it out (which makes for an even better tutorial, IMO).

This is how I solved it.


As with escaped_space, we need a new function to handle \134 (the octal escape for a backslash, which is totally not confusing at all). I think the device should display as just C:\:

  fn windows_backslash(i: &str) -> nom::IResult<&str, &str> {
    nom::combinator::value("\\", nom::bytes::complete::tag("134"))(i)
  }
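
As a quick sanity check that \134 really is a backslash, here is a std-only sketch that decodes the three octal digits following the escape character. The helper name octal_escape_to_char is hypothetical and not part of the tutorial or of nom; it only illustrates the arithmetic (octal 134 = 1*64 + 3*8 + 4 = 92, the ASCII code for a backslash).

```rust
/// Decode a three-digit octal escape body such as "134" or "040" into a char.
/// Hypothetical helper for illustration; not part of the tutorial or nom.
fn octal_escape_to_char(digits: &str) -> Option<char> {
    // Accept exactly three characters, all of them octal digits (0-7).
    if digits.len() != 3 || !digits.chars().all(|c| ('0'..='7').contains(&c)) {
        return None;
    }
    // Parse in base 8; values above 255 do not fit in a u8 and yield None.
    u8::from_str_radix(digits, 8).ok().map(char::from)
}
```

With this, "134" decodes to a backslash and "040" to a space, which matches the two escapes the parser maps by hand.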

I added this to the tuple passed into nom::branch::alt():

      nom::branch::alt((
        escaped_backslash,
        windows_backslash,
        escaped_space
      )),

This seemed to work OK until I had to parse the mount options, which failed because \; is not a recognized escape sequence (see the paste from /proc/mounts above). I wrote yet another parser and added it to the nom::branch::alt call:

  fn windows_options_backslash(i: &str) -> nom::IResult<&str, &str> {
    nom::combinator::value("\\;", nom::bytes::complete::tag(";"))(i)
  }

and

      nom::branch::alt((
        escaped_backslash,
        windows_backslash,
        escaped_space,
        windows_options_backslash,
      )),

(I was not sure what to name either of these functions.)
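
To see why both additions were needed, here is a rough std-only model of the escape handling. It is a sketch under assumptions, not nom's implementation: the function unescape and its allow_semicolon flag are made up for illustration. The flag stands in for whether a windows_options_backslash-style rule is present, and the branches are tried in the same order as in the alt tuple above.

```rust
/// Std-only model of the escape handling, for illustration only.
/// `allow_semicolon` mimics adding a `windows_options_backslash`-style rule.
fn unescape(input: &str, allow_semicolon: bool) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = input;
    while let Some(pos) = rest.find('\\') {
        // Copy the ordinary text before the backslash unchanged.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // Try each known escape in order, like the branches of `alt`.
        let (replacement, consumed) = if after.starts_with('\\') {
            ("\\", 1)
        } else if after.starts_with("134") {
            ("\\", 3)
        } else if after.starts_with("040") {
            (" ", 3)
        } else if allow_semicolon && after.starts_with(';') {
            ("\\;", 1)
        } else {
            // No rule matched what follows the backslash, so parsing fails.
            return Err(format!("unrecognized escape before {:?}", after));
        };
        out.push_str(replacement);
        rest = &after[consumed..];
    }
    // Whatever remains contains no backslash and is copied as-is.
    out.push_str(rest);
    Ok(out)
}
```

In this model the device field C:\134 decodes to C:\ either way, while path=C:\;uid=1000 is rejected unless the semicolon rule is enabled.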

At any rate, the tests I've written against the /proc/mounts entry pass (but I haven't had a chance to actually run it on my Windows box yet). Here's the one for parse_line() with the WSL2 data:

    #[test]
    fn test_parse_line_wsl2() {
      let mount3 = Mount {
        device: "C:\\".to_string(),
        mount_point: "/mnt/c".to_string(),
        file_system_type: "9p".to_string(),
        options: vec![
          "rw".to_string(),
          "dirsync".to_string(),
          "noatime".to_string(),
          "aname=drvfs;path=C:\\;uid=1000;gid=1000;symlinkroot=/mnt/".to_string(),
          "mmap".to_string(),
          "access=client".to_string(),
          "msize=65536".to_string(),
          "trans=fd".to_string(),
          "rfd=8".to_string(),
          "wfd=8".to_string(),
        ],
      };
      let (_, mount4) =
        parse_line("C:\\134 /mnt/c 9p rw,dirsync,noatime,aname=drvfs;path=C:\\;uid=1000;gid=1000;symlinkroot=/mnt/,mmap,access=client,msize=65536,trans=fd,rfd=8,wfd=8 0 0").unwrap();
      assert_eq!(mount3.device, mount4.device);
      assert_eq!(mount3.mount_point, mount4.mount_point);
      assert_eq!(mount3.file_system_type, mount4.file_system_type);
      assert_eq!(mount3.options, mount4.options);
    }

Note: I found that the following version causes my test to break, because it doesn't return the expected kind of Err result (the test expects the error from tag(), not char(), just like test_escaped_space()):

  fn windows_options_backslash(i: &str) -> nom::IResult<&str, &str> {
    nom::combinator::value("\\;", nom::character::complete::char(';'))(i)
  }

The compiler didn't complain about this, which I found unusual, since it usually complains about everything. Assuming we're not deleting this parser, what would you have done? Update the unit test, write a new trait, etc.? I don't know what's idiomatic (yet).

I'd love if you could show how you may have tackled this problem. If you like, I can send a PR with changes for this environment, and we could also discuss the implementation that way. Or not!

Anyway, thanks again for this tutorial.

benkay86 commented 4 years ago

I am by no means an expert in what is the most idiomatic use of nom, but I'll give you my 2¢.

It appears that under Windows Subsystem for Linux the output of /proc/mounts is close to, but not quite the same as, that of a typical Linux system, with two major differences:

  1. Backslash is escaped as \134. I think your strategy of creating an additional windows_backslash combinator, similar to the escaped_space combinator, and adding it to the nom::branch::alt in escaped_transform is a good one.

  2. It looks like on WSL some mount options are separated by semicolons ; instead of commas ,. Rather than creating another escape sequence for semicolons, it would probably make more sense to modify the mount_opts parser to separate mount options by commas or semicolons. This could be done by replacing nom::multi::separated_list(nom::character::complete::char(','), ...) with nom::multi::separated_list(nom::character::complete::one_of(",;"), ...).
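
To illustrate what treating both characters as separators would produce, here is a std-only sketch. It does not use nom, and split_mount_opts is a hypothetical name; it only shows the resulting list of options for the WSL line quoted earlier in this thread.

```rust
/// Split a mount-options field on ',' or ';' (std-only illustration,
/// hypothetical helper; the tutorial itself does this with nom).
fn split_mount_opts(field: &str) -> Vec<&str> {
    // Either separator ends the current option and starts the next one.
    field.split(|c: char| c == ',' || c == ';').collect()
}
```

Applied to rw,aname=drvfs;path=C:\;uid=1000,mmap this yields rw, aname=drvfs, path=C:\, uid=1000 and mmap. Note that path=C:\ then ends in a lone backslash, so the escape handling would presumably still need a rule for that case.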

Hope this helps!

boneskull commented 4 years ago

@benkay86 Thanks for the feedback. I guess I didn't realize the semicolon-delimited stuff was proper mount options (as /bin/mount would recognize them), and thought they were just some weird Windows-specific bunkum. But I suppose gid and uid are valid options!

Would you like me to send a PR? I'm probably not going to be the only person using WSL to go through this tutorial.

benkay86 commented 4 years ago

If you would like to send a PR I would be happy to test it on my native Linux system too.