dthul / matfile

A library for reading and writing Matlab ".mat" data files
MIT License
13 stars 7 forks source link

minimize nom macros #3

Open benkay86 opened 4 years ago

benkay86 commented 4 years ago

With nom 5.0 the author, Geoffroy, has soft-deprecated macros in nom. Although macros used to be needed to write concise parsers with nom, as Rust has evolved it is now possible to write equally concise parsers without macros. The advantages of minimizing macro use include more comprehensible error messages and less of a learning curve for newcomers since they do not have to learn and understand nom's niche macro conventions.

I'm interested in contributing to matfile and would prefer to write code using nom methods instead of nom macros. I realize that as the library author you may have strong preferences about this. I've rewritten parse_header() as an example of what parsing would look like with less reliance on nom macros. (I also removed panics and removed the assumption that the host machine is little endian.) Of course, the rewritten version still passes cargo test.

Please let me know what you think. I would be happy to work on the rest of the nom parsing code, but I don't want to waste time if you hate the macro-less syntax.

dthul commented 4 years ago

I think it would be great to move to idiomatic nom 5 code! I never liked the macro approach much, since it makes debugging hard and control flow unwieldy to express. So I'm definitely up for replacing the macros with functions.

I took a brief look at your example snippet. I'm not familiar with the new nom yet, but it looks like you can get rid of the try_parse! invocations and instead just call the parse function (so take(8usize)(i)? instead of try_parse!(i, take(8usize))). The endianness check seems to have gotten more complicated than the original version (and is now by itself a single massive nested function call spanning almost 30 lines). I believe that this can be improved. Overall I like what I see! Thanks for bringing it up! If you want to work on this, please go ahead and I am certain we will be able to merge your changes.

benkay86 commented 4 years ago

Unfortunately try_parse! is still a necessary evil. Ideally you could write:

let (i, o) = some_nom_parser(i)?;

However, for better or worse, nom's error type in IResult does not implement std::error::Error therefore you can't use the ? operator on an IResult. (There is an unstable feature std::ops::Try that gets around this, but it requires compiling with nightly.)

Instead we have the try_result! macro, which expands to something like:

let (i, o) = match some_nom_parser(i) {
    Ok(i, o) => (i, o),
    Err(e) => return Err(e)
};

While I'm obviously in favor of minimizing macro use, try_result! is a simple pattern expansion that won't cause too much trouble with debugging and definitely improves readability of code. The other alternative would be to wrap the entire parser in a nom::sequence::tuple, but that gets hard to read for long sequences.

I updated the branch to lift the logic for endianness conversion out of parse_header(). Unfortunately it doesn't make the endianness check much shorter, but ultimately this is limited by MathWorks's decision to put the u16 version before the endianness indicator in the header instead of after it. If it still bothers you I can try and break out the logic into multiple subparsers.

dthul commented 4 years ago

Ideally you could write:

let (i, o) = some_nom_parser(i)?;

That does indeed work, no? As a quick test I compiled your branch and then replaced try_parse!(i, take(8usize)) in src/parse.rs:171 with take(8usize)(i)?. It still compiles.

I think your endianness check got a little more complicated since it introduced a bug(?). le_u16 reads a little endian encoded u16 and already takes care of byte-swapping, so the only thing we need to do is call swap_bytes if the initial assumption of it being a little endian encoded value turned out to be wrong. That only depends on whether the file is little or big endian, it does not depend on whether the host itself is little or big endian.

Based on your changes I pushed another version: https://github.com/dthul/matfile/compare/nom_no_macros#diff-258c0555e6f9c4bc4fba3a551d81638aL117

benkay86 commented 4 years ago

You are right, I was confused about the limitations on ?. I cannot ? nom parsers in a function that returns a dyn Error, but it is perfectly legal to ? nom parsers in a function that returns a nom::IResult. The try_parse! macro is not needed.

Thank you for pointing out my bug with regard to endian byte swapping. I see your way of parsing/verifying the version and endianness uses about the same number of lines, but it is easier to follow because it is broken up into multiple statements rather than one big nested parser.

I will work on converting the remaining methods in parse.rs to macro-less nom and send you a pull request when finished. I will try to stick with the style of splitting nested parsers into multiple statements for readability.