LeopoldArkham / Molten

[WIP] Molten - Style-preserving TOML parser.
Apache License 2.0
40 stars 8 forks source link

Merge consecutive whitespace & correctly handle indentation and trails #37

Closed jeremydavis519 closed 6 years ago

jeremydavis519 commented 6 years ago

This PR does two things. First, it fixes #21 by, after an item is parsed, determining whether both it and the previous one were Item::WS instances. If so, it replaces the old instance instead of appending the new one. It uses only safe code, doesn't allocate anything extra, and doesn't require a second pass. I've also added two test cases, both to the reconstruction module because this change doesn't change the reproduction side at all.

The second thing I've done in this PR is fix a bug that I discovered while running my test cases. Any indentation before a key was being either prepended to the key or appended to the trail of the line before it. As it turned out, there were two places in parser.rs where self.idx was being moved back to an earlier character but one or both of self.chars and self.current were not. Since those should always be in sync, I added the methods save_idx and restore_idx, which save and update all three variables together. I chose the suffix "idx" because that variable seems the most fundamental to me: it is effectively the iterator, and cursor and chars exist to give information about its surroundings.

Note that in order for the tests to show anything useful for this PR, I had to uncomment the line in reconstruction.rs that says assert_eq!(parsed, under_test);.Otherwise, it would have just been a string comparison with the source file's contents, and my change doesn't affect that string--just how it's split up internally. As a result, the AoTs test is now failing. I haven't looked at it in depth, but it looks like the nesting is different than expected. Let me know if that wasn't a known issue and I'll put it in the tracker.

jeremydavis519 commented 6 years ago

The AoTs were just not including their first elements. I added a check for an AoT in the main parse loop, so now the AoTs test passes even when it actually compares the ASTs.

Also, in order to get the test to pass, I realized that I had to explicitly implement Hash for the Key type. The derived hashing didn't match the custom PartialEq implementation, so most de facto equality checks (i.e. seeing if a given key was already in the HashMap) depended on the whole Key structure instead of only its key member.

LeopoldArkham commented 6 years ago

Hey, thanks for doing this!

Before merging, I'd like to ask you to move the nested if let conditions into some kind of consecutive_ws(body, value) function to minimize repetition and rightwards drift, especially since parse_table() is already a heavy function.

You could also have that function handle the code inside the condition as well and return true if combination happened. This lets us lose the combined flag and use it like:

if !merge_ws(body, value) {
    body.append(key, value).chain_err(|| self.parse_error())?;
}

It does mean having side effects in a condition evaluation, but it's very clean looking and if documented adequately, shouldn't be a problem. We already do this with inc().

jeremydavis519 commented 6 years ago

Good point. The code duplication was bugging me a little anyway, so I'm glad you pointed it out.

By the way, assuming I haven't made any logic errors, my PR for #38 will make parse_table much simpler, since I'm moving the logic for handling nested tables elsewhere.

LeopoldArkham commented 6 years ago

Ok this looks great now, thanks a lot!