Create <whitespace> and <newline>, (, ), etc nodes rather than throwing out tokens

conventional-commits / parser

reference implementation of conventionalcommits.org spec

ISC License

46 stars 7 forks source link

Create <whitespace> and <newline>, (, ), etc nodes rather than throwing out tokens #16

Open bcoe opened 3 years ago

bcoe commented 3 years ago

So that there's no lost information in the parse tree, we should create <whitespace> and <newline> tokens for spacing characters that fall outside of <text> nodes.

wesleytodd commented 3 years ago

I realized after submitting that review that the pr was merged, so just to keep the context organized I figured I would re-post it here to see if we need to follow up:

I might be wrong here, but I think we need a <blankline> token which is <blankline> ::= <newline>, <newline>+ as the separators between the <summary>, <body>, & <footer>. If I read this correctly as it, you would have a valid summary/body as (only one LF):

fix: summary line
this is my body

Am I missing something here? The implementation looks right, but I think it is just miss-represented in the grammar spec.

https://github.com/conventional-commits/parser/pull/26#pullrequestreview-559300866

bcoe commented 3 years ago

Am I missing something here? The implementation looks right, but I think it is just miss-represented in the grammar spec.

@wesleytodd this was intentional on my part, I see no reason that:

fix: summary line
this is my body

shouldn't parse correctly, since it doesn't cause any trouble for the grammar.

bcoe commented 3 years ago

@byCedric @wesleytodd having been playing around a bit with a tree visitor, it jumps out at me that we'll probably also want to add (, ), and any other tokens I might be missing? to the tree.

This allows us to do something like visit(summary, () => true, (node) => { if (node.value) { text += node.value} }), to put a portion of the tree back together into raw text.

wesleytodd commented 3 years ago

this was intentional on my part

Not opposed to this, but it is a change (might be breaking even?) from the current spec. What does the current parser do in this case?

This allows us to do something like visit(summary, () => true, (node) => { if (node.value) { text += node.value} }), to put a portion of the tree back together into raw text.

This was my main concern with not having these. If you want to modify and stringify, these markers are very helpful to maintain exact formatting.

byCedric commented 3 years ago

@byCedric @wesleytodd having been playing around a bit with a tree visitor, it jumps out at me that we'll probably also want to add (, ), and any other tokens I might be missing? to the tree.

Yeah, it feels a bit odd that we remove those scopes. In this change I had to mitigate it by doing some stuff differently.

One thing we could add is something similar to nlcst's punctuation. Not sure if we have more than the scope parenthesis, but if we do we might want to consider that.

bcoe commented 3 years ago

Not opposed to this, but it is a change (might be breaking even?) from the current spec.

I think I've made a few breaking changes as we transcribe the grammar. I'm trying to channel all the things I've seen colleagues accidentally get wrong, and then reach out to me about.

Making the grammar as forgiving as possible, but then having a linter on top of it, I'd say is the ideal. We can have a linter indicate that it's recommended to have \n\n between the summary and body, while still parsing appropriately.

Yeah, it feels a bit odd that we remove those scopes. In this change I had to mitigate it by doing some stuff differently.

I'm happy with a Punctuation node, if it's only parenthesis we don't have in the tree right now, perhaps we could even just call it a Parenthesis node?