comby-tools / comby

A code rewrite tool for structural search and replace that supports ~every language.
https://comby.dev
Apache License 2.0

C language is including ';' in expression syntax #380

Open charles-gray opened 7 months ago

charles-gray commented 7 months ago

Describe the bug

When I try to use a hole with expression-like syntax (:[foo:e]) in C code, the semicolon (;) is included in the matched expression. A semicolon isn't a valid expression token in C.

Reproducing

This link showcases some examples of it matching the ; and some ways to break it.

bit.ly/3UNAuki

Expected behavior

I expect to be able to match an expression without the trailing semicolon.

Additional context

The same is true for the comma (,) token, though a comma can be part of an expression depending on context. I'm not sure whether I expect comby to be smart enough to tell the difference, so I'm not sure that's included in this bug report.

rvantonder commented 6 months ago

Hi @charles-gray! The explanation is that this matching really is only expression-like; it is not the strict C-expression matching you expected, and comby is not smart enough to tell the difference. The examples that "break" the match contain what comby considers non-expression tokens: comments, and spaces at the top level (i.e., not inside (...)).
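To make the behavior described above concrete, here is a rough Python approximation of expression-like matching. It is only a sketch based on the description in this comment, not comby's actual implementation: the function name `expression_like` is hypothetical, only parentheses are tracked, and comments are ignored entirely.

```python
def expression_like(source: str, start: int = 0) -> str:
    """Approximate an expression-like match beginning at `start`.

    Assumption: the match runs until the first whitespace character that is
    not nested inside parentheses. Comment handling is deliberately omitted.
    """
    depth = 0  # current parenthesis nesting level
    end = start
    while end < len(source):
        ch = source[end]
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif ch.isspace() and depth == 0:
            break  # top-level whitespace ends the match
        end += 1
    return source[start:end]


# The semicolon is not whitespace, so it stays inside the match.
print(expression_like("foo(a, b); bar"))
# A top-level space cuts the match short before the semicolon is reached.
print(expression_like("x + 1;"))
```

Under this model the trailing ; is captured simply because it is adjacent non-whitespace text, which matches the behavior reported in the issue.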

Note that in many languages (and I think C is included here), a syntactic statement ending in a semicolon is considered an expression. So under strict C-expression matching of your examples, I would expect the trailing ; to always be matched (rather than never, if I am following what you would expect).

As a workaround, you can strip the ;, or fine-tune how it is matched, with a regular expression matcher, since this is probably a lexical concern most of the time.
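One way to apply the stripping half of this workaround is to post-process the captured text outside comby. The sketch below assumes you already have the captured match as a Python string; the helper name `strip_trailing_semicolon` is hypothetical, and this does not use comby's own regular expression matcher.

```python
import re

# Matches one trailing semicolon plus any whitespace around it at the very end.
_TRAILING_SEMICOLON = re.compile(r"\s*;\s*\Z")


def strip_trailing_semicolon(captured: str) -> str:
    """Remove a single trailing ';' from text captured by an expression-like hole."""
    return _TRAILING_SEMICOLON.sub("", captured)


print(strip_trailing_semicolon("foo(a, b);"))
```

Because the pattern is anchored to the end of the string, semicolons that appear earlier in the captured text (for example inside a character literal) are left untouched.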

Feel free to close if this answers your question :-)

charles-gray commented 6 months ago

Thanks for the prompt response!

I've always assumed the ; was part of a statement, not an expression. In the first Google result for a C grammar that I can read, the use case I'm looking at falls under an "expression-statement", so I guess we're both right.

That leads to my actual question. I see that comby supports custom language definitions, and I'd love to tweak the C definition to see if I can bend it to my current use case (I've encountered this semicolon problem before!). The C definition in the comby source seems to be hard-coded in ML. Is there a way to emit the C definition as JSON, or is there a reference example somewhere that I can adapt?