Sysmagine / SemanticDiff

Community support for SemanticDiff, the programming language aware diff for Visual Studio Code and GitHub.
https://semanticdiff.com

Support for C/C++ languages #3

Open jornj opened 1 year ago

jornj commented 1 year ago

Support for C and C++ languages. Refactoring sometimes moves code around quite a bit, and a semantically aware diff tool could be very useful.

slackner commented 1 year ago

Thank you for your feature request.

We have already thought about supporting C / C++ ourselves. C in particular seems quite difficult to implement, though, because macros are used so often that a preprocessor step is necessary before the source code can be parsed. This would mean it's not just a diff between two files; you would also need all header files (including system headers) to generate the diff. So far we haven't come up with a good solution. We will keep you updated in this issue. :)

Regarding your use case, is it a C or C++ project, and in particular, how relevant is support for macro expansion to ensure proper parsing?

jornj commented 1 year ago

It is mostly C projects, but some use frameworks that are define-heavy.

I was hoping that you can just treat the macros as 'yet another function call' and avoid expanding them, although I see some issues regarding #ifdefs.
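To make the #ifdef concern concrete, here is a minimal sketch (the STRICT_LIMIT flag and the clamp function are hypothetical, made up for illustration). Read without evaluating the preprocessor, the function contains two opening braces for one closing brace, so a parser that treats directives as plain lines would see unbalanced code:

```c
/* Hypothetical example: STRICT_LIMIT is an illustrative build flag. */
int clamp(int value, int limit)
{
#ifdef STRICT_LIMIT
    if (value >= limit) {
#else
    if (value > limit) {
#endif
        /* Only one of the two `if` headers survives preprocessing,
         * so this single closing brace is enough for the compiler. */
        return limit;
    }
    return value;
}
```

A diff tool that skips the preprocessor would have to pick one branch, parse both, or relax its grammar to cope with this.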

slackner commented 1 year ago

In some cases just treating them as function calls works. But not if there is too much template define magic. E.g.,

LIST_FOR_EACH(cursor, &list)
{
    // do something with cursor
}

This wouldn't be valid syntax without the #define to map it to a for loop.
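As a rough sketch of what such a #define could look like (the list types, the macro body, and list_sum are hypothetical and not taken from any particular project):

```c
#include <stddef.h>

/* Hypothetical singly linked list, for illustration only. */
struct node {
    int value;
    struct node *next;
};

struct list {
    struct node *head;
};

/* Hypothetical macro: expands to the header of a for loop, so the
 * braces written after the macro call become the loop body. */
#define LIST_FOR_EACH(cursor, list) \
    for ((cursor) = (list)->head; (cursor) != NULL; (cursor) = (cursor)->next)

/* Sums all values. This only parses as C after macro expansion. */
int list_sum(struct list list)
{
    struct node *cursor;
    int total = 0;

    LIST_FOR_EACH(cursor, &list)
    {
        total += cursor->value;
    }
    return total;
}
```

Without the definition, a parser sees what looks like a function call with no semicolon, followed by a block.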

jornj commented 1 year ago

Well... if you don't go looking for the semicolon, this is valid syntax. It's a line, possibly with a syntax error, followed by a scope.

The diff tool will quite often see just this, moving from broken code to fixed code.

prapin commented 1 year ago

I would love support for C++ in SemanticDiff! C++ has a much more complex syntax than C, with templates all over, but macros are typically used far less, so maybe they would not be as much of a problem as in old C. In addition to special macros, there are #if, #ifdef etc., which are challenging for a semantic parser. My proposal would be:

mmueller2012 commented 1 year ago

@prapin Thanks for your idea. The main reason we haven't implemented C++ support yet is that no generic parser framework can parse the language correctly. As explained in this StackOverflow answer, macros aren't the only issue. C and C++ are ambiguous in various ways, and you can't disambiguate these cases without tracking variable and type declarations. So you either need a specialized parser that does this on the fly, or one that creates all possible parse trees so that the correct one can be selected in a post-processing step. To make matters worse, you can't do this reliably without also parsing all header files and implementing the preprocessor.
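One well-known instance of this kind of ambiguity, sketched in C (the function names are made up for illustration): the token sequence `(a) * b` is a cast of a dereferenced pointer when `a` names a type, and a multiplication when `a` names a variable.

```c
/* Here `a` is a typedef, so `(a) * b` parses as a cast applied
 * to the dereference `*b`, i.e. (int)(*b). */
int parse_as_cast(int *b)
{
    typedef int a;
    return (a) * b;
}

/* Here `a` is a variable, so the same tokens `(a) * b` parse
 * as the product of a and b. */
int parse_as_multiply(int a, int b)
{
    return (a) * b;
}
```

Both return statements are token-for-token identical, so a parser cannot choose the right tree without knowing how `a` was declared, and that declaration may live in a header.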

We thought about using Clang or GCC directly to get correct ASTs, but that would require a fully functional build environment. This might work to some extent with the VS Code extension, but wouldn't be an option for our GitHub App. As you pointed out yourself, it would still be tricky to handle code parts that get removed by the preprocessor. The build environment might also not be compatible with the displayed diff.

The other option would be to use a best effort approach instead and accept that some parse results will be incorrect (and maybe relax the grammar rules to handle macros better). This could lead to actual changes being classified as an invariance which is something we try very hard to avoid.

Personally, I don't think either approach is really great, but I would love to hear your opinions :-).

cmorty commented 1 year ago

I'd really like this. I don't think that there is an alternative to a compiler parsing the code, especially with C++. Even Eclipse/CDT is switching to clangd, because templates got too difficult to maintain (for me, their killer feature is build-system- and macro-aware code completion).

As for GitHub: since we are all developers, I don't think this needs to work out of the box without user support. One option would be to generate the information needed for the diff in CI, publish it as an artifact, and then use that.

dewilcox commented 2 months ago

I'm not sure of the best way to upvote this enhancement request, but I certainly use C++ a lot and a semantic diff would be fantastic. (Yes, I'm sure clangd or similar would be required.)