elliotchance / c2go

⚖️ A tool for transpiling C to Go.
MIT License
2.09k stars 155 forks

panic: could not match regexp with string (translation_unit_decl) #885

Open zazola opened 3 years ago

zazola commented 3 years ago
$ c2go transpile ~/test9.c 
panic: could not match regexp with string
^(?P<address>[0-9a-fx]+) <(?P<position>.*)> <(?P<position2>.*)>[\s]*$
0x3228d60 <<invalid sloc>>

goroutine 21 [running]:
github.com/elliotchance/c2go/ast.groupsFromRegex(0xc000348000, 0x45, 0xc00015a014, 0x1a, 0x6ce701)
    /home/abrosaia/go/pkg/mod/github.com/elliotchance/c2go@v0.26.9/ast/ast.go:310 +0x36a
github.com/elliotchance/c2go/ast.parseTranslationUnitDecl(0xc00015a014, 0x1a, 0x13)
    /home/abrosaia/go/pkg/mod/github.com/elliotchance/c2go@v0.26.9/ast/translation_unit_decl.go:10 +0x4e
github.com/elliotchance/c2go/ast.Parse(0xc00015a000, 0x2e, 0x6cb230, 0x5)
    /home/abrosaia/go/pkg/mod/github.com/elliotchance/c2go@v0.26.9/ast/ast.go:258 +0x2ea5
main.convertLinesToNodes(0xc000166000, 0xa0, 0x283, 0x0, 0x0, 0x0)
    /home/abrosaia/go/pkg/mod/github.com/elliotchance/c2go@v0.26.9/main.go:89 +0x1b4
main.convertLinesToNodesParallel.func1.1(0xc0000ea4e0, 0xc00000e080, 0xc000166000, 0xa0, 0x283, 0x0)
    /home/abrosaia/go/pkg/mod/github.com/elliotchance/c2go@v0.26.9/main.go:113 +0x53
created by main.convertLinesToNodesParallel.func1
    /home/abrosaia/go/pkg/mod/github.com/elliotchance/c2go@v0.26.9/main.go:111 +0x12f
zazola commented 3 years ago

I see it's similar to #853 and #840, but I don't understand the code well enough to fix it.

elliotchance commented 3 years ago

Yes, those are good examples of similar fixes. I'll explain what's going on.

c2go calls clang to do the preprocessing and parsing, and to output a (rather unfriendly) text-based representation of the AST of the source code. Having clang parse the source means that c2go doesn't need its own extremely complicated C parser, and the clang output provides some extra insights (such as typing information) as well.

On the flip side, the text-based clang AST output is not designed to be machine ingested; it's (normally) just for debugging. So the format changes all the time. c2go relies on regexps to parse this text back into a structure in memory before transpiling.
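To make the regexp step concrete, here is a minimal sketch of matching one AST line against a pattern with named groups. The pattern is the one printed in the panic message; `matchGroups` is a hypothetical helper written for this illustration (it is not c2go's `groupsFromRegex`, which panics on a mismatch), and the first input line is made up to fit the shape the pattern expects.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern copied from the panic message in this issue.
const translationUnitDeclPattern = `^(?P<address>[0-9a-fx]+) <(?P<position>.*)> <(?P<position2>.*)>[\s]*$`

// matchGroups is a hypothetical helper for this sketch. It returns the
// named capture groups for line, or false if the pattern does not match.
func matchGroups(pattern, line string) (map[string]string, bool) {
	re := regexp.MustCompile(pattern)
	m := re.FindStringSubmatch(line)
	if m == nil {
		return nil, false
	}
	groups := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // skip the whole-match entry and unnamed groups
			groups[name] = m[i]
		}
	}
	return groups, true
}

func main() {
	lines := []string{
		// Made-up line with two <...> sections, as the pattern requires.
		"0x1 <<invalid sloc>> <invalid sloc>",
		// The line from this report: only one <...> section.
		"0x3228d60 <<invalid sloc>>",
	}
	for _, line := range lines {
		groups, ok := matchGroups(translationUnitDeclPattern, line)
		fmt.Printf("%q matched=%v groups=%v\n", line, ok, groups)
	}
}
```

The second line does not match because the pattern insists on a second `<...>` section, which is the mismatch behind the panic.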

I decided to make c2go very strict about what it would accept so that nuanced cases would be caught rather than leading to bugs down the line. Of course, this also means that c2go can be very brittle whenever clang makes changes to its parser.

The solution you see in those PRs is to copy the line that is failing (0x3228d60 <<invalid sloc>>) and add it as a unit test here. Then modify the regexp so that it matches the input line that currently fails.
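As a rough sketch of that workflow: the failing line goes into a table of cases, and the regexp is loosened until every case passes. The relaxed pattern below is only one possible change (it makes the second `<...>` section optional) and is an assumption for illustration, not necessarily what #853 or #840 did. The names `relaxedPattern`, `parse` and `runCases` are hypothetical, and the second case is a made-up line kept to check that the older two-section shape still parses.

```go
package main

import (
	"fmt"
	"regexp"
)

// relaxedPattern is a hypothetical relaxation of the pattern from the panic:
// the first group is non-greedy and the second <...> section is optional.
var relaxedPattern = regexp.MustCompile(
	`^(?P<address>[0-9a-fx]+) <(?P<position>.*?)>(?: <(?P<position2>.*)>)?[\s]*$`)

type testCase struct {
	line string
	want map[string]string
}

var cases = []testCase{
	{ // the line copied from the panic in this issue
		line: "0x3228d60 <<invalid sloc>>",
		want: map[string]string{"address": "0x3228d60", "position": "<invalid sloc>", "position2": ""},
	},
	{ // made-up two-section line, to guard against regressions
		line: "0x1 <<invalid sloc>> <invalid sloc>",
		want: map[string]string{"address": "0x1", "position": "<invalid sloc>", "position2": "invalid sloc"},
	},
}

// parse returns the named groups for line, or an error if it does not match.
func parse(line string) (map[string]string, error) {
	m := relaxedPattern.FindStringSubmatch(line)
	if m == nil {
		return nil, fmt.Errorf("could not match regexp with string %q", line)
	}
	groups := make(map[string]string)
	for i, name := range relaxedPattern.SubexpNames() {
		if name != "" {
			groups[name] = m[i]
		}
	}
	return groups, nil
}

// runCases returns one message per failing case; empty means all passed.
func runCases() []string {
	var failures []string
	for _, c := range cases {
		got, err := parse(c.line)
		if err != nil {
			failures = append(failures, err.Error())
			continue
		}
		for k, v := range c.want {
			if got[k] != v {
				failures = append(failures, fmt.Sprintf("%q: %s = %q, want %q", c.line, k, got[k], v))
			}
		}
	}
	return failures
}

func main() {
	if failures := runCases(); len(failures) > 0 {
		for _, f := range failures {
			fmt.Println("FAIL:", f)
		}
		return
	}
	fmt.Println("all cases pass")
}
```

In c2go itself the new case would go into the project's own unit tests for the failing node (the trace points at ast/translation_unit_decl.go) rather than a standalone program like this one.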