Open yaosheng-zhang opened 1 year ago
Please provide a few more details.
... I use antlr ...
What version of Antlr are you using?
OK, you are using the "Java" target.
For the c grammar, using Antlr 4.13.0 and the CSharp target (dotnet 7.0.305) on Ubuntu 20.04.6 (AMD Ryzen 7 2700 Eight-Core Processor, 16GB DDR4), code.txt and cfile.txt each take about 0.13 s to parse. cfile.txt is 377 lines long and code.txt is 50 lines (per wc cfile.txt code.txt). Neither of these is over 500 lines long.
I tried it on a 1k line file from the GCC testsuite (Wmisleading-indentation.c). Took about the same amount of time.
NB: pre-processor directives should be ignored, but it looks like the c grammar parses only two types of directives. That's wrong. https://github.com/antlr/grammars-v4/issues/3601
Updated the grammar for parsing preprocessor directives. https://github.com/antlr/grammars-v4/pull/3602
For the Java target, using "grouped parsing" (aka "warm up parsing"), these are the runtimes for each of the test files.
07/11-12:05:44 ~/issues/g4-3601/c/Generated-Java
$ bash run.sh ../examples/*.c
Java 0 ../examples/add.c success 0.038
Java 1 ../examples/BinaryDigit.c success 0.001
Java 2 ../examples/bt.c success 0.04
Java 3 ../examples/dialog.c success 0.002
Java 4 ../examples/FuncCallAsFuncArgument.c success 0.01
Java 5 ../examples/FuncCallwithVarArgs.c success 0.009
Java 6 ../examples/FuncForwardDeclaration.c success 0.002
Java 7 ../examples/FunctionCall.c success 0.003
Java 8 ../examples/FunctionPointer.c success 0.009
Java 9 ../examples/FunctionReturningPointer.c success 0.004
Java 10 ../examples/helloworld.c success 0.0
Java 11 ../examples/integrate.c success 0.013
Java 12 ../examples/ll.c success 0.002
Java 13 ../examples/ParameterOfPointerType.c success 0.001
Java 14 ../examples/pr403.c success 0.0
Java 15 ../examples/TypeCast.c success 0.007
Java 16 ../examples/Wmisleading-indentation.pp.c success 0.073
Total Time: 0.405
07/11-12:06:00 ~/issues/g4-3601/c/Generated-Java
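The "grouped parsing" idea above is to parse every file inside one JVM process, so that start-up and warm-up costs are not paid again for each file. Here is a minimal sketch of such a timing loop. It is not the run.sh driver used in this thread: the class name GroupedTiming, the Row type, and the stand-in parse callback are all hypothetical, and a real run would pass a callback that invokes the generated lexer and parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class GroupedTiming {
    /** One report row: input name, whether the parse callback succeeded, elapsed seconds. */
    public static final class Row {
        public final String name;
        public final boolean success;
        public final double seconds;

        Row(String name, boolean success, double seconds) {
            this.name = name;
            this.success = success;
            this.seconds = seconds;
        }
    }

    /**
     * Runs the parse callback on every input within the same process and times each call.
     * A callback that throws a RuntimeException is recorded as a failure.
     */
    public static List<Row> timeAll(List<String> inputs, Consumer<String> parse) {
        List<Row> rows = new ArrayList<>();
        for (String input : inputs) {
            long start = System.nanoTime();
            boolean ok = true;
            try {
                parse.accept(input);
            } catch (RuntimeException e) {
                ok = false;
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            rows.add(new Row(input, ok, seconds));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Stand-in for a real parser call (assumption): rejects empty names only.
        Consumer<String> parse = name -> {
            if (name.isEmpty()) {
                throw new IllegalArgumentException("empty input name");
            }
        };
        List<Row> rows = timeAll(List.of(args), parse);
        double total = 0.0;
        for (int i = 0; i < rows.size(); i++) {
            Row r = rows.get(i);
            total += r.seconds;
            System.out.println("Java " + i + " " + r.name + " "
                    + (r.success ? "success" : "fail") + " " + r.seconds);
        }
        System.out.println("Total Time: " + total);
    }
}
```

With a real parser callback, the first file in the list would typically absorb most of the warm-up cost, so per-file figures are best compared from the second file onward.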
OK. "code.txt" is your driver code for the Java target.
"cfile.txt" is NOT a C-language file. It's a C++ source file. For example, it contains a class declaration "class ImageServer". Classes do not exist in the C language. So, you are using the wrong grammar.
This input cannot be parsed with the c grammar; it needs the cpp grammar. Starting over with the cpp grammar.
$ tail -n +15 /c/Users/Kenne/Downloads/cfile.txt | head
STRICT_MODE_OFF
#include "json.hpp"
STRICT_MODE_ON
#include <iostream>
using namespace mavlink_utils;
using namespace mavlinkcom;
extern std::string replaceAll(std::string s, char toFind, char toReplace);
void UnitTests::RunAll(std::string comPort, int boardRate)
{
com_port_ = comPort;
07/11-12:27:04 ~/issues/g4-3601/cpp/Generated-Java
This input is C++ source code that has not yet been preprocessed. It cannot be parsed cleanly with the cpp grammar because the macro call STRICT_MODE_OFF is not a C++ statement. The input should be the source code after preprocessing.
However, the cpp grammar does parse the input, with an error and rather slowly.
$ bash run.sh /c/Users/Kenne/Downloads/cfile.txt
line 19:0 no viable alternative at input 'STRICT_MODE_OFF#include "json.hpp"\rSTRICT_MODE_ON#include <iostream>\rusing'
Java 0 C:/Users/Kenne/Downloads/cfile.txt fail 1.498
Total Time: 1.662
07/11-12:37:18 ~/issues/g4-3601/cpp/Generated-Java
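Both problems found in this thread (C++ constructs fed to the c grammar, and input that has not been preprocessed) can be spotted before the parser is run. The following is a rough, line-based sketch of such a pre-check. It is a heuristic only and not part of Antlr or grammars-v4: the class name PreParseCheck and its patterns are hypothetical, and it will report false positives for keywords inside comments or string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PreParseCheck {
    // A line that starts with '#', e.g. #include or #define.
    private static final Pattern DIRECTIVE = Pattern.compile("^\\s*#\\s*\\w+.*");
    // A line holding only an upper-case identifier, e.g. STRICT_MODE_OFF.
    private static final Pattern BARE_MACRO = Pattern.compile("^\\s*[A-Z][A-Z0-9_]*\\s*$");
    // A few tokens that exist in C++ but not in C (deliberately incomplete).
    private static final Pattern CPP_ONLY = Pattern.compile("\\b(class|namespace|template)\\b|::");

    /** Returns one finding per suspicious line, prefixed with its 1-based line number. */
    public static List<String> check(String source) {
        List<String> findings = new ArrayList<>();
        // Accept \r\n, bare \r, and \n line endings.
        String[] lines = source.split("\\r\\n|\\r|\\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int number = i + 1;
            if (DIRECTIVE.matcher(line).matches()) {
                findings.add(number + ": preprocessor directive");
            } else if (BARE_MACRO.matcher(line).matches()) {
                findings.add(number + ": possible unexpanded macro");
            } else if (CPP_ONLY.matcher(line).find()) {
                findings.add(number + ": C++-only construct");
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        // Sample lines taken from the cfile.txt excerpt quoted in this thread.
        String sample = String.join("\n",
                "STRICT_MODE_OFF",
                "#include \"json.hpp\"",
                "STRICT_MODE_ON",
                "#include <iostream>",
                "using namespace mavlink_utils;",
                "extern std::string replaceAll(std::string s, char toFind, char toReplace);");
        for (String finding : check(sample)) {
            System.out.println(finding);
        }
    }
}
```

Any finding suggests the file should be run through a preprocessor first, or parsed with the cpp grammar instead of the c grammar.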
I'm using antlr 4.9 via Maven. Does that mean a .c file over 500 lines can't be parsed? I downloaded c.g4 from the official antlr repository. Do I need to preprocess the input myself? Is there a .g4 file that can parse both cpp and c?
Does antlr have a length limit for parsing c/cpp code? I'm using antlr to parse a 2000-line C file, but the parser can only parse up to 500 lines; when I delete the first 500 lines, it parses a few hundred lines. How can I get around this length limit?