Open yash745-deloitte opened 12 months ago
GetText() (getText() in the Python runtime) returns the text of a node in the parse tree. Off-channel tokens, and chars in the input char buffer that aren't part of any token, are not added to the text that GetText() reconstructs. So, all tokens on channel HIDDEN (= 1) are absent from the parse tree and from the reconstructed text. Read the code for GetText() in the runtime: it builds the text for a tree node by recursively calling the method on all children and returning the concatenation of their text, so off-channel tokens never contribute.
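To make that concrete, here is a simplified model of the recursion in Python. It is not the runtime's source: `Leaf`, `Rule`, and the hand-built token list are hypothetical stand-ins, and the only assumption carried over from Antlr is that the parser builds leaves from default-channel (0) tokens only.

```python
class Leaf:
    """Stand-in for a terminal node: holds the text of one on-channel token."""
    def __init__(self, text):
        self.text = text

    def getText(self):
        return self.text


class Rule:
    """Stand-in for a rule node: its text is the concatenation of its children's text."""
    def __init__(self, *children):
        self.children = list(children)

    def getText(self):
        return "".join(child.getText() for child in self.children)


DEFAULT_CHANNEL = 0

# (text, channel) pairs a lexer might produce for "int a = 5;".
# The whitespace tokens sit on a non-default channel.
tokens = [("int", 0), (" ", 1), ("a", 0), (" ", 1),
          ("=", 0), (" ", 1), ("5", 0), (";", 0)]

# Only default-channel tokens become leaves of the tree.
leaves = [Leaf(text) for text, channel in tokens if channel == DEFAULT_CHANNEL]

# A hand-built tree: declaration -> (type) (declarator) ';'
declaration = Rule(Rule(leaves[0]), Rule(*leaves[1:4]), leaves[4])

tree_text = declaration.getText()
print(tree_text)  # inta=5;
```

The whitespace is gone because no leaf was ever created for it, which matches the `'inta=5;'` seen in the reported output.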
You will need to write your own code to reconstruct the text you want. This is easy because you can get the token indices of the leftmost and rightmost leaf nodes of the subtree. Then, you could get the text one of two ways:
(a) Write a for-loop that goes through every token on the token stream between start and end and prints out the text of each one, including tokens on channel HIDDEN.
(b) For some grammars, "skip" is used. Skipped text does not appear on the token stream at all, so you will need to work with the char buffer itself. But, you don't have that problem with this grammar, and most grammars in grammars-v4 have been adjusted to not use "skip".
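A minimal sketch of both options in Python follows. It assumes the attribute names I believe the Python runtime exposes: `ctx.start` / `ctx.stop` for the first and last token of a rule context, `tokenIndex`, `start`, `stop`, and `getInputStream()` on tokens, a `tokens` list on the token stream, and `getText(start, stop)` with an inclusive stop on the char stream. Check these against your runtime version. The helper names `text_from_tokens` and `text_from_chars`, and the `FakeChars` / `SimpleNamespace` objects, are made up so that the example runs without a generated parser.

```python
from types import SimpleNamespace


def text_from_tokens(ctx, token_stream):
    """(a) Join the text of every token between the context's first and
    last token, whatever channel each token is on."""
    if ctx.stop is None:  # the rule matched no tokens
        return ""
    first, last = ctx.start.tokenIndex, ctx.stop.tokenIndex
    return "".join(tok.text for tok in token_stream.tokens[first:last + 1])


def text_from_chars(ctx):
    """(b) Slice the char buffer directly; this also recovers skipped text."""
    if ctx.stop is None:
        return ""
    chars = ctx.start.getInputStream()
    # Token.start / Token.stop are char offsets; stop is inclusive.
    return chars.getText(ctx.start.start, ctx.stop.stop)


# --- Stand-ins for the runtime objects (hypothetical, for demonstration) ---
class FakeChars:
    def __init__(self, data):
        self.data = data

    def getText(self, start, stop):
        return self.data[start:stop + 1]


source = "int a = 5;"
chars = FakeChars(source)
pieces = [("int", 0), (" ", 1), ("a", 0), (" ", 1),
          ("=", 0), (" ", 1), ("5", 0), (";", 0)]

tokens, pos = [], 0
for index, (text, channel) in enumerate(pieces):
    tokens.append(SimpleNamespace(
        tokenIndex=index, text=text, channel=channel,
        start=pos, stop=pos + len(text) - 1,
        getInputStream=lambda: chars))
    pos += len(text)

stream = SimpleNamespace(tokens=tokens)
ctx = SimpleNamespace(start=tokens[0], stop=tokens[-1])

print(text_from_tokens(ctx, stream))  # int a = 5;
print(text_from_chars(ctx))           # int a = 5;
```

In a real listener you would call one of these helpers in place of `ctx.getText()`, passing the same `CommonTokenStream` that was handed to the parser.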
So, this isn't a bug in Antlr, nor in the C grammar. But, you're not the only one who has "discovered" this problem. This is one of the things I don't care for in Antlr. In my Antlr toolkit, Trash, I rewrite all the trees to include "off-channel" text in the tree. That allows for a cleaner way to query the tree using XPath expressions, and to modify it using XQuery.
Hi all,
I have been trying to extract C code using this C Grammar.
But I am facing an issue with whitespace: it is missing from the extracted code. Sample extracted output is given below:-
Code:-
Imports: []
Variables: ['inta=5;', 'structPerson{charname[50];intage;floatheight;};', 'intx=10;']
Functions: ['main()', 'car()']
Function Implementations: ['intmain(){intx=10;printf("Hello, world!");return0;}', 'voidcar(){printf("Chain kuli ki man kuli");}']
Struct Declarations: ['structPerson{charname[50];intage;floatheight;}']
The Python code which I'm using for the above extraction is given below:-
```python
from antlr4 import *
from cGrammarListener import cGrammarListener
from cGrammarParser import cGrammarParser
from cGrammarLexer import cGrammarLexer


class CDetailsListener(cGrammarListener):
    def __init__(self):
        self.imports = []
        self.variables = []
        self.functions = []
        self.function_implementations = []
        self.struct_variable = []


def extract_details_from_c_code():
    lexer = cGrammarLexer(FileStream("C/sample.c"))
    stream = CommonTokenStream(lexer)
    parser = cGrammarParser(stream)


# Example usage
imports, variables, functions, function_implementations, struct_variable = extract_details_from_c_code()

print("Imports:")
print(imports)

print("\nVariables:")
print(variables)

print("\nFunctions:")
print(functions)

print("\nFunction Implementations:")
print(function_implementations)

print("\nStruct Declarations:")
print(struct_variable)
```
If anyone has a workaround to resolve this issue, please respond. It would be a great help. If you have any further queries or doubts, please feel free to ask.