antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License
10.12k stars 3.69k forks source link

Verilog grammar status #3028

Open quantrpeter opened 1 year ago

quantrpeter commented 1 year ago

Hi Mr @msagca How is the status of verilog grammar, is it stable now? thanks Peter

msagca commented 1 year ago

Hi @quantrpeter

I think the grammar is in a good state. Of course it can be further optimized and improved but it works fine enough. As a user of it what do you think?

quantrpeter commented 1 year ago

I was using the old version to build our tool, I remember last year there is a big change, grammar changed a lot, and listener stub are all changed. Now I want to use it to build a logic synthesizer, that's why I ask if it is stable. thanks for creating it Mr Msagca.

quantrpeter commented 1 year ago

I just download a very big verilog, over 10000 line and test your grammar, it successful parsed it. thanks for your great effort, i can start building the logic synthesizer now.

quantrpeter commented 1 year ago

when do we need to use VerilogPreParser?

msagca commented 1 year ago

when do we need to use VerilogPreParser?

It parses preprocessor directives. You need to handle them first before beginning the actual parse. If you're sure that the input text does not contain any directives that could impact the resulting parse tree then you can skip this step.

quantrpeter commented 1 year ago

thanks, if i put any text to parse for VerilogPreParser, it doesn't give me any error, may I know why please?

VerilogLexer lexer = new VerilogLexer(CharStreams.fromString("hello this is fake"));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
VerilogPreParser parser = new VerilogPreParser(tokenStream);
parser.source_text();
msagca commented 1 year ago

@quantrpeter

Tokens belonging to preprocessor directives are on the DIRECTIVES channel. You have to pass the correct channel ID (3) to CommonTokenStream.

And you may have to call the [fill()](https://www.antlr.org/api/Java/org/antlr/v4/runtime/BufferedTokenStream.html#fill()) method.

And this will still not give any errors because there is no token on the DIRECTIVES channel for the input hello this is fake.

quantrpeter commented 1 year ago
String str = IOUtils.toString(TestCompile.class.getClassLoader().getResourceAsStream("verilogexample/tb.v"), "utf-8");
VerilogLexer lexer = new VerilogLexer(CharStreams.fromString(str));
CommonTokenStream tokenStream = new CommonTokenStream(lexer, VerilogLexer.DIRECTIVES);
tokenStream.fill();
VerilogPreParser parser = new VerilogPreParser(tokenStream);
parser.source_text();

try to parse

module tb;
  reg clk, d, rstn;
  wire q;
  reg [3:0] delay;

  my_design u0 ( .clk(clk), .d(d),
`ifdef INCLUDE_RSTN asdasd asda s da
                .rstn(rstn),
`endif
                .q(q));

  always #10 clk = ~clk;

  initial begin

    {d, rstn, clk} = 0;

    #20 rstn = 1;

    for (integer i = 0 ; i < 20; i=i+1) begin
      delay = $random;
      #(delay) d = $random;
    end

    #20 $finish;
  end

endmodule

line 7 is fake, but no error message, any hints?

msagca commented 1 year ago

@quantrpeter

IEEE 1364-2005 spec does not explicitly state that ifdef_group_of_lines must begin on a new line; so, I didn't require it in the grammar. Anything that comes after INCLUDE_RSTN (text_macro_identifier) is a part of ifdef_group_of_lines.

ifdef_directive ::=
`ifdef text_macro_identifier
ifdef_group_of_lines
{ `elsif text_macro_identifier elsif_group_of_lines }
[ `else else_group_of_lines ]
`endif

Edit: I read the spec again and it says that these _group_of_lines are actually the lines following the directive. So, I have to fix this.

quantrpeter commented 1 year ago

nono, i typed fake text (after "INCLUDE_RSTN") in the ifdef, but it doesn't give me any error

`ifdef INCLUDE_RSTN asdasd asda s da
                .rstn(rstn),
`endif
msagca commented 1 year ago

@quantrpeter

If you mean this part: asdasd asda s da, then I explained why. Yes, it's not valid Verilog but we cannot detect this until we preprocess the input and parse it with VerilogParser.

quantrpeter commented 1 year ago

thanks @msagca , why in VerilogParser.g4 "expression" rule using '|' instead of VL, we still have ctx.VL() ?

Screenshot 2023-02-13 at 1 57 06 AM
msagca commented 1 year ago

@quantrpeter

They correspond to the same token. If you open the generated VerilogParser.tokens or VerilogLexer.tokens file you will see that VL and | both have the same token ID (178).

klcheungaj commented 1 year ago

hi @msagca I also have a question about SystemVerilog and Verilog's preprocessor grammar. Currently MACRO_USAGE and MACRO_NAME are defined as follows:

MACRO_NAME : IDENTIFIER ( [ \t\r\n]* MACRO_ARGS )? -> channel(DIRECTIVES), mode(MACRO_TEXT_MODE) ;
MACRO_USAGE : IDENTIFIER ( [ \t\r\n]* MACRO_ARGS )? -> channel(DIRECTIVES), popMode ;

but do we actually need \r and \n? If the source code before preprocessing is like this:


`define MACRO1   
(a);

(a) will be regarded as part of the MACRO_NAME, while ; becomes MACRO_TEXT. Both verilator and iverilog have following result:


(a);

Apart from that, I think it will be more convenient for users if macro identifier and macro arguments can be matched by different tokens rules so that we can have a parser rule to get the macro argument list.

msagca commented 1 year ago

@klcheungaj

You're totally right about the newline character. It terminates the macro and shouldn't be allowed there. I'll fix this. Edit: According to the spec, white space is allowed between macro name and argument list in macro usage but not allowed in macro definition. Edit: I've also noticed that I've allowed line comments inside compiler directives, which doesn't make any sense. I'll fix this too.

Separating macro identifier from macro arguments is tricky. I couldn't find a nice way of doing it while keeping the grammar target agnostic. I'll give it another try. Maybe you can suggest an alternative solution.

klcheungaj commented 1 year ago

You are right. I didn't notice them.

Sorry that I'm not good at parser. I can't think of a good way to implement it.