Parsing error in formatter

saw235 commented 1 year ago

Test case

class test;
  function new(string name = "test");
    abcdefghijklmno.get_field_by_name("LA").set(Abcdefghijkl.i_l2uncache_config[0].base_addr[31:12]);
  endfunction
endclass

Include any options or configuration used.

Actual output

test: Error lex/parsing-ing formatted output.  Please file a bug.
First error: token: "endfunction" at 4:5-15:; problematic formatter output is
class test;
    function new(string name = "test");
        abcdefghijklmno.get_field_by_name("LA").set(
    endfunction
endclass

Note that, replacing the identifiers with different names automagically makes it work. So I have no clue what is happening here.

hzeller commented 1 year ago

Identifier name change confirmed: if the underscores are removed, the formatting works. Maybe some issue in the lexer in certain circumstances not accepting underscores in identifiers.

class test;
  function new(string name = "test");
    L2UCMTLBxLA_reg.get_field_by_name("LA").set(ihdasbiuconfig.il2uncacheconfig[0].baseaddr[31:12]);
  endfunction
endclass

hzeller commented 1 year ago

Printing the raw tokens however shows that tokens are properly lexed as full tokens, e.g. (#SymbolIdentifier @98-115: "i_hdas_biu_config")

So something else is going on.

Assigning to @mglb to have a first look and triage. (also CC @fangism )

verible-verilog-syntax --printrawtokens /tmp/a.sv 

All lexed tokens:
(#"class" @0-5: "class")
(#"<<space>>" @5-6: " ")
(#SymbolIdentifier @6-10: "test")
(#';' @10-11: ";")
(#"<<\\n>>" @11-12: "
")
(#"<<space>>" @12-14: "  ")
(#"function" @14-22: "function")
(#"<<space>>" @22-23: " ")
(#"new" @23-26: "new")
(#'(' @26-27: "(")
(#"string" @27-33: "string")
(#"<<space>>" @33-34: " ")
(#SymbolIdentifier @34-38: "name")
(#"<<space>>" @38-39: " ")
(#'=' @39-40: "=")
(#"<<space>>" @40-41: " ")
(#TK_StringLiteral @41-47: ""test"")
(#')' @47-48: ")")
(#';' @48-49: ";")
(#"<<\\n>>" @49-50: "
")
(#"<<space>>" @50-54: "    ")
(#SymbolIdentifier @54-69: "L2UCMTLBxLA_reg")
(#'.' @69-70: ".")
(#SymbolIdentifier @70-87: "get_field_by_name")
(#'(' @87-88: "(")
(#TK_StringLiteral @88-92: ""LA"")
(#')' @92-93: ")")
(#'.' @93-94: ".")
(#SymbolIdentifier @94-97: "set")
(#'(' @97-98: "(")
(#SymbolIdentifier @98-115: "i_hdas_biu_config")
(#'.' @115-116: ".")
(#SymbolIdentifier @116-134: "i_l2uncache_config")
(#'[' @134-135: "[")
(#TK_DecNumber @135-136: "0")
(#']' @136-137: "]")
(#'.' @137-138: ".")
(#SymbolIdentifier @138-147: "base_addr")
(#'[' @147-148: "[")
(#TK_DecNumber @148-150: "31")
(#':' @150-151: ":")
(#TK_DecNumber @151-153: "12")
(#']' @153-154: "]")
(#')' @154-155: ")")
(#';' @155-156: ";")
(#"<<\\n>>" @156-157: "
")
(#"<<space>>" @157-159: "  ")
(#"endfunction" @159-170: "endfunction")
(#"<<\\n>>" @170-171: "
")
(#"endclass" @171-179: "endclass")
(#"<<\\n>>" @179-180: "
")
(#"end of file" @180-180: "")

mglb commented 1 year ago

Identifier name change confirmed: if the underscores are removed, the formatting works. Maybe some issue in the lexer in certain circumstances not accepting underscores in identifiers.

The important change seems to be line length (under/over limit), not an identifier itself.

ok:

class test;
  function new(string name = "test");
    abcdefghijklmno.get_field_by_name("LA").set(Abcdefghijk.i_l2uncache_config[0].base_addr[31:12]);
  endfunction
endclass

failure:

class test;
  function new(string name = "test");
    abcdefghijklmno.get_field_by_name("LA").set(Abcdefghijkl.i_l2uncache_config[0].base_addr[31:12]);
  endfunction
endclass

Logs for the failing case:

[token_partition_tree.cc:786] ReshapeFittingSubpartitions, before:
{ (>>>>[abcdefghijklmno . get_field_by_name ( "LA" ) . set ( Abcdefghijkl . i_l2uncache_config [ 0 ] . base_addr [ 31 : 12 ] ) ;], policy: append-fitting-sub-partitions, (origin: "abcdefghijk...dr[31:12]);"))
  { (>>>>[abcdefghijklmno . get_field_by_name (], policy: fit-else-expand, (origin: "abcdefghijklmno")) }
  { (>>>>>>>>["LA"], policy: fit-else-expand, (origin: ""LA"")) }
  { (>>>>[) . set (], policy: uninitialized, (origin: ")")) }
  { (>>>>>>>>[Abcdefghijkl . i_l2uncache_config [ 0 ] . base_addr [ 31 : 12 ] ) ;], policy: fit-else-expand, (origin: "Abcdefghijk...addr[31:12]")) }
}

(... line wraps search ...)

[token_partition_tree.cc:889] ReshapeFittingSubpartitions, after:
{ (>>>>[abcdefghijklmno . get_field_by_name ( "LA" ) . set ( Abcdefghijkl . i_l2uncache_config [ 0 ] . base_addr [ 31 : 12 ] ) ;], policy: append-fitting-sub-partitions, (origin: "abcdefghijk...dr[31:12]);"))
  { (>>>>[abcdefghijklmno . get_field_by_name ( "LA" ) . set (], policy: fit-else-expand, (origin: "abcdefghijklmno"))
    { (>>>>[abcdefghijklmno . get_field_by_name (], policy: fit-else-expand, (origin: "abcdefghijklmno")) }
    { (>>>>["LA"], policy: fit-else-expand, (origin: ""LA"")) }
    { (>>>>[) . set (], policy: uninitialized, (origin: ")")) }
  }
}

The wrapped line is lost somewhere in ReshapeFittingSubpartitions.

saw235 commented 1 year ago

@mglb Can I take the AR to resolve this? How can I get started and where should I look?

mglb commented 1 year ago

@saw235 I assume you know how to build Verible and run it with logging. If not, see Formatter docs (especially Troubleshooting) and the README.

Comment above ReshapeFittingSubpartitions explains briefly what does the function expect. By looking closer at the log pasted in my previous comment, we can see the partition tree structure the function gets is wrong. It should get partitions with one call, but gets partitions with two calls joined together:

[token_partition_tree.cc:786] ReshapeFittingSubpartitions, before:
{ (>>>>[abcdefghijklmno . get_field_by_name ( "LA" ) . set ( Abcdefghijkl . i_l2uncache_config [ 0 ] . base_addr [ 31 : 12 ] ) ;], policy: append-fitting-sub-partitions, (origin: "abcdefghijk...dr[31:12]);"))
  { (>>>>[abcdefghijklmno . get_field_by_name (], policy: fit-else-expand, (origin: "abcdefghijklmno")) }
  { (>>>>>>>>["LA"], policy: fit-else-expand, (origin: ""LA"")) }
  { (>>>>[) . set (], policy: uninitialized, (origin: ")")) }
  { (>>>>>>>>[Abcdefghijkl . i_l2uncache_config [ 0 ] . base_addr [ 31 : 12 ] ) ;], policy: fit-else-expand, (origin: "Abcdefghijk...addr[31:12]")) }
}

To make it work with ReshapeTokenPartitions there should be two partition trees, one for abcdefghijklmno.get_field_by_name("LA"). and one for set(Abcdefghijk.i_l2uncache_config[0].base_addr[31:12]);.

Since ReshapeFittingSubpartitions is most probably not the culprint, we should look in places where partitions are created and reshaped:

Which cases apply to the code from this issue can be found in logs. Look for those two function names and you'll find what nodes they process and what partitions are reshaped. There are partition dumps similar to the one pasted above, so you should be able to find when things gets weird.

After finding out the place, you could try changing partition policy to PartitionPolicyEnum::kFitOnLineElseExpand from kAppendFittingSubPartitions (this is the one handled by ReshapeFittingSubpartitions). I recommend not to try too hard to fix this with kAppendFittingSubPartitions. There probably would be issues with nicely joining its results together.

Feel free to ask here if you have more questions.

Warning: the formatter runs twice internally to verify its results. This results in two almost identical chunks of lines in log output. The second run starts in the middle of the whole log output, so its pretty easy to remove or ignore during analysis.

chipsalliance / verible

Parsing error in formatter #1553