github-linguist / linguist

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
MIT License
11.95k stars 4.14k forks source link

Coq still misinterprets Verilog as Coq #5041

Closed secworks closed 3 years ago

secworks commented 3 years ago

Coq sometimes misinterpret Verilog HW description language files as being Coq prover language files. There has been a number of issues opened related to this. But all of them has been closed without a fix. https://github.com/github/linguist/issues?q=is%3Aissue+verilog+coq+is%3Aclosed

And the issue still persists and is (probably increasing) due to the trend towards open hardware design. As an example of files being misinterpreted here is a repo: https://github.com/fusesoc/tiny-cores

These files in the repo above are clearly Verilog, but considered by Linguist to be Coq files: shift_reg/piso.v shift_reg/sipo.v

Expected language:

Verilog https://en.wikipedia.org/wiki/Verilog

Detected language:

Coq https://en.wikipedia.org/wiki/Coq

lildude commented 3 years ago

As mentioned in at https://github.com/github/linguist/issues/4750#issuecomment-567083122: this will be because the heuristic isn't precise enough and would need tweaking by someone that knows the languages involved.

https://github.com/github/linguist/pull/4751 was an attempt at improving things, but it went stale /cc @0x00a1e9

If you've got the bandwidth and knowledge of either/both of these languages, we'd be happy for a new PR or for a request to re-open and finish that other PR.

secworks commented 3 years ago

I do know Verilog. But someone else must provide knowledge for Coq. That said, it should be really easy if linguist can parse words.

Verilog files will normally contain a "module"-"endmodule" pair.

Browsing the Coq language, these words are not part of the language. On the other hand, Coq contains a "::=" operator that would be a syntax error in Verilog. The Coq operators "<+" and "<-" would also be syntax errors in Verilog, SystemVerilog.

Coq language ref I used: https://coq.inria.fr/refman/language/core/basic.html

SystemVerilog LRM: http://www.ece.uah.edu/~gaede/cpe526/SystemVerilog_3.1a.pdf

Verilog-2001 HDL Cheat sheet: https://sutherland-hdl.com/pdfs/verilog_2001_ref_guide.pdf

lf- commented 3 years ago

Here's another misidentified file: https://github.com/cliffordwolf/picorv32/blob/409d0dfd6772551e2ce77502e368973c447cbeb8/picorv32.v

lf- commented 3 years ago

4751 looks like it went stale because of a lack of explanation of why the sample given looked so different. In brief, this is because the new sample is (program-like, cannot be synthesized into hardware) testbench code that uses tasks as opposed to the existing samples which are mostly if not entirely synthesizable Verilog. These subsets look very different because of the typical sequential execution model of testbench code as compared to synthesizable hardware description that has everything happen at the same time.

If this is the only issue with #4751, I'd support merging it in its current state as it looks like it appropriately removes the imprecise Coq comment heuristic that's almost certainly causing this problem.

Alhadis commented 3 years ago

@lf- You definitely sound like you know your stuff; so you're saying (* this *) is a valid comment in Verilog as well? (Or, well, any legal Verilog syntax, for that matter?)

lf- commented 3 years ago

@Alhadis yes, (* ... *) is valid Verilog as a synthesis attribute but not as a comment. Similar in function to [[attributes]] in C++ in that it's a directive to the compiler related to the next item.

I believe that this syntax was introduced to supersede the previous and rather cursed practice of putting synthesis attributes in comments.

Alhadis commented 3 years ago

In that case, the changes proposed by #4751 were most definitely valid. Unfortunately, the submitter deleted their fork, so I can't reopen the PR.

@lf- If you want to submit a new PR reintroducing those changes, rest assured you have my blessing. 👍 I feel more confident letting you take the reins here, as you're clearly the expert.

lf- commented 3 years ago

I have 🦇👻reanimated👻🦇 and rebased #4751 onto master in #5075 as requested :)

Alhadis commented 3 years ago

Thanks! 👍 Approved for merge (though only a GitHub staff member can do that, so we'll have to wait until @lildude gets online)