XAMPPRocky / tokei

Count your code, quickly.

Verilog source files are misidentified as Coq sources #520

Open ravenexp opened 4 years ago

ravenexp commented 4 years ago

languages.json lists .vg as a Verilog source file extension, while everyone has been using .v for Verilog sources for decades. I have never seen a *.vg Verilog source file in my life.

NickHackman commented 4 years ago

The issue with just changing Verilog to *.v is that it then conflicts with Coq.

Scc handles this conflict by more intelligently guessing the filetype by looking for keywords in the first 20,000 lines of code. This is something that could be implemented in Tokei.
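
To make the keyword idea concrete, here is a minimal sketch of that kind of guessing. It is not scc's actual code: the `Guess` enum, the `guess_language` function, and both keyword lists are hypothetical and chosen only for illustration, and the line limit is passed in as a parameter.

```rust
use std::cmp::Ordering;

/// Languages competing for the `.v` extension in this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Guess {
    Verilog,
    Coq,
    Unknown,
}

// Hypothetical keyword lists, for illustration only (not taken from scc).
const VERILOG_KEYWORDS: &[&str] = &["module", "endmodule", "wire", "assign", "always"];
const COQ_KEYWORDS: &[&str] = &["Theorem", "Lemma", "Proof", "Qed", "Definition"];

/// Splits `line` on non-identifier characters and counts the tokens
/// that exactly match one of `keywords`.
fn count_hits(line: &str, keywords: &[&str]) -> usize {
    line.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|tok| keywords.contains(tok))
        .count()
}

/// Scans at most `max_lines` lines and returns the language with more
/// keyword hits, or `Guess::Unknown` on a tie (including zero hits).
fn guess_language(content: &str, max_lines: usize) -> Guess {
    let (mut verilog, mut coq) = (0usize, 0usize);
    for line in content.lines().take(max_lines) {
        verilog += count_hits(line, VERILOG_KEYWORDS);
        coq += count_hits(line, COQ_KEYWORDS);
    }
    match verilog.cmp(&coq) {
        Ordering::Greater => Guess::Verilog,
        Ordering::Less => Guess::Coq,
        Ordering::Equal => Guess::Unknown,
    }
}
```

Calling `guess_language(source, 20_000)` on the contents of a `.v` file would pick whichever language has more hits. Returning `Unknown` on a tie leaves the caller free to fall back to the plain extension mapping.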

Downsides

If @XAMPPRocky decides this is worth doing then I'll be happy to implement a similar solution to Scc :smile:

XAMPPRocky commented 4 years ago

@NickHackman Thank you for your interest. At this point I don't want to add heuristics that are based on the source code, both because of the downsides you mentioned and because I don't think the added complexity would add much. If you're interested in a solution, I had a design to resolve this that allows users to override the extensions as part of .tokeirc. I didn't release it because the toml library had a limitation where it wouldn't parse the languages map into HashMap<LanguageType, LanguageConfig>, but that might not be the case anymore.

columns = 80
treat_doc_strings_as_comments = true

[languages.Verilog]
extensions = ["v"]
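
As a rough sketch of how such an override could behave, the snippet below layers user-supplied extension mappings on top of built-in defaults. This is not tokei's implementation: `build_extension_map` and `language_for` are hypothetical names, languages are plain strings rather than `LanguageType`, and the defaults are only the two entries discussed in this issue.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Builds an extension -> language table from (extension, language) pairs.
/// Entries in `overrides` are inserted last, so they replace any default
/// that claims the same extension.
fn build_extension_map<'a>(
    defaults: &[(&'a str, &'a str)],
    overrides: &[(&'a str, &'a str)],
) -> HashMap<&'a str, &'a str> {
    let mut map = HashMap::new();
    for &(ext, language) in defaults.iter().chain(overrides.iter()) {
        map.insert(ext, language);
    }
    map
}

/// Looks up the language for `file_name` by its extension, if it has one
/// and the extension is known.
fn language_for<'a>(map: &HashMap<&'a str, &'a str>, file_name: &str) -> Option<&'a str> {
    let ext = Path::new(file_name).extension()?.to_str()?;
    map.get(ext).copied()
}
```

With defaults of `("v", "Coq")` and `("vg", "Verilog")` and a single override `("v", "Verilog")`, a file such as `alu.v` would resolve to Verilog. The mapping is global, so one configuration cannot tell a Verilog `.v` file from a Coq one.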

NickHackman commented 4 years ago

@XAMPPRocky sadly that doesn't work for directories that contain both Verilog and Coq files. I have no idea who has that sort of file structure or really what a Coq file is, but still.

In the future it would be nice if Tokei gained some of the features that scc has over it.

XAMPPRocky commented 4 years ago

@NickHackman It's true that it doesn't cover that case, but I would consider that quite pathological. A project that uses the same file extension for two different languages in the same source directory is not something I've ever seen, and I would need some pretty heavy convincing that supporting it would actually be useful to someone. This concern could also be partially, if not fully, addressed by allowing .tokeirc files to work recursively, but that's also a lot of work.

lf- commented 3 years ago

Also watch out for introducing the bug that linguist hit with this particular language pair: https://github.com/github/linguist/issues/5041

Verilog has synthesis attributes in (* ... *), which will get misidentified as Coq comments. It looks like the current comment detector used by tokei may hit this.
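
A small sketch of the pitfall, assuming a naive detector that treats any line wrapped in `(* ... *)` as a Coq comment. The functions `looks_like_coq_comment` and `count_comment_lines` are hypothetical and are not tokei's comment detector.

```rust
/// Returns true when `line` looks like a complete single-line `(* ... *)`
/// comment. The length check rejects the three-character string `(*)`.
fn looks_like_coq_comment(line: &str) -> bool {
    let trimmed = line.trim();
    trimmed.len() >= 4 && trimmed.starts_with("(*") && trimmed.ends_with("*)")
}

/// Counts the lines that this naive Coq-style rule would report as comments.
fn count_comment_lines(source: &str) -> usize {
    source.lines().filter(|l| looks_like_coq_comment(l)).count()
}
```

Under this rule a Verilog attribute line such as `(* keep = "true" *)` is counted as a comment even though it is code, so a `.v` file mapped to Coq would report inflated comment counts.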