Closed zoffixznet closed 6 years ago
As far as I can see, the heuristic is failing because the heuristic looks for
use strict
, but the module usesuse strictures
instead.
Could we add use strictures
to the Perl 5 heuristic rule then?
We should also be able to use the shebang in the .t
file. I'll look into it.
If someone submits a PR, is there a way to somehow preview that the changes give right results? Is installing Linguist locally sufficient?
Yes, it's sufficient to test detection.
Could we add use strictures to the Perl 5 heuristic rule then?
This is a losing battle - here's the most comprehensive list I know of for modules that import strict and warnings for you, and this isn't including opinionated ones that only use part of strict: https://metacpan.org/source/PETDANCE/Perl-Critic-1.130/lib/Perl/Critic/Utils/Constants.pm#L84-126
We should also be able to use the shebang in the .t file. I'll look into it.
There is no requirement for perl test files to contain a shebang, since they are normally executed via a makefile or prove. Note #!perl as in that file isn't even a valid shebang, it's just a convention.
This is a losing battle - here's the most comprehensive list I know of for modules that import strict and warnings for you, and this isn't including opinionated ones that only use part of strict: https://metacpan.org/source/PETDANCE/Perl-Critic-1.130/lib/Perl/Critic/Utils/Constants.pm#L84-126
Heuristic rules were never meant to catch 100% cases (I guess it's in the name...). I'm only asking here if adding use strictures
would help us catch a greater number of Perl 5 files, or is it only used in rare cases? (i.e., is it worth it?)
There is no requirement for perl test files to contain a shebang, since they are normally executed via a makefile or prove.
The Shebang strategy is also a heuristic, so same as above.
Note #!perl as in that file isn't even a valid shebang, it's just a convention.
Sure, but it's a convention we should recognize.
I'm only asking here if adding use strictures would help us catch a greater number of Perl 5 files
It would, but for .pm
/.pl
files, it's overwhelmingly likely for them to be Perl 5 and not Perl 6 code, which is why I suggested trying to look for Perl 6 code first, and if Perl 6 things aren't present to assume Perl 5. This way, even if the Perl 5 code using one of those dozens modules that provide strict, it'd still be detected correctly. And if Perl 6 code is mis-detected as Perl 5, the author always has the option to simply switch to the more common .pm6
/.p6
extensions.
As far as I can see (with GitHub.com's search engine), we still have quite a few .pl
and .pm
Perl 6 files. Of course, these search use the use v6
statement, so we can use our current heuristic rule to detect those files. But the point is that many still use .pl
and .pm
, and I doubt all of them have use v6
at the top. In any case, we try to avoid these kinds of fallback else
in heuristic rules.
For now, I think we can address this by both improving the heuristic rule for Perl 5 and fixing the issue with the shebang. If you have other cases of misclassifications I can look into them, but it's unlikely will be able to detect them all accurately (that's what Linguist overrides are for). If/When the number of .pl
and .pm
Perl 6 files gets low enough, we'll change the heuristic rule or even remove those extensions for Perl 6.
I would still ask why it should mark a file as Perl 6 when it can't disambiguate between them. Perl 5 usage is far more widespread and as noted will be the only use of these extensions going forward, so it's a more sensible default.
I would still ask why it should mark a file as Perl 6 when it can't disambiguate between them.
Because the Heuristic strategy is not the last strategy. It's actually the naive Bayesian classifier that runs last and misclassifies your files here.
Preliminary Steps
Please confirm you have...
Problem Description
Perl 5 and Perl 6 are different languages (despite similar names; like C vs C++). However, the heuristic fails in some cases. This was previously opened in https://github.com/github/linguist/issues/3906 and the cases covered in that issue were resolved.
URL of the affected repository:
https://github.com/zoffixznet/Music-ScaleNote
As far as I can see, the heuristic is failing because the heuristic looks for
use strict
, but the module usesuse strictures
instead.Last modified on:
No idea
Expected language:
Perl 5 (aka "Perl")
Detected language:
Perl 6
Details
Since the previous times we had ambiguity issues, the Perl 6 language has now decided more strongly that its extensions are
.pm6
for modules,.p6
for files and.t
for scripts.Thus, I'd argue that the heuristic for
.pm
and.pl
files should prefer Perl 5 much more strongly. Likely just checking for Perl 6 first, withuse\s+v6\b
only, not the current regex, and marking code as Perl 6 if detected, otherwise, going to Perl 5. (personally, I'd say it shouldn't be disambiguated with Perl 6 at all and just assume those extensions are always Perl 5 or other lang).That leaves
.t
disambiguation. I'm less certain of this one, but just thinking that it can be improved if Perl 6 is tested first and checked againstuse\s+v6\b
, along with some testing subs that have a dash in their namesdone-testing|is-deeply|isa-ok|throws-like
. If that doesn't match, then assume the code's Perl 5 or other lang.If someone submits a PR, is there a way to somehow preview that the changes give right results? Is installing Linguist locally sufficient?