Closed AyanSinhaMahapatra closed 3 years ago
Another issue, nexB/scancode-toolkit#2374, could also be picked up by the analyzer if #29 is implemented, flagging false positives based on line number (say, > 1000) and rule length (< 3 here, though a more suitable threshold still has to be found). It would also make a good test case.
In nexB/scancode-toolkit#2371 there is an instance of a false positive where `"rule_length": 2`. It is not detected as a false positive because the current steps are: first separate probable false positives using "is_license_tag" == true and "rule_length" == 1 (as here), and then run them through a classifier to determine that more accurately. A match with a rule length of 2 never passes the first step.
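To make the gap concrete, here is a minimal sketch of the first step as described above. The function name and the plain-dict shape of a match are assumptions made for illustration, not the analyzer's actual code; only the `is_license_tag` and `rule_length` field names come from the discussion. It also assumes the #2371 match is a license tag, which the thread does not state.

```python
def is_probable_false_positive(match):
    """Pre-filter as described in the thread: a license-tag rule of length 1.

    `match` is assumed to be a plain dict holding the fields quoted in the
    issue; the real analyzer may represent matches differently.
    """
    return match.get("is_license_tag") is True and match.get("rule_length") == 1


# Only "rule_length": 2 is taken from the issue; "is_license_tag" is assumed.
match_2371 = {"is_license_tag": True, "rule_length": 2}

# The match is never handed to the classifier because the pre-filter rejects it.
print(is_probable_false_positive(match_2371))  # False
```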
We definitely need to review the `license_tag` rules and see which ones have the potential to be matched as a `false_positive`, and then either increase the "rule_length" criteria so that these cases are correctly analyzed too, or even maintain a `set` of rules which can generate potential false positives.
From comment
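The two options above, plus the line-number idea raised for #2374, could be combined roughly as follows. This is only a sketch under stated assumptions: the `rule_identifier` and `start_line` keys, the placeholder rule name, and the default thresholds are hypothetical, and suitable values would still have to be found from real scan results.

```python
# Hypothetical placeholder; a real set would be built by reviewing the
# license_tag rules that have produced false positives.
RISKY_RULES = frozenset({"example_license_tag_rule.RULE"})


def is_candidate(match, max_rule_length=2, min_start_line=1000,
                 risky_rules=RISKY_RULES):
    """Decide whether a match should be sent on to the classifier.

    `match` is assumed to be a dict; "rule_identifier" and "start_line"
    are assumed key names, not confirmed analyzer fields.
    """
    rule_length = match.get("rule_length", 0)

    # Option 1: the rule is in the maintained set of known offenders.
    if match.get("rule_identifier") in risky_rules:
        return True

    # Option 2: a license-tag rule within the (raised) length threshold.
    if match.get("is_license_tag") is True and 0 < rule_length <= max_rule_length:
        return True

    # Line-number heuristic: a very short match far into the file.
    return match.get("start_line", 0) > min_start_line and 0 < rule_length < 3
```

With the threshold raised to 2, a match like the one in #2371 would reach the classifier, at the cost of sending more true license tags through it as well. That trade-off is the reason the maintained set may turn out to be the more precise option.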