:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
This PR improves many areas of copyright detection, increasing correctness and reducing false positives.
In particular:
The way we strip text from markup tags has been entirely reworked, replacing an obscure and untested regex with a simpler splitter regex and a set of known tags to strip.
Handling of various tag-like strings, such as <https://some.url.com> and emails, has been modified and improved.
A large number of issues and false positives have been fixed, in particular by re-scanning Linux.
Many new tests have been added along the way.
The code to select candidate lines and prepare text lines has been streamlined and simplified
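
To illustrate the idea of a splitter regex combined with a set of known tags, here is a minimal sketch in Python. It is not the code from this PR: the names `KNOWN_TAGS`, `split_tags`, `tag_name` and `strip_known_tags` are hypothetical, and the tag set is deliberately tiny. It only shows how tag-like strings such as `<https://some.url.com>` or an email in angle brackets can survive while real markup tags are stripped.

```python
import re

# Hypothetical, deliberately short set of markup tag names to strip.
KNOWN_TAGS = frozenset(["a", "b", "br", "div", "i", "p", "span", "td", "tr"])

# Splitter regex: the capturing group keeps tag-like tokens in the result.
split_tags = re.compile(r"(<[^<>]+>)").split

# Tag name of a token such as "<p>", "</p>", "<br/>" or '<a href="x">'.
# The lookahead rejects URLs and emails, where the leading word is
# followed by ":" or "@" rather than whitespace, "/" or ">".
tag_name = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)(?=[\s/>])").match


def strip_known_tags(line):
    """Return line with known markup tags replaced by spaces."""
    parts = []
    for token in split_tags(line):
        if token.startswith("<") and token.endswith(">"):
            match = tag_name(token)
            if match and match.group(1).lower() in KNOWN_TAGS:
                # A known tag: replace it with a space so words stay apart.
                parts.append(" ")
                continue
        # Plain text or an unknown tag-like string: keep it untouched.
        parts.append(token)
    # Collapse the runs of whitespace left behind by stripped tags.
    return " ".join("".join(parts).split())
```

For example, `strip_known_tags("<p>Copyright (c) 2020 <b>Jane Doe</b> <jane@example.com></p>")` keeps the email and drops the `<p>` and `<b>` tags. Using an explicit set of tag names rather than one catch-all regex makes the behavior easy to test and to extend.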