aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

Text "under CDDL/LGPL dual license" not recognized by license scan #3358

Open DennisClark opened 1 year ago

DennisClark commented 1 year ago

In a recent scan of tika-parsers-1.28.5-sources.jar, the text "under CDDL/LGPL dual license" was not detected as a choice of cddl-1.0 OR lgpl-2.1-plus. Instead, the scan returned unknown-license-reference, matching only the "dual license" part through unknown-license-reference_318.RULE. The file in question is /org/apache/tika/parser/code/SourceCodeParser.java

The code context is: /**

I checked the JHighlight project on GitHub and the COPYING file states: It is distributed under the terms of either:

If possible, it would be nice if license detection resolved this case to cddl-1.0 OR lgpl-2.1-plus.

I am of the opinion (arguable, of course) that a bare "CDDL" can be interpreted as cddl-1.0 and a bare "LGPL" as lgpl-2.1-plus, since those are very commonly found in such cases.
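To make the requested behavior concrete, here is a minimal Python sketch of the resolution described above. It is not ScanCode code and does not use ScanCode's rule format or API. The `DEFAULT_KEYS` table, the `DUAL_LICENSE` pattern, and the `resolve_dual_license` helper are all hypothetical names introduced only for illustration, and the defaults simply encode the interpretation suggested in this comment.

```python
import re

# Hypothetical defaults: a bare family name maps to the license key
# proposed in this issue. This is an assumption, not ScanCode rule data.
DEFAULT_KEYS = {
    "CDDL": "cddl-1.0",
    "LGPL": "lgpl-2.1-plus",
}

# Matches phrases such as "under CDDL/LGPL dual license".
DUAL_LICENSE = re.compile(
    r"under\s+([A-Za-z0-9.+-]+)\s*/\s*([A-Za-z0-9.+-]+)\s+dual\s+licen[sc]e",
    re.IGNORECASE,
)


def resolve_dual_license(text):
    """Return an expression like 'cddl-1.0 OR lgpl-2.1-plus', or None."""
    match = DUAL_LICENSE.search(text)
    if not match:
        return None
    keys = []
    for name in match.groups():
        key = DEFAULT_KEYS.get(name.upper())
        if key is None:
            # Unknown family name: do not guess, leave the text unresolved.
            return None
        keys.append(key)
    return " OR ".join(keys)


if __name__ == "__main__":
    print(resolve_dual_license("under CDDL/LGPL dual license"))
```

The helper returns None when either name is missing from the table, so an unfamiliar pairing stays unresolved rather than being guessed.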

DennisClark commented 1 year ago

Original code is at https://repo1.maven.org/maven2/org/apache/tika/tika-parsers/1.28.5/tika-parsers-1.28.5-sources.jar