clearlydefined / crawler

A service that crawls projects and packages for information relevant to ClearlyDefined
MIT License

"Discovered" licenses from notices file showing up in "Declared" field - Google Mavens #583

Open ariel11 opened 1 week ago

ariel11 commented 1 week ago

There is a bug with Google Maven packages (maybe other package types too?) where license data from a notices file (which should be "discovered" licenses) is erroneously included in the "declared" field.

Example: https://clearlydefined.io/definitions/maven/mavengoogle/com.google.android.gms/play-services-location/21.2.0.

All the info from the third-party notices file is being erroneously included in the "declared" field; those should be "discovered" licenses. In this case, the "declared" would be "OTHER" since there is no SPDX ID for the "Android Software Development Kit License."

ariel11 commented 1 week ago

@capfei - I thought we already had an open issue about this, but I couldn't find it?

@elrayle - FYI, this is a significant pain point I am hoping to get on your radar. Happy to chat more.

qtomlinson commented 1 week ago

@ariel11 @elrayle @capfei Yes, there was a previous, similar issue. As in my comment on that issue: the ScanCode v30 result contains no package-level license information, so the license information for top-level files is used to derive the declared license. In the ScanCode v30 result, is_license_text is true for third_party_licenses.txt, and therefore the licenses matched in that file are used as the declared license. We can rerun this case after the ScanCode upgrade PR is merged.
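
To illustrate the fallback described above, here is a minimal Python sketch. It is not the actual ClearlyDefined summarizer code, and the scan shape is simplified: the function name derive_declared and the keys package_license, files, path, and license_expressions are assumptions made for this example. Only is_license_text and third_party_licenses.txt come from the comment above.

```python
from typing import Any, Dict, List, Optional


def derive_declared(scan: Dict[str, Any]) -> Optional[str]:
    """Hypothetical sketch: derive a declared license from a simplified scan result."""
    # Prefer package-level license information when the scan provides it.
    package_license = scan.get("package_license")  # assumed key, for illustration
    if package_license:
        return package_license

    # Otherwise fall back to top-level files that are flagged as license text.
    matched: List[str] = []
    for entry in scan.get("files", []):
        is_top_level = "/" not in entry["path"]
        if is_top_level and entry.get("is_license_text"):
            for expression in entry.get("license_expressions", []):
                if expression not in matched:
                    matched.append(expression)

    # Every license matched in those files ends up in the declared value.
    return " AND ".join(matched) if matched else None


# A notices file at the top level is treated as license text, so the
# third-party licenses it lists are promoted to the declared license.
scan = {
    "package_license": None,
    "files": [
        {
            "path": "third_party_licenses.txt",
            "is_license_text": True,
            "license_expressions": ["Apache-2.0", "MIT"],
        },
        {
            "path": "src/Foo.java",
            "is_license_text": False,
            "license_expressions": ["BSD-3-Clause"],
        },
    ],
}
declared = derive_declared(scan)
print(declared)  # Apache-2.0 AND MIT
```

Under these assumptions, the notices file alone determines the declared value, which matches the behavior reported in this issue.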