eclipse-dash / dash-licenses

Extract license information from content.
http://projects.eclipse.org/projects/technology.dash
Eclipse Public License 2.0

Many duplicate IPLab issues lead to lots of annoying/unnecessary changes in license check output #350

Open dhendriks opened 2 months ago

dhendriks commented 2 months ago

In ESCET, we're getting all kinds of changes in IPLab issue numbers in the check output. See https://gitlab.eclipse.org/eclipse/escet/escet/-/issues/870. Here is an example:

-maven/mavencentral/org.junit.jupiter/junit-jupiter-params/5.10.2, EPL-2.0, approved, #9708
+maven/mavencentral/org.junit.jupiter/junit-jupiter-params/5.10.2, EPL-2.0, approved, #15250

Somehow https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/9708 and https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/15250 are for the same artifact. I'm not sure why. Did they not report via the tool, but manually? If so, could we educate them not to make duplicate entries?

In any case, I think the duplicates should be closed as duplicates, rather than judged again. The license check tool should then not use the duplicates. That way, the output of the check doesn't change unnecessarily, which currently forces us to update the DEPENDENCIES file in our repo again and again without good reason.

dhendriks commented 2 months ago

Note that even more annoying is the following:

-maven/mavencentral/org.tukaani/xz/1.9, LicenseRef-Public-Domain, approved, CQ23498
+maven/mavencentral/org.tukaani/xz/1.9, None, restricted, #15225

The new IPLab issue is not even approved yet.

See also the discussion at https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/15225.

waynebeaton commented 2 months ago

You are correct that these new issues should not have been created. I'll investigate that.

In the meantime, I've turned on a feature that will identify the duplicates and ignore them.

waynebeaton commented 2 months ago

For the behaviour that we're seeing, the license-check API call must either be returning a bogus result, or we're getting some sort of connection error while calling that API and are not handling the exception correctly.

pstuecker commented 2 months ago

Finally back in the office, so I've looked at the bogus issues created by the set bot user and how they were triggered.

The tickets created by the set bot user can be traced back to this comment https://github.com/eclipse-set/set/pull/672#issuecomment-2167483309 in a dependabot PR. The relevant GitHub Actions log is here:

https://github.com/eclipse-set/set/actions/runs/9512789698/job/26221525620#step:8:675

[...]
[INFO] Querying Eclipse Foundation for license data for 500 items.
[INFO] Found 306 items.
[INFO] Querying Eclipse Foundation for license data for 500 items.
[INFO] Found 376 items.
[INFO] Querying Eclipse Foundation for license data for 500 items.
[INFO] Found 483 items.
[INFO] Querying Eclipse Foundation for license data for 500 items.
[INFO] Found 191 items.
[INFO] Querying Eclipse Foundation for license data for 500 items.
[INFO] Found 0 items.
[INFO] Querying Eclipse Foundation for license data for 478 items.
[INFO] Found 0 items.
[INFO] Querying ClearlyDefined for license data for 212 items.
[INFO] Found 212 items.
[INFO] Querying ClearlyDefined for license data for 500 items.
[INFO] Found 439 items.
[INFO] Querying ClearlyDefined for license data for 282 items.
[INFO] Found 270 items.
[INFO] License information could not be automatically verified for the following content:
[... long list of stuff follows that we know is approved ...]

Dash then attempted to request reviews for a lot of items, but luckily stopped after 100.

Related Dash code is: https://github.com/eclipse/dash-licenses/blob/1.1.0/core/src/main/java/org/eclipse/dash/licenses/foundation/EclipseFoundationSupport.java#L59

So for some reason:

Why that happened is not clear to me though. It's also no longer reproducible.

waynebeaton commented 2 months ago

Thanks for this.

Several bogus issues were opened within a short time span. I'm fairly certain that it is the result of mishandling a failure. The failure may be in the API. We're still investigating.

waynebeaton commented 2 months ago

I've tried everything that I can think of with the Dash License Tool itself and the API that it calls. In all of the failure cases that I've managed to test, the tool has consistently just failed. We could, perhaps, fail more gracefully, but it's failed. I'm confident that the Dash License Tool itself is behaving correctly. So now, I'm looking at the data.

I reviewed the output of a build before and a build after the one that you highlighted. One thing that I noted was that the problematic build has more than 500 additional dependencies. I don't quite see how this could have resulted in the duplication.

I did notice that in both the before and after builds, every call to the "Eclipse Foundation" returned at least some information. In the problematic build, the last two calls to the API found nothing.

[INFO] Querying Eclipse Foundation for license data for 478 items.
[INFO] Found 0 items.

When trying to find license information for a dependency, the API considers two versions that differ only by service release (per Semantic Versioning) to be equivalent. The version that was approved by IPLab 9708 was version 5.10.0; the bogus issue was for version 5.10.2. The API should have identified the existing 5.10.0 version when asked about the 5.10.2 version, so this isn't the cause of the problem.
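To make the equivalence rule concrete, here is a minimal sketch of a comparison that ignores the service (third) segment of a version. This is only an illustration of the rule as described above, not the API's actual implementation; the class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not part of dash-licenses or the Eclipse Foundation API.
final class ServiceReleaseMatch {

    // Drops everything from the third dot-separated segment onward,
    // e.g. "5.10.2" becomes "5.10". Versions with fewer than three
    // segments are returned unchanged.
    static String majorMinor(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 3) {
            return version;
        }
        return parts[0] + "." + parts[1];
    }

    // Two versions are treated as equivalent when they share major and minor
    // segments, i.e. they differ at most in the service release.
    static boolean sameServiceLine(String a, String b) {
        return majorMinor(a).equals(majorMinor(b));
    }

    public static void main(String[] args) {
        // The approved version and the version in the bogus issue.
        System.out.println(sameServiceLine("5.10.0", "5.10.2"));
        // A different minor version would not match.
        System.out.println(sameServiceLine("5.10.2", "5.11.0"));
    }
}
```

Under this rule, a lookup for 5.10.2 resolves to the existing 5.10.0 approval, which is why the version difference alone should not have produced a new issue.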

Note that the Dash License Tool will open an IPLab issue for a dependency if there is not already an open IPLab issue for that dependency. In this case, the original issue was closed. So this is, at least, expected behaviour. That is... the tool should not have flagged version 5.10.2, but when it did, it was entirely consistent that it opened an issue.

My best guess at this point is that there was some kind of database issue and that -- for some period of time -- some of the data was absent or inaccessible. Every query of 500 items to "Eclipse Foundation" is discrete, so it's possible that something changed between calls. I'll continue my investigation there.
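As a rough illustration of why discrete batches matter, the sketch below splits a list of items into batches of 500 and flags batches that found nothing while other batches in the same run found something. The heuristic and all names here are assumptions made for the example; this is not how the Dash License Tool behaves today.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not code from dash-licenses.
final class BatchCheck {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Returns the indexes of batches that found zero items in a run where at
    // least one batch found something. A run in which every batch is empty
    // yields no indexes, because this heuristic cannot tell it apart from a
    // run that legitimately has no matches.
    static List<Integer> suspiciousBatches(int[] foundPerBatch) {
        boolean anyFound = false;
        for (int found : foundPerBatch) {
            if (found > 0) {
                anyFound = true;
                break;
            }
        }
        List<Integer> suspicious = new ArrayList<>();
        if (!anyFound) {
            return suspicious;
        }
        for (int i = 0; i < foundPerBatch.length; i++) {
            if (foundPerBatch[i] == 0) {
                suspicious.add(i);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // "Found" counts for the Eclipse Foundation queries in the problematic build.
        int[] found = {306, 376, 483, 191, 0, 0};
        System.out.println(suspiciousBatches(found));
    }
}
```

A check along these lines could let the tool stop before requesting reviews when part of a run looks like a data outage rather than a genuine set of unknown dependencies.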