IBM / license-scanner

License Scanner
Apache License 2.0

Always sort and dedup the matches list #12

Closed markstur closed 2 years ago

markstur commented 2 years ago

The "matches list" is just the begin/end positions matched by a license. We sort it in a couple of places, but it would be more usable in general if it were sorted once and kept sorted.

There is also no real advantage to keeping duplicates here. Multiple identical matches at the same position for a license are not a desirable output. Once the list is sorted, we can eliminate the duplicates.
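A minimal sketch of the sort-then-dedup step described above, using only the standard library. The `Match` type with `Begin` and `End` fields and the `sortAndDedup` helper are hypothetical stand-ins for illustration, not the project's actual `identifier.Match` or its fix.

```go
package main

import "sort"

// Match is a hypothetical stand-in for identifier.Match: one matched span.
type Match struct {
	Begin int
	End   int
}

// sortAndDedup orders matches by Begin, then End, and drops exact duplicates.
// It sorts the input slice in place and reuses its backing storage.
func sortAndDedup(matches []Match) []Match {
	if len(matches) < 2 {
		return matches
	}
	sort.Slice(matches, func(i, j int) bool {
		if matches[i].Begin != matches[j].Begin {
			return matches[i].Begin < matches[j].Begin
		}
		return matches[i].End < matches[j].End
	})
	// After sorting, duplicates are adjacent, so one pass is enough.
	out := matches[:1]
	for _, m := range matches[1:] {
		if m != out[len(out)-1] {
			out = append(out, m)
		}
	}
	return out
}
```

Because the comparison covers both fields, every distinct match has a fixed place in the result, so the output no longer depends on the order in which matches were collected.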

The inconsistent sort order was also causing intermittent test failures...


Original test failure that prompted this issue: Inconsistent test results at identifier_test.go:518

The following test appears to fail because of an unexpected sort order in the output. That is, it passes locally and on the PR, but failed on push to master when that +/- line changed positions.

identifier_test.go:518: Didn't get expected result: (-want, +got): map[string][]identifier.Match{ "MIT": {

pritidesai commented 2 years ago

Until there is a better solution, an alternative is to sort the identifier results before comparing:

https://pkg.go.dev/github.com/google/go-cmp/cmp/cmpopts#SortSlices
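The same idea, sorting both sides before comparing, can be sketched without the go-cmp dependency. This is a standard-library approximation, not the `cmpopts.SortSlices` option itself; the `Match` type and the `sortedCopy` and `equalIgnoringOrder` helpers are hypothetical names for illustration.

```go
package main

import (
	"reflect"
	"sort"
)

// Match is a hypothetical stand-in for identifier.Match.
type Match struct {
	Begin int
	End   int
}

// sortedCopy returns a copy of results in which each license's matches are
// sorted by Begin, then End. The input map and its slices are not modified.
func sortedCopy(results map[string][]Match) map[string][]Match {
	out := make(map[string][]Match, len(results))
	for license, matches := range results {
		sorted := append([]Match(nil), matches...)
		sort.Slice(sorted, func(i, j int) bool {
			if sorted[i].Begin != sorted[j].Begin {
				return sorted[i].Begin < sorted[j].Begin
			}
			return sorted[i].End < sorted[j].End
		})
		out[license] = sorted
	}
	return out
}

// equalIgnoringOrder reports whether want and got hold the same matches for
// each license, regardless of the order of the matches.
func equalIgnoringOrder(want, got map[string][]Match) bool {
	return reflect.DeepEqual(sortedCopy(want), sortedCopy(got))
}
```

Normalizing in the test only hides the ordering difference from the comparison; the identifier's own output stays unsorted.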

markstur commented 2 years ago

Yep. I looked at that, but then I saw that we really want to sort and dedup those match{begin/end} lists. It's already done in a couple of places, but I can fix it in one place and the result should just always be nicer.