dart-lang / pub-dev

The pub.dev website
https://pub.dev
BSD 3-Clause "New" or "Revised" License

How to query for packages which have license A AND/OR license B #6176

Open dkbast opened 1 year ago

dkbast commented 1 year ago

Unfortunately the search help page on pub.dev does not go into detail on boolean search expressions: https://pub.dev/help/search

The goal here is to search for packages filtered by a list of licenses which should be either allowed or forbidden.

For example, license:mit,gpl-3.0 should list all packages with either an MIT or a GPL-3.0 license.

I was looking into implementing the filter functionality discussed here: https://github.com/dart-lang/pub-dev/issues/4373

In this related issue: https://github.com/dart-lang/pub-dev/issues/5490 the option of adding search flags is discussed, but it seems like the license tag can only search for ONE license and not for multiple licenses (any of a given list).

license:<spdx-identifier>

I tried the following:

license:mit license:gpl-3.0 https://pub.dev/packages?q=license%3Amit+license%3Agpl-3.0 -> 2 Results with MIT && GPL-3.0

license:mit,gpl-3.0 https://pub.dev/packages?q=license%3Amit%2Cgpl-3.0 -> 7 results, all MIT. I would expect all packages with either MIT or GPL-3.0.

Negating works:

license:mit -license:gpl-3.0 https://pub.dev/packages?q=license%3Amit+-license%3Agpl-3.0 -> 14874 (which is the list of all MIT-licensed packages minus those with two licenses)

license:mit https://pub.dev/packages?q=license%3Amit -> 14876
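
The results above are consistent with every `license:` term being treated as a required tag and every `-license:` term as a prohibited tag. Here is a minimal Python sketch of that matching model. It is an assumption about the observed semantics, not pub.dev's actual implementation, and `parse_query`, `matches`, `search` and the sample packages are all hypothetical:

```python
def parse_query(query: str) -> tuple[set[str], set[str]]:
    """Split a whitespace-separated query into required and prohibited tags."""
    required: set[str] = set()
    prohibited: set[str] = set()
    for term in query.split():
        if term.startswith("-"):
            prohibited.add(term[1:])  # "-license:x" means the tag must be absent
        else:
            required.add(term)        # "license:x" means the tag must be present
    return required, prohibited


def matches(package_tags: set[str], query: str) -> bool:
    """A package matches if it has ALL required tags and NONE of the prohibited ones."""
    required, prohibited = parse_query(query)
    return required <= package_tags and not (prohibited & package_tags)


# Hypothetical sample data: one dual-licensed package, two single-licensed ones.
packages = {
    "only_mit": {"license:mit"},
    "dual": {"license:mit", "license:gpl-3.0"},
    "only_gpl": {"license:gpl-3.0"},
}


def search(query: str) -> list[str]:
    return sorted(name for name, tags in packages.items() if matches(tags, query))
```

Under this model two `license:` terms can only ever narrow the result set, which is why `license:mit license:gpl-3.0` returns just the dual-licensed packages.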

jonasfj commented 1 year ago

There are no boolean operators in general, only negation.

But maybe we could support license:mit,gpl-3.0 and interpret it as (license:mit OR license:gpl-3.0). We probably won't support an OR operator in general, or parentheses. But a comma-separated list of licenses could be a good start.

I see no commas used in existing SPDX identifiers (https://spdx.org/licenses/), but there are also no promises about which characters may be used in the future. Still, I think it's reasonable.
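
As a rough illustration of the comma-separated idea, the sketch below expands a single `license:a,b` term into alternatives and matches a package if it carries any of them. It is only a sketch under stated assumptions: the function names are hypothetical, nothing like this exists in pub.dev's parser as far as this thread shows, and it relies on SPDX identifiers never containing a comma:

```python
LICENSE_PREFIX = "license:"


def expand_license_term(term: str) -> list[str]:
    """Expand "license:a,b" into ["license:a", "license:b"].

    Terms with any other prefix are returned unchanged, so the comma keeps
    no special meaning outside of license tags.
    """
    if not term.startswith(LICENSE_PREFIX):
        return [term]
    ids = [part for part in term[len(LICENSE_PREFIX):].split(",") if part]
    return [LICENSE_PREFIX + spdx_id for spdx_id in ids]


def matches_any_license(package_tags: set[str], term: str) -> bool:
    """OR semantics: the package matches if it has at least one listed license."""
    return any(tag in package_tags for tag in expand_license_term(term))
```

Restricting the expansion to the `license:` prefix keeps the change small, at the cost of making the query syntax behave differently per tag kind.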

dkbast commented 1 year ago

@jonasfj could you point me to where that would need to be implemented? Then I could work on a proposal.

jonasfj commented 1 year ago

Pretty sure it's in: https://github.com/dart-lang/pub-dev/tree/master/app/lib/search

I think @isoos is the domain expert on this stuff.

dkbast commented 1 year ago

I'm wondering if the license is just added as a tag in here: https://github.com/dart-lang/pub-dev/blob/master/app/lib/search/search_service.dart#L75

I think we also need to make sure comma separated tags are not being split in this function: https://github.com/dart-lang/pub-dev/blob/master/app/lib/search/text_utils.dart#L69

And the matching is done using pana: https://github.com/dart-lang/pana/blob/master/lib/src/license_detection/license.dart

Looks like the extracted licenses are added here: https://github.com/dart-lang/pub-dev/blob/f820b76142c7b91bd2abf6bb24bea7f35db94aeb/app/lib/search/backend.dart#L100

This could make "license a OR b" a lot more difficult, since I believe the license part would need to be factored out.

jonasfj commented 1 year ago

hmm, discussing this offline brings up an interesting point:

And I'm not sure (B) is useful, nor that it's a thing we should want. Actually, I intuitively think that is:null-safe,plugin means is:null-safe AND is:plugin.


Perhaps the best path forward here is to properly support operators (OR / AND). Then change the UI, such that checking two checkboxes in the "licenses" filter causes (license:mit OR license:gpl-3) to be appended to the search query. It's not as short and elegant, but it's very explicit and more future-proof.

If we want to change the search engine at some point, this is something most other engines could reasonably support -- probably even without too much rewriting.
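
To make the checkbox idea concrete, here is a small Python sketch of how a UI layer might turn the selected licenses into an explicit OR group and append it to the query. The `OR` syntax is the proposal from this comment, not something pub.dev search accepts today, and `license_filter` and `search_url` are hypothetical helpers:

```python
from urllib.parse import urlencode


def license_filter(selected: list[str]) -> str:
    """Build the query fragment for the licenses ticked in the UI."""
    terms = [f"license:{spdx_id}" for spdx_id in selected]
    if not terms:
        return ""
    if len(terms) == 1:
        return terms[0]  # a single license needs no grouping
    return "(" + " OR ".join(terms) + ")"


def search_url(free_text: str, selected: list[str]) -> str:
    """Append the license filter to the free-text query and URL-encode it."""
    parts = [p for p in (free_text.strip(), license_filter(selected)) if p]
    return "https://pub.dev/packages?" + urlencode({"q": " ".join(parts)})
```

Because the group is spelled out in the query string itself, the resulting URL stays shareable and does not depend on any per-tag comma rule.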

isoos commented 1 year ago

At the moment the best we can offer to achieve license:a OR license:b is removing all the other licenses, e.g. -license:c -license:d -license:e. Another solution would be to tag a group of licenses that are in one "family", similarly to license:osi-approved. However, we don't want to argue over which items belong in such groups, and we'd need a reputable organization's grouping to get it adopted.
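
The negation workaround can be generated mechanically. The sketch below assumes the caller already has the full list of license identifiers in use, which the thread does not say how to obtain, and `exclude_all_but` is a hypothetical helper:

```python
def exclude_all_but(allowed: set[str], known_licenses: list[str]) -> str:
    """Emulate "license:a OR license:b" by negating every other known license."""
    excluded = sorted(set(known_licenses) - allowed)
    return " ".join(f"-license:{spdx_id}" for spdx_id in excluded)


# Using the placeholder identifiers from the comment above.
query = exclude_all_but({"a", "b"}, ["a", "b", "c", "d", "e"])
```

Note that this is not an exact OR: a dual-licensed package carrying both an allowed and an excluded license is filtered out, and a package with no license tag at all still matches.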

I think that license:a,b,c could be a workaround for now, but it would make our query parsing more specialized and make it harder to migrate to anything else.

Ideally, we should support a query format that is generic enough to be switched between search backends. Something along the lines of package:query could work here too, but so far we haven't adopted it into pub.dev's search. Adopting this query parsing would be worth doing in the long term, but implementing the query evaluation will take a considerable amount of work (usually the OR queries are at fault here).
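
For a sense of what a backend-independent query format involves, here is a minimal Python sketch of a query tree with AND, OR and NOT nodes evaluated against a package's tag set. It is purely illustrative: the node classes and `evaluate` are hypothetical and do not reflect package:query or pub.dev's search code, and a real index would need to evaluate OR nodes far more efficiently than this per-package check:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str


@dataclass(frozen=True)
class Not:
    operand: object


@dataclass(frozen=True)
class And:
    operands: tuple


@dataclass(frozen=True)
class Or:
    operands: tuple


def evaluate(node: object, tags: set[str]) -> bool:
    """Recursively evaluate a query tree against one package's tags."""
    if isinstance(node, Tag):
        return node.name in tags
    if isinstance(node, Not):
        return not evaluate(node.operand, tags)
    if isinstance(node, And):
        return all(evaluate(child, tags) for child in node.operands)
    if isinstance(node, Or):
        return any(evaluate(child, tags) for child in node.operands)
    raise TypeError(f"unknown query node: {node!r}")


# (license:mit OR license:gpl-3.0) AND NOT is:plugin
query_tree = And((
    Or((Tag("license:mit"), Tag("license:gpl-3.0"))),
    Not(Tag("is:plugin")),
))
```

A tree like this can be translated into another engine's query language node by node, which is the portability argument made above.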