Kotlin / binary-compatibility-validator

Public API management tool
Apache License 2.0
761 stars 55 forks source link

Allow to filter out packages by regex #166

Open martinbonnin opened 6 months ago

martinbonnin commented 6 months ago

I'd like to exclude all symbols that are in *.internal packages. Currently it requires listing them all explicitly, would be nice to be able to specify ignoredPackages as a list of regexes:

apiValidation {
    /**
     * Exclude every package that ends with `.internal`
     */
    ignoredPackagesReegexes += "\.*\.internal"
}
fzhinkin commented 5 months ago

It seems like regular expressions could be too heavyweight (conceptually, not in terms of performance overhead) for this particular problem. Would wildcards (like * and ?, or even ant-style globs with **) solve the problem or are there some practical cases where it's not enough to specify the *?

martinbonnin commented 5 months ago

Would wildcards

Strong opposition to wildcards. Regex has documentation and years of battle tested usage. Most developers should be familiar with Regexes.

Wilcards have different implementations (shell, ant, other?) and it's impossible to tell what to expect without reading through (sometimes absent) documentation.

Plus regexes are more flexible. Sorry for the upfront response but I've been burned too many times 😅

fzhinkin commented 5 months ago

The flexibility regexes have comes with complexity and it's easy to make a mistake. When applied to package names, it may be too cumbersome to escape every dot in the name to get a semantically correct RE.

Wilcards have different implementations (shell, ant, other?) and it's impossible to tell what to expect without reading through (sometimes absent) documentation.

One still has to read documentation to figure out if the expression like ignoredPackages += "\.*\.internal" will match the whole input or only a subsequence.

With wildcards having only * in the grammar, it should be possible to express basic package-related patterns without keeping complex grammar rules in mind. I believe that we are all used to treating * as the Kleene operator, so there should not be that much ambiguity.

From the API point of view, wildcards allow us to continue using the same ignoredPackages property to specify both literal package names and wildcards (let's pretend that there are no packages with * in their names 😨). For regexes, the new API has to be introduced, otherwise, users have to escape all their existing ignoredPackages to preserve the semantics after update.

martinbonnin commented 5 months ago

One question is: "does * match . ?"

If I want to ignore com.foo.internal and all subpackages, do I have to ignore com.foo.internal and com.foo.internal.* (and potentially com.foo.internal.*.*?). Can I just ignore com.foo.internal* (but then it might be surprizing to someone that adds com.foo.internalization?).

I've been there with dexguard and to this day I won't be able to tell you if a dexguard rule matches a class name from memory. I wish all string matching used regexes so that I don't have to fit new rules in my brain.

Now the BCV use case is probably simpler and you're 100% right about new API. Using just ignoredPackages wouldn't work. I have updated the initial comment to use ignoredPackagesRegexes. So at the end of the day, it's a tradeoff that also depends on personal preferences. Both work for my immediate use case so I'll be happy no matter what :). Thanks for looking into this!

fzhinkin commented 5 months ago

Related PR: https://github.com/Kotlin/binary-compatibility-validator/pull/121