apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0
2.49k stars 1.37k forks source link

Parquet libraries do not follow proper semantic versioning #2170

Open asfimport opened 6 years ago

asfimport commented 6 years ago

There are changes between 1.8.0 and 1.10.0 that break API compatibility. A minor version change is supposed to be backward compatible with 1.9.0 and 1.8.0.

Reporter: Vlad Rozov / @vrozov

Related issues:

Note: This issue was originally created as PARQUET-1295. Please see the migration documentation for further details.

asfimport commented 6 years ago

Gabor Szadovszky / @gszadovszky: I completely agree, @vrozov. Could you please explain the incompatibilities you have found? In the build process we are using semver to enforce the API compatibility with 1.7.0. So, theoretically there should not be such changes. Of course, this plugin only can do syntactical checks on the API, the semantical compatibility is a bit more tough.

asfimport commented 6 years ago

Vlad Rozov / @vrozov: Just as an example, there are semver incompatible changes to org.apache.parquet.column.values.ValuesReader.

asfimport commented 6 years ago

Vlad Rozov / @vrozov: My guess is that semantic versioning is enforced for the API that existed in 1.7.0, but it is not enforced for any new API introduced in 1.8.x or 1.9.x.

asfimport commented 6 years ago

Gabor Szadovszky / @gszadovszky: I think, you are right. We should check against the previously released versions (e.g. changes between 1.8.2 and 1.8.3). I don't know if it is intentional to keep checking the API against 1.7.0 or only that we forgot to update the version.

Another thing is org.apache.parquet.column.** is excluded from the check suggesting that these classes (including ValuesReader) are not part of the public API.

What do you think, @rdblue?

asfimport commented 6 years ago

Vlad Rozov / @vrozov: I don't see anything other than the exclusion of org.apache.parquet.column.* from semantic versioning that suggests that it is not public. It must be part of semantic versioning, IMO.

asfimport commented 6 years ago

Gabor Szadovszky / @gszadovszky: Here starts a lot of exclusions. That suggests there are many packages meant to be internal even if they are public from java point of view. Unfortunately, we did not use any other comment/annotation/package naming to inform the user which class/method shall be used and which not.

To maintain backward compatibility from now on the best we can do is to remove all the exclusions (except the ones that really make sense e.g. shaded packages) and use the previous release all the time (1.10.0 for now) for the semver check. Unfortunately, we cannot do anything with the breaking changes already released.

We also have to take care about the backward compatibility at review. Not only public API changes can break backward compatibility but the actual behaviour of the functions.

@zivanfi, @rdblue, what do you think?

asfimport commented 6 years ago

Ryan Blue / @rdblue: Since there is not a well-defined public API, I understand how it is annoying to find out that some classes are internal. But the APIs referenced here are definitely something that we've always considered internal.

We use 1.7.0 for semver checks because that's the oldest release that we want public API compatibility with (even though "public API" is not well defined). We only add exclusions for private classes when they change, so growing this list is essentially marking APIs private as we make changes. I think that's worth keeping rather than needing to add to the list when we make changes to a private class each release.

asfimport commented 6 years ago

Vlad Rozov / @vrozov: Parquet libraries do not follow proper semantic versioning as any outside developer can't reliably say whether she is using API that is stable and is subject of the semantic versioning or it is an internal API and an upgrade to a newer version may require a significant code changes. Take a look at org.apache.parquet.column.values.ValuesReader. Nothing in the package name, class or method annotations suggests that it is an internal class that is not subject of semantic versioning. That information is hidden somewhere in the pom file. The class even has java documentation, that implies that it is not "internal".

The same applies to classes and methods added since 1.7.0. How may an external developer know that a new method added after 1.7.0 to a class that existed before 1.7.0 is not subject of the semantic versioning and avoid using it?

asfimport commented 6 years ago

Zoltan Ivanfi / @zivanfi: I agree with @vrozov, in fact I used the same argument on the mailing list:

Parquet uses semantic versioning. As a library, it should take extra care not to break its public API in minor releases. This also applies to publicly accessible classes and methods that are considered internal if this "internalness" is not properly documented. It is tempting to dismiss these cases with the reasoning that they were not intended to be public in the first place, but from an API consumer's point of view, this "leaked" API is indistinguishable from the "real" API. Currently the information of what is public and what is internal is undocumented and only known to a few Parquet developers.

Until API consumers have a way to determine the intended target audience of our classes and methods, we should pay more attention to keeping our leaked internal API backwards-compatible as well.

@vrozov, regarding your comment:

That information is hidden somewhere in the pom file.

It's actually even worse than that. The exclusions are only added to the pom file when a breaking change is made, so even that list is unsuitable for determining whether something is considered internal, as it only contains those parts of the internal API that we already broke.