Closed jrudolph closed 1 year ago
That would make sense
@jrudolph do you think this is best done now or should we wait till after the v1.0.0 release?
There is some discussion about bringing this into v1.0.0 release in https://github.com/apache/incubator-pekko-http/issues/39.
It would simplify the license files in pekko-http if we just used the parboiled2 jar instead of embedding the source in our code base.
I agree, an explicit dependency is much better than taking a copy. This makes it much easier to take upgrades of the dependency (and to notice the need for them), which can be an issue for security.
If there's no functional difference by using upstream, I don't think switching to the dependency is real violation of the v1.0 intent.
As long as Pekko 1.0.0 exhibits the same behavior as Akka 2.6.x feature for feature, bug for bug after the rename refactor in consuming code.
1.1.0
)(+) As @spangaer mentioned, we can reason about bugs (if there are any) 1.0.0
does not do anything apart from different packaging. No need to worry about performance impacts.
(+) No need to run heavy performance benchmarks before releasing
(-) As @pjfanning mentioned, licence files management is difficult (I really don't know how painful this point)
1.0.0
)(+) Licence file management would be easy
(-) IMHO: Performance bench marks needs to be run before releasing
(-) Assume underlying dependency parboiled
has (functional
or performance
) bug which resulted lot of side effects in the system narrow down the issue would be bit more complicated than the Approach 1
I would prefer Approach 1
unless there is a huge impact on licensing part (I don't have knowledge in the licensing area). IMHO: With this approach we are separating the concerns
For clarity, I favor approach 2 (do it in 1.0.0). I do believe some validation should be made, but best effort only, it shouldn't absorb all energy either, otherwise there's nothing to be gained.
The main reasoning should be about compatibility:
All in all, I'd say moving now is the best choice because we are breaking compatibility anyway and the risk of breaking something in a major way is relatively small.
The main reasoning should be about compatibility:
- staying with the vendor copy allows users to use a parboiled2 version independently of pekko-http (currently not a huge concern)
- some users of akka-http might have accidentally relied on the akka-parsing artifact and might be confused if the classes are now missing (not a concern as the parsing infrastructure is meant to be internal, and also package names are not compatible with akka anyway)
- related to our original reasoning to vendor, in the future, we need to consider whether a parboiled2 dependency will hold us up with adopting new Scala versions because we might have to always wait for a compatible parboiled2 version before being able to move to a new Scala 3 version (IMO, we can expect some problems with compatibility in the future but it stays questionable if we could fix them more easily in our vendor copy than upstream)
All in all, I'd say moving now is the best choice because we are breaking compatibility anyway and the risk of breaking something in a major way is relatively small.
I also agree with this sentiment. Regarding the point about being held back on upstream if they are slow to adopt features (such as newer Scala release), even in the worst case scenario its possible to fork that library (with all of its original licenses) to do such changes and use the fork until it gets merged to upstream.
Approach 1: As Is approach (convert to dependency in
1.1.0
)(+) As @spangaer mentioned, we can reason about bugs (if there are any)
1.0.0
does not do anything apart from different packaging. No need to worry about performance impacts. (+) No need to run heavy performance benchmarks before releasing (-) As @pjfanning mentioned, licence files management is difficult (I really don't know how painful this point)Approach 2: Convert to dependency right now (in
1.0.0
)(+) Licence file management would be easy (-) IMHO: Performance bench marks needs to be run before releasing (-) Assume underlying dependency
parboiled
has (functional
orperformance
) bug which resulted lot of side effects in the system narrow down the issue would be bit more complicated than theApproach 1
I would prefer
Approach 1
unless there is a huge impact on licensing part (I don't have knowledge in the licensing area). IMHO: With this approach we are separating the concerns
@Seetaramayya there is momentum behind Approach 2. Would you have time to run the benchmarks in http-bench-jmh
with and without this change? I think we can hold off for a few days regardless. I may have time in a few days to try it myself.
I see a ~ 5-10% slowdown in ServerProcessingBenchmark
. There are a few optimization commits that we are losing now that we should try to upstream:
To me it makes sense to leave the PR for now and try to upstream the necessary changes as soon as we can to see if upstream can land those optimizations + release them.
If this is done before a release is made then we have the best of both worlds, otherwise we might have to deal with the pain/annoyance of license changes (not sure if the community would accept such a slowdown given that one of pekko-http's main selling points is for it to be fast).
FTR:
Run with jmh:run -r3 -i3 -wi 3 -w 3 ServerProcessingBenchmark
Before:
[info] Result "org.apache.pekko.http.impl.engine.ServerProcessingBenchmark.benchRequestProcessing":
[info] 327842.154 ±(99.9%) 3747.035 ops/s [Average]
[info] (min, avg, max) = (324466.534, 327842.154, 331264.288), stdev = 2229.801
[info] CI (99.9%): [324095.118, 331589.189] (assumes normal distribution)
After:
[info] Result "org.apache.pekko.http.impl.engine.ServerProcessingBenchmark.benchRequestProcessing":
[info] 309453.217 ±(99.9%) 4596.546 ops/s [Average]
[info] (min, avg, max) = (304411.018, 309453.217, 312560.405), stdev = 2735.331
[info] CI (99.9%): [304856.671, 314049.763] (assumes normal distribution)
If this is done before a release is made then we have the best of both worlds, otherwise we might have to deal with the pain/annoyance of license changes (not sure if the community would accept such a slowdown given that one of pekko-http's main selling points is for it to be fast).
I would be fine with merging this now and try to get the improvements back later. Since the benchmark is missing processing (no actual sockets in the benchmark), the actual slowdown is less than shown here. I have some more ideas how to improve parsing which would change the story again, so maybe less important to keep single-digit optimizations right now if we can simplify the build and get a release out sooner.
Regarding merging now, I personally don't feel that strongly on this. My general point is that I don't think there is a rush to merge this PR (not expecting any merge conflicts in the later) and unlike pekko core, a release of pekko-http is further down the line, i.e. I am also not seeing us needing to do the necessary license changes for this repo anytime soon (mainly because we have much bigger fish to fry).
PR is approved on my end, so if people feel that its overall wiser to merge it now then good with me
Approach 1: As Is approach (convert to dependency in
1.1.0
)(+) As @spangaer mentioned, we can reason about bugs (if there are any)
1.0.0
does not do anything apart from different packaging. No need to worry about performance impacts. (+) No need to run heavy performance benchmarks before releasing (-) As @pjfanning mentioned, licence files management is difficult (I really don't know how painful this point)Approach 2: Convert to dependency right now (in
1.0.0
)(+) Licence file management would be easy (-) IMHO: Performance bench marks needs to be run before releasing (-) Assume underlying dependency
parboiled
has (functional
orperformance
) bug which resulted lot of side effects in the system narrow down the issue would be bit more complicated than theApproach 1
I would preferApproach 1
unless there is a huge impact on licensing part (I don't have knowledge in the licensing area). IMHO: With this approach we are separating the concerns@Seetaramayya there is momentum behind Approach 2. Would you have time to run the benchmarks in
http-bench-jmh
with and without this change? I think we can hold off for a few days regardless. I may have time in a few days to try it myself.
I would love to do that, but I joined the conversation bit late. @jrudolph provided the results
Probably true that there's no hurry yet, but I do tend to agree that the performance penalty is acceptable for a first version. Should it come to that.
relates to #35
Done in #14
Akka HTTP right now contains a copy of parboiled2 in its sources (with some slight additions). This decision was originally made on the grounds that Akka should not have many dependencies on Scala libraries, so it can be released quickly after a new (incompatible) Scala version has been released.
By now, parboiled2 does not depend on shapeless any more and the ecosystem has more stabilized (and after all, maintaining the fork has not helped at all with migration to new Scala versions...). So, I'd suggest we move to upstream parboiled2.