apache / pekko-http

The Streaming-first HTTP server/module of Apache Pekko
https://pekko.apache.org/
Apache License 2.0
153 stars 39 forks source link

Move to upstream parboiled2 dependency #1

Closed jrudolph closed 1 year ago

jrudolph commented 1 year ago

Akka HTTP right now contains a copy of parboiled2 in its sources (with some slight additions). This decision was originally made on the grounds that Akka should not have many dependencies on Scala libraries, so it can be released quickly after a new (incompatible) Scala version has been released.

By now, parboiled2 does not depend on shapeless any more and the ecosystem has more stabilized (and after all, maintaining the fork has not helped at all with migration to new Scala versions...). So, I'd suggest we move to upstream parboiled2.

nvollmar commented 1 year ago

That would make sense

pjfanning commented 1 year ago

@jrudolph do you think this is best done now or should we wait till after the v1.0.0 release?

pjfanning commented 1 year ago

There is some discussion about bringing this into v1.0.0 release in https://github.com/apache/incubator-pekko-http/issues/39.

It would simplify the license files in pekko-http if we just used the parboiled2 jar instead of embedding the source in our code base.

kw217 commented 1 year ago

I agree, an explicit dependency is much better than taking a copy. This makes it much easier to take upgrades of the dependency (and to notice the need for them), which can be an issue for security.

spangaer commented 1 year ago

If there's no functional difference by using upstream, I don't think switching to the dependency is real violation of the v1.0 intent.

As long as Pekko 1.0.0 exhibits the same behavior as Akka 2.6.x feature for feature, bug for bug after the rename refactor in consuming code.

Seetaramayya commented 1 year ago

Approach 1: As Is approach (convert to dependency in 1.1.0)

(+) As @spangaer mentioned, we can reason about bugs (if there are any) 1.0.0 does not do anything apart from different packaging. No need to worry about performance impacts. (+) No need to run heavy performance benchmarks before releasing (-) As @pjfanning mentioned, licence files management is difficult (I really don't know how painful this point)

Approach 2: Convert to dependency right now (in 1.0.0)

(+) Licence file management would be easy (-) IMHO: Performance bench marks needs to be run before releasing (-) Assume underlying dependency parboiled has (functional or performance ) bug which resulted lot of side effects in the system narrow down the issue would be bit more complicated than the Approach 1

I would prefer Approach 1 unless there is a huge impact on licensing part (I don't have knowledge in the licensing area). IMHO: With this approach we are separating the concerns

spangaer commented 1 year ago

For clarity, I favor approach 2 (do it in 1.0.0). I do believe some validation should be made, but best effort only, it shouldn't absorb all energy either, otherwise there's nothing to be gained.

jrudolph commented 1 year ago

The main reasoning should be about compatibility:

All in all, I'd say moving now is the best choice because we are breaking compatibility anyway and the risk of breaking something in a major way is relatively small.

mdedetrich commented 1 year ago

The main reasoning should be about compatibility:

  • staying with the vendor copy allows users to use a parboiled2 version independently of pekko-http (currently not a huge concern)
  • some users of akka-http might have accidentally relied on the akka-parsing artifact and might be confused if the classes are now missing (not a concern as the parsing infrastructure is meant to be internal, and also package names are not compatible with akka anyway)
  • related to our original reasoning to vendor, in the future, we need to consider whether a parboiled2 dependency will hold us up with adopting new Scala versions because we might have to always wait for a compatible parboiled2 version before being able to move to a new Scala 3 version (IMO, we can expect some problems with compatibility in the future but it stays questionable if we could fix them more easily in our vendor copy than upstream)

All in all, I'd say moving now is the best choice because we are breaking compatibility anyway and the risk of breaking something in a major way is relatively small.

I also agree with this sentiment. Regarding the point about being held back on upstream if they are slow to adopt features (such as newer Scala release), even in the worst case scenario its possible to fork that library (with all of its original licenses) to do such changes and use the fork until it gets merged to upstream.

pjfanning commented 1 year ago

Approach 1: As Is approach (convert to dependency in 1.1.0)

(+) As @spangaer mentioned, we can reason about bugs (if there are any) 1.0.0 does not do anything apart from different packaging. No need to worry about performance impacts. (+) No need to run heavy performance benchmarks before releasing (-) As @pjfanning mentioned, licence files management is difficult (I really don't know how painful this point)

Approach 2: Convert to dependency right now (in 1.0.0)

(+) Licence file management would be easy (-) IMHO: Performance bench marks needs to be run before releasing (-) Assume underlying dependency parboiled has (functional or performance ) bug which resulted lot of side effects in the system narrow down the issue would be bit more complicated than the Approach 1

I would prefer Approach 1 unless there is a huge impact on licensing part (I don't have knowledge in the licensing area). IMHO: With this approach we are separating the concerns

@Seetaramayya there is momentum behind Approach 2. Would you have time to run the benchmarks in http-bench-jmh with and without this change? I think we can hold off for a few days regardless. I may have time in a few days to try it myself.

jrudolph commented 1 year ago

I see a ~ 5-10% slowdown in ServerProcessingBenchmark. There are a few optimization commits that we are losing now that we should try to upstream:

mdedetrich commented 1 year ago

To me it makes sense to leave the PR for now and try to upstream the necessary changes as soon as we can to see if upstream can land those optimizations + release them.

If this is done before a release is made then we have the best of both worlds, otherwise we might have to deal with the pain/annoyance of license changes (not sure if the community would accept such a slowdown given that one of pekko-http's main selling points is for it to be fast).

jrudolph commented 1 year ago

FTR:

Run with jmh:run -r3 -i3 -wi 3 -w 3 ServerProcessingBenchmark

Before:

[info] Result "org.apache.pekko.http.impl.engine.ServerProcessingBenchmark.benchRequestProcessing":
[info]   327842.154 ±(99.9%) 3747.035 ops/s [Average]
[info]   (min, avg, max) = (324466.534, 327842.154, 331264.288), stdev = 2229.801
[info]   CI (99.9%): [324095.118, 331589.189] (assumes normal distribution)

After:

[info] Result "org.apache.pekko.http.impl.engine.ServerProcessingBenchmark.benchRequestProcessing":
[info]   309453.217 ±(99.9%) 4596.546 ops/s [Average]
[info]   (min, avg, max) = (304411.018, 309453.217, 312560.405), stdev = 2735.331
[info]   CI (99.9%): [304856.671, 314049.763] (assumes normal distribution)
jrudolph commented 1 year ago

If this is done before a release is made then we have the best of both worlds, otherwise we might have to deal with the pain/annoyance of license changes (not sure if the community would accept such a slowdown given that one of pekko-http's main selling points is for it to be fast).

I would be fine with merging this now and try to get the improvements back later. Since the benchmark is missing processing (no actual sockets in the benchmark), the actual slowdown is less than shown here. I have some more ideas how to improve parsing which would change the story again, so maybe less important to keep single-digit optimizations right now if we can simplify the build and get a release out sooner.

mdedetrich commented 1 year ago

Regarding merging now, I personally don't feel that strongly on this. My general point is that I don't think there is a rush to merge this PR (not expecting any merge conflicts in the later) and unlike pekko core, a release of pekko-http is further down the line, i.e. I am also not seeing us needing to do the necessary license changes for this repo anytime soon (mainly because we have much bigger fish to fry).

PR is approved on my end, so if people feel that its overall wiser to merge it now then good with me

Seetaramayya commented 1 year ago

Approach 1: As Is approach (convert to dependency in 1.1.0)

(+) As @spangaer mentioned, we can reason about bugs (if there are any) 1.0.0 does not do anything apart from different packaging. No need to worry about performance impacts. (+) No need to run heavy performance benchmarks before releasing (-) As @pjfanning mentioned, licence files management is difficult (I really don't know how painful this point)

Approach 2: Convert to dependency right now (in 1.0.0)

(+) Licence file management would be easy (-) IMHO: Performance bench marks needs to be run before releasing (-) Assume underlying dependency parboiled has (functional or performance ) bug which resulted lot of side effects in the system narrow down the issue would be bit more complicated than the Approach 1 I would prefer Approach 1 unless there is a huge impact on licensing part (I don't have knowledge in the licensing area). IMHO: With this approach we are separating the concerns

@Seetaramayya there is momentum behind Approach 2. Would you have time to run the benchmarks in http-bench-jmh with and without this change? I think we can hold off for a few days regardless. I may have time in a few days to try it myself.

I would love to do that, but I joined the conversation bit late. @jrudolph provided the results

spangaer commented 1 year ago

Probably true that there's no hurry yet, but I do tend to agree that the performance penalty is acceptable for a first version. Should it come to that.

pjfanning commented 1 year ago

relates to #35

jrudolph commented 1 year ago

Done in #14