erdos / stencil

templating engine for DOCX and PPTX files
https://stencil.erdos.dev
Eclipse Public License 2.0
111 stars 12 forks source link

Performance regression with graal-based image #54

Open gmile opened 3 years ago

gmile commented 3 years ago

Describe the bug When using graal docker image introduced in https://github.com/erdos/stencil/pull/53, .docx templates are rendered slower.

To Reproduce

  1. download & extract a .docx template file into a /tmp/templates directory: lipsum_sample.docx.zip

  2. Run two different containers, side by side (in separate Terminal windows):

    docker run --rm --name stencil-0.3.10       --publish 5000:8080 --volume /tmp/templates:/templates ghcr.io/erdos/stencil:0.3.10
    docker run --rm --name stencil-0.3.10-graal --publish 5001:8080 --volume /tmp/templates:/templates ghcr.io/erdos/stencil:0.3.10-graal
  3. Run the tests:

    • for regular container:

      $ bash -c "for test in 1 2 3 4 5; do time curl -s localhost:5000/lipsum_sample.docx --header 'content-type: application/json' --data '{}' > /tmp/lipsum_sample_regular.docx; done"
      real    0m10.385s
      user    0m0.006s
      sys     0m0.010s
      
      real    0m4.191s
      user    0m0.005s
      sys     0m0.008s
      
      real    0m4.245s
      user    0m0.005s
      sys     0m0.008s
      
      real    0m4.239s
      user    0m0.005s
      sys     0m0.008s
      
      real    0m4.263s
      user    0m0.004s
      sys     0m0.008s
      $
    • for graal container:

      $ bash -c "for test in 1 2 3 4 5; do time curl -s localhost:5001/lipsum_sample.docx --header 'content-type: application/json' --data '{}' > /tmp/lipsum_sample_graal.docx; done"
      
      real    0m19.201s
      user    0m0.005s
      sys     0m0.008s
      
      real    0m12.841s
      user    0m0.004s
      sys     0m0.010s
      
      real    0m13.111s
      user    0m0.004s
      sys     0m0.007s
      
      real    0m12.936s
      user    0m0.004s
      sys     0m0.008s
      
      real    0m13.117s
      user    0m0.004s
      sys     0m0.007s
      $
  4. Notice how graal-based image is consistently slower.

  5. Stop the containers:

    docker kill stencil-0.3.10
    docker kill stencil-0.3.10-graal

Expected behavior Performance of application in a graal-based image is on par with that of a regular one, or faster.

However, I know very little about GraalVM, as well as the goals to build a graalvm-based image. Perhaps, performance was not among the goals for this image.

Screenshots No screenshots.

Documents I've attached a link to the document above. It is a 800+ pages .docx file with no template placeholders in it.

Office versions: Not applicable as far as I can see.

Environment where template is rendered:

Additional context No additional context.

erdos commented 1 month ago

Hello @gmile , thank you for opening the PR 4 years ago :-)

The GraalVM image was never really production-ready, but rather a field for experimentation and a test case to make sure the library can be used in graal services. Therefore, the performance was not really optimized for that. Also, I just realized that the graal container broke in #140 2 years ago and I just patched it in the last commit :-(

However, now I had the chance to pick this issue up and spend some time fixing the service. The current measurements with your approach are the following with th 0.6.1-SNAPSHOT version:

With JVM service:

real    0m1.276s
user    0m0.004s
sys 0m0.007s

real    0m0.008s
user    0m0.002s
sys 0m0.003s

real    0m0.007s
user    0m0.002s
sys 0m0.003s

real    0m0.007s
user    0m0.003s
sys 0m0.001s

real    0m0.006s
user    0m0.001s
sys 0m0.002s

With GraalVM service:

real    0m3.163s
user    0m0.004s
sys 0m0.008s

real    0m0.169s
user    0m0.002s
sys 0m0.005s

real    0m0.150s
user    0m0.003s
sys 0m0.001s

real    0m0.169s
user    0m0.002s
sys 0m0.003s

real    0m0.152s
user    0m0.002s
sys 0m0.004s

So it seems like the issue still persists (0.007s vs 0.150s), even though the performance is much better today than it used to be 4 years ago. In the next days, I will check some options:

erdos commented 1 month ago

With latest GraalVM EE:

real    0m0.159s
user    0m0.005s
sys 0m0.006s

real    0m0.161s
user    0m0.003s
sys 0m0.005s

real    0m0.139s
user    0m0.004s
sys 0m0.002s

real    0m0.134s
user    0m0.002s
sys 0m0.005s

real    0m0.132s
user    0m0.002s
sys 0m0.003s

No significant improvement so far.

erdos commented 1 month ago

I tested PGO with some profiling today and managed to get the runtime down to cca 0.090s in the native containers:

real    0m0.092s
user    0m0.001s
sys 0m0.003s

real    0m0.084s
user    0m0.001s
sys 0m0.003s

real    0m0.080s
user    0m0.001s
sys 0m0.004s

real    0m0.084s
user    0m0.003s
sys 0m0.003s