apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Reduce Size of Pinot Release Binary #13726

Open ankitsultana opened 1 month ago

ankitsultana commented 1 month ago

For the 1.2.0 release, the artifact sizes are as follows:

➜ pinot-dev-dist ls -lth apache-pinot-1.2.0-rc0
total 2164032
-rw-r--r--@ 1 ankitsultana staff 833B Jul 30 20:22 apache-pinot-1.2.0-src.tar.gz.asc
-rw-r--r--@ 1 ankitsultana staff 935M Jul 30 20:22 apache-pinot-1.2.0-bin.tar.gz
-rw-r--r--@ 1 ankitsultana staff 833B Jul 30 20:22 apache-pinot-1.2.0-bin.tar.gz.asc
-rw-r--r--@ 1 ankitsultana staff 128B Jul 30 20:22 apache-pinot-1.2.0-src.tar.gz.sha512
-rw-r--r--@ 1 ankitsultana staff 128B Jul 30 20:22 apache-pinot-1.2.0-bin.tar.gz.sha512
-rw-r--r--@ 1 ankitsultana staff 121M Jul 30 20:22 apache-pinot-1.2.0-src.tar.gz

We started hitting size limits when running the SVN commit for the release, and created this ticket to raise the limit: https://issues.apache.org/jira/browse/INFRA-26009

We are getting an exception to the limit this time, but the recommendation we got back is that the binary has grown too big and we should aim for smaller binaries. That makes sense to me.

Over the last year I have also noticed that mvn builds have been taking progressively longer.

Creating this ticket so we can track this workstream. Specifically, I think we should target the following:

  1. Reducing the size of the Pinot binaries
  2. Speeding up mvn builds

cc: @xiangfu0 @Jackie-Jiang

xiangfu0 commented 1 month ago

+1 on this.

This is mainly a packaging issue. On reducing the size contributed by plugins, just dumping some thoughts here:

  1. Ship the main Pinot distribution with only a few selected plugins;
  2. Package the remaining plugins as separate jars/artifacts that users can pick up as needed;
  3. Provide scripts for downloading plugins before or at runtime;
  4. The Docker image could stay as it is right now;
  5. The above changes should add no overhead for Pinot users (the experience should be transparent), though they may add a small overhead for Pinot developers, e.g. for plugin development.

hpvd commented 1 month ago

Double +1. Just some general thoughts on security:

  1. The bigger a package, the higher the chance of security issues surfacing, either now or in the weeks after a release. So:
    • a smaller package is always better;
    • there may be more motivation to fix CVEs when fixing only two of them could easily cut the count, e.g. in half;
    • in the future we could consider decoupling the release process of plugins/artifacts, so that a fix can ship in a new plugin release instead of waiting for a complete Pinot release.
  2. We should think about security when providing scripts or other ways of downloading plugins before, or especially at, runtime.
    • It may be a good idea to look at, or reuse, already available tools/processes for this.
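
On point 2: every release artifact in the listing above is already published with a `.sha512` checksum, so a plugin-download script could verify plugins the same way. Below is a minimal Python sketch, assuming plugins would be distributed as tarballs with a sibling `.sha512` file. The function names `sha512_of` and `verify_artifact` are hypothetical and not part of any existing Pinot tooling.

```python
import hashlib
import hmac
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hex SHA-512 of a file, read in chunks so large tarballs are not loaded into memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(artifact: Path, checksum_file: Path) -> bool:
    """Hypothetical helper: compare the artifact's SHA-512 with the expected digest.

    Assumption: the .sha512 file holds the hex digest, optionally followed by a filename,
    so only its first whitespace-separated token is used.
    """
    expected = checksum_file.read_text().split()[0].lower()
    # Constant-time comparison of the two ASCII hex strings.
    return hmac.compare_digest(sha512_of(artifact), expected)
```

A download script would call `verify_artifact` on each fetched plugin and refuse to install it on a mismatch. A checksum only protects against a corrupted download or a tampered mirror if the `.sha512` file itself comes from a trusted source; the `.asc` signatures shipped alongside the release artifacts serve that purpose.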
hpvd commented 2 weeks ago

Along the same lines, I like this work on removing things: https://github.com/apache/pinot/commit/6d64650e7c210456a890ee6f9a6eaf05a7ab557b