apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Subtask][DISCUSS] Docker image tag options #4175

Closed ulysses-you closed 1 year ago

ulysses-you commented 1 year ago

Describe the subtask

Kyuubi supports multiple engines, e.g. Spark, Flink, Hive. Since we are using a separate repo, we have the opportunity to provide some custom image tags.

An idea of tag options:

| version | tag | derived tags |
| --- | --- | --- |
| 1.6.0 Kyuubi only | 1.6.0-scala_2.12-java8-ubuntu | 1.6.0, latest |
| 1.6.0 with Spark default version binary tar | 1.6.0-spark3.3-scala_2.12-java8-ubuntu | 1.6.0-spark3.3, 1.6.0-spark |
| 1.6.0 with Flink default version binary tar | 1.6.0-flink1.5-scala_2.12-java8-ubuntu | 1.6.0-flink3.3, 1.6.0-flink |
| 1.6.0 with all default version binary tar | 1.6.0-all-scala_2.12-java8-ubuntu | 1.6.0-all |
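
The tag scheme proposed above can be sketched as a small helper that builds the full tag and its shorter derived tags from a Kyuubi version and an optional engine label. This is only an illustration of the naming pattern, not part of any Kyuubi release tooling: the function names `full_tag` and `derived_tags`, the `is_latest` flag, and the rule of stripping trailing version digits from the engine label are all assumptions made for this example.

```python
from typing import List, Optional

# Suffix taken from the proposal; assumed fixed here because no other
# Scala/Java/OS combination is offered in the discussion.
BASE_SUFFIX = "scala_2.12-java8-ubuntu"


def full_tag(kyuubi_version: str, engine: Optional[str] = None) -> str:
    """Build the fully qualified tag.

    `engine` is a label such as "spark3.3" or "all"; None means Kyuubi only.
    """
    parts = [kyuubi_version]
    if engine:
        parts.append(engine)
    parts.append(BASE_SUFFIX)
    return "-".join(parts)


def derived_tags(
    kyuubi_version: str,
    engine: Optional[str] = None,
    is_latest: bool = False,
) -> List[str]:
    """Build the shorter alias tags that point at the same image."""
    if engine is None:
        tags = [kyuubi_version]
        if is_latest:
            tags.append("latest")
        return tags

    tags = [f"{kyuubi_version}-{engine}"]
    # Hypothetical rule: drop the trailing engine version to get the
    # engine-only alias, e.g. "spark3.3" -> "spark". Labels without a
    # version (such as "all") produce no extra alias.
    engine_name = engine.rstrip("0123456789.")
    if engine_name and engine_name != engine:
        tags.append(f"{kyuubi_version}-{engine_name}")
    return tags


if __name__ == "__main__":
    for label in (None, "spark3.3", "all"):
        print(full_tag("1.6.0", label), derived_tags("1.6.0", label, is_latest=True))
```

Keeping the full tag and the aliases in one place makes it easier to check that every alias maps back to exactly one fully qualified image.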

Parent issue

https://github.com/apache/kyuubi/issues/4158

ulysses-you commented 1 year ago

This issue is used to discuss which image tags should be pushed during each release. cc @yaooqinn @pan3793 @zwangsheng @hddong @yanghua @turboFei @cfmcgrady @cxzl25 and all. Any thoughts are welcome!

zwangsheng commented 1 year ago

+1 for providing basic, spark, flink and all images.

For the tag, maybe we should omit scala_2.12-java8-ubuntu, since we do not provide any other options.

We can indicate those versions in the description, so that users are aware of them.

ulysses-you commented 1 year ago

Yeah, scala_2.12-java8-ubuntu is more in the style of DOI. For the apache/kyuubi repo, we can use the derived tags.

bowenliang123 commented 1 year ago

As for the choice of operating system, I would prefer either alpine or centos, considering maturity in industrial production use and package management.

bowenliang123 commented 1 year ago

And since Kyuubi contains engine extensions, consider also releasing an all-in-one tag that ships the available extensions and configs for best practice or proficiency, e.g. including,

bowenliang123 commented 1 year ago

As for the choice of operating system for the base images, I would prefer alpine or centos, from the perspective of production maturity and package management.

bowenliang123 commented 1 year ago

Also, to maximize performance, accessibility, and out-of-the-box potential for best practices, consider releasing a fine-tuned all-in-one image for each engine, with engine/server plugins and suggested configs included by default, e.g.

turboFei commented 1 year ago

> As for the choice of operating system, I would prefer options in either alpine or centos.

Ubuntu looks fine to me. All the hadoop images at eBay are based on ubuntu or are migrating to ubuntu.

-1 for centos.

turboFei commented 1 year ago

The official Spark docker image also uses ubuntu: https://github.com/apache/spark-docker/tree/master/3.3.1

bowenliang123 commented 1 year ago

centos is just an option for discussion, as many companies already have a local proxy for its packages. I am good with both ubuntu and alpine. But considering 2 facts,

I am quite sure the community will agree to ship a version based on ubuntu, but it is never the first choice for production for me and for other users with a similar background or similar considerations. That is why I raised the discussion of a second option for the base OS here.

turboFei commented 1 year ago

> never heard of using ubuntu in production either for server deployment or container images in the infrastructure industry (e.g. finance, power grid, etc.).

Our old hadoop node images were based on centos, but now all the hadoop nodes are based on ubuntu or are being migrated to it.

I think if I had not said so, you would not know that we are using ubuntu.

My view may be limited, because I only know the current status at eBay.

bowenliang123 commented 1 year ago

Yes, thanks for sharing the status in your company. After all the discussion, ubuntu is a balanced choice for the base image. As for image size, it is not as small as alpine, but it is already smaller than centos. ubuntu is still the first option for this topic, since it is adopted not only in Spark's official image (as you just pointed out) but also in Flink's (https://github.com/apache/flink-docker). Meanwhile, it is less attractive in my case if it is the only option for the base OS: it pushes me away from using the image directly and toward reassembling the docker images for private distribution.

ulysses-you commented 1 year ago

@bowenliang123 can you show some projects that ship images based on centos? That may help us see the difference.

bowenliang123 commented 1 year ago

centos is no longer supported, so we could skip this option for kyuubi. My concern is for enterprise-level production use,