Closed: ulysses-you closed this issue 1 year ago.
This issue is for discussing which image tags should be pushed in each release. cc @yaooqinn @pan3793 @zwangsheng @hddong @yanghua @turboFei @cfmcgrady @cxzl25 and all. Any thoughts are welcome!
+1 for providing `basic`, `spark`, `flink` and `all` images.
For the tag, maybe we should omit `scala_2.12-java8-ubuntu`, since we do not provide any other options. We can state those versions in the image description so that users are aware of them.
Yeah, `scala_2.12-java8-ubuntu` is more in the style of Docker Official Images (DOI). For the `apache/kyuubi` repo, we can use the derived tag.
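A minimal sketch of how a full DOI-style tag and the shorter derived tag could relate, assuming the `basic`/`spark`/`flink`/`all` flavors proposed above. The naming pattern (`<version>-<flavor>`, bare version for `basic`) and the version string `1.7.0` are hypothetical placeholders, not an agreed scheme.

```python
# Hypothetical tag scheme: flavor names come from this thread; the pattern
# and the version string are assumptions for illustration only.
FLAVORS = ["basic", "spark", "flink", "all"]
DOI_SUFFIX = "scala_2.12-java8-ubuntu"  # DOI-style qualifier discussed above


def tags_for(version: str, flavor: str) -> dict[str, str]:
    """Return the full DOI-style tag and the shorter derived tag for one flavor."""
    # Assumption: the basic image uses the bare version, other flavors append their name.
    base = version if flavor == "basic" else f"{version}-{flavor}"
    return {
        "full": f"{base}-{DOI_SUFFIX}",  # spells out Scala/Java/OS explicitly
        "derived": base,                 # suffix omitted; versions go in the description
    }


if __name__ == "__main__":
    for flavor in FLAVORS:
        t = tags_for("1.7.0", flavor)  # "1.7.0" is a placeholder version
        print(f"apache/kyuubi:{t['derived']}  (full: {t['full']})")
```

Since only one Scala/Java/OS combination is shipped, the derived tag carries no ambiguity; the full tag would only become necessary if more combinations were added later.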
As for the choice of operating system for the base image, I would prefer `alpine` or `centos`.
From the perspective of production maturity and package management:
- `ubuntu` is less used for server or container deployment in production, since for Ops it means a different tool set and different packages.
- `centos` was more popular in the past and is good enough for production, but it lacks further official maintenance.
- `alpine` is one of the most popular base images, with a minimal image size while still providing comprehensive features. OpenJDK also ships official builds for it in newer versions, although alpine uses musl instead of glibc. `apk` in alpine is easy to use and covers most common packages.

Also, to maximize performance, accessibility and out-of-the-box best practices, consider releasing a fine-tuned all-in-one image for each engine, with engine/server plugins and suggested configs included by default.
For example:
- [engine connector] Spark Hive Connector (https://kyuubi.readthedocs.io/en/master/connector/spark/hive.html?highlight=hive%20connector#hive-connector-integration)
- [engine plugin] Auxiliary Optimization, enabled by default (https://kyuubi.readthedocs.io/en/master/extensions/engines/spark/rules.html)
- [engine plugin] Authz Plugin for Ranger, disabled by default
- [engine plugin] SQL Lineage Support, disabled by default
- [engine plugin] Hive Dialect Support, disabled by default (https://kyuubi.readthedocs.io/en/master/extensions/engines/spark/jdbc-dialect.html) ...
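To illustrate the "enabled/disabled by default" idea for a Spark all-in-one image, here is a hypothetical Python helper that renders `spark-defaults.conf` lines from a plugin list. The helper and the `Plugin` type are made up for this sketch; the class names are as we recall them from the Kyuubi docs linked above and should be verified against the Kyuubi version actually bundled in the image. The connector and dialect entries are left out.

```python
# Hypothetical sketch: render spark-defaults.conf entries for an all-in-one
# Spark image, turning bundled Kyuubi plugins on or off by default.
# Class names are assumptions taken from the Kyuubi docs; verify per version.
from dataclasses import dataclass


@dataclass(frozen=True)
class Plugin:
    name: str
    conf_key: str
    value: str
    enabled: bool


PLUGINS = [
    Plugin("Auxiliary Optimization", "spark.sql.extensions",
           "org.apache.kyuubi.sql.KyuubiSparkSQLExtension", True),
    Plugin("Authz (Ranger)", "spark.sql.extensions",
           "org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension", False),
    Plugin("SQL Lineage", "spark.sql.queryExecutionListeners",
           "org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener", False),
]


def render_defaults(plugins: list[Plugin]) -> list[str]:
    """Merge enabled plugins per key into one comma-separated value per line."""
    merged: dict[str, list[str]] = {}
    for p in plugins:
        if p.enabled:
            merged.setdefault(p.conf_key, []).append(p.value)
    # spark-defaults.conf accepts "key value" lines separated by whitespace.
    return [f"{key} {','.join(values)}" for key, values in merged.items()]


if __name__ == "__main__":
    print("\n".join(render_defaults(PLUGINS)))
```

Merging per key matters because `spark.sql.extensions` and `spark.sql.queryExecutionListeners` take comma-separated lists; writing the same key on two lines would let the later line override the earlier one.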
> As for the choice of operating system, I would prefer either `alpine` or `centos`.
Ubuntu looks fine to me; all the Hadoop images at eBay are based on Ubuntu or are migrating to it.
-1 for CentOS.
The official Spark Docker image also uses Ubuntu: https://github.com/apache/spark-docker/tree/master/3.3.1
`centos` is just an option for discussion, as many companies already have local proxies for its packages. I am good with both `ubuntu` and `alpine`.
But consider two facts:
1. `ubuntu` is good for software builds with development tools, but does not contribute much at runtime. For container deployment, size matters. The compressed size of the distribution will be up to 1 GB (1.5 GB decompressed): 400 MB of OS libraries with `ubuntu` + 250 MB JDK + 250 MB Kyuubi with extensions. With `alpine`, the whole size can be shrunk to 600-700 MB in total (1 GB decompressed).
2. I have never heard of using `ubuntu` in production, either for server deployment or for container images, in the infrastructure industry (e.g. finance, power grid, etc.). I am quite sure the community will agree to ship a version based on `ubuntu`, but it is never the first choice for production for me and for other users with a similar background or considerations. That's why I am raising the discussion of a second OS base option here.
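The size estimate in point 1 can be checked with a few lines of arithmetic. This sketch only restates the commenter's figures (compressed sizes in MB, estimates rather than measurements); the implied alpine OS share is our own inference and assumes the JDK and Kyuubi layers stay the same size on both bases.

```python
# Back-of-the-envelope image size estimate using the figures quoted above
# (compressed sizes, in MB). These are the commenter's estimates, not measurements.
UBUNTU_LAYERS_MB = {
    "os_libs_ubuntu": 400,
    "jdk": 250,
    "kyuubi_with_extensions": 250,
}

ALPINE_TOTAL_RANGE_MB = (600, 700)  # quoted total for an alpine-based image


def total_mb(layers: dict[str, int]) -> int:
    """Sum the per-layer sizes."""
    return sum(layers.values())


ubuntu_total = total_mb(UBUNTU_LAYERS_MB)  # 900 MB, i.e. "up to 1 GB"

# Inference (not a figure from the thread): assume JDK + Kyuubi are the same
# size on alpine, so the remainder of the quoted total is the OS share.
shared = UBUNTU_LAYERS_MB["jdk"] + UBUNTU_LAYERS_MB["kyuubi_with_extensions"]
implied_alpine_os = tuple(t - shared for t in ALPINE_TOTAL_RANGE_MB)

if __name__ == "__main__":
    print(f"ubuntu-based: ~{ubuntu_total} MB compressed")
    print(f"alpine-based: {ALPINE_TOTAL_RANGE_MB[0]}-{ALPINE_TOTAL_RANGE_MB[1]} MB, "
          f"implying ~{implied_alpine_os[0]}-{implied_alpine_os[1]} MB of OS layers")
```

Under these assumptions the whole difference between the two totals comes from the OS layers: 400 MB for `ubuntu` versus roughly 100-200 MB implied for `alpine`.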
> never heard of using `ubuntu` in production, either for server deployment or for container images, in the infrastructure industry (e.g. finance, power grid, etc.).
Our Hadoop node images were based on CentOS before, but now all the Hadoop nodes are based on Ubuntu or are being migrated to it.
I think if I had not said so, you would not have known that we are using Ubuntu.
My view may be limited, because I only know the current status at eBay.
Yes, thanks for sharing the status at your company. Given all the discussion, `ubuntu` is a balanced choice for the base image. Even in terms of image size, it is not as small as `alpine` but is already smaller than `centos`.
`ubuntu` is still the first option for this topic, since it is adopted not only in Spark's official image (as you just pointed out) but also in Flink's (https://github.com/apache/flink-docker).
Meanwhile, in my case it is less attractive if this is the only base OS option; it pushes me away from using the official images and toward reassembling the Docker images in a private distribution.
@bowenliang123 can you show some projects that ship CentOS-based images? That may help us see the difference.
`centos` is no longer supported, so we could skip this option for Kyuubi.
And my concern is enterprise-level production use, e.g. `yum`-like package management and `rpm` installers.
Describe the subtask
Kyuubi supports multiple engines, e.g. Spark, Flink, Hive... Since we are using a separate repo, there is an opportunity to provide some custom image tags.
Parent issue
https://github.com/apache/kyuubi/issues/4158