crate / docker-crate

Source repository for the official CrateDB Docker image
https://registry.hub.docker.com/_/crate/
Apache License 2.0

Switch back to almalinux:9 as base image #222

Closed amotl closed 1 year ago

amotl commented 1 year ago

We discovered regressions when running the AlmaLinux minimal image on Kubernetes. As a safety measure to not cause any similar problems to users, we are going back to the AlmaLinux standard image, effectively reverting GH-221.

Looking at the size savings, we also found that the delta in compressed image size between the standard and minimal variants is only 27 MB.

/cc @SStorm, @WalBeh, @proddata, @seut

mfussenegger commented 1 year ago

Given that the issue was that cloud depended on some CLI tools, not on CrateDB being broken in some way, I'd tend to keep the change to use the minimal image.

But if we do, we should probably add something about it to the release notes.

proddata commented 1 year ago

Given that the issue was that cloud depended on some CLI tools

Technically, it is the operator (https://github.com/crate/crate-operator/blob/3008d1a15b7211144e7b33100e81ab7e886aca83/crate/operator/create.py#L409) that depends on them, and the operator is also used in other contexts.
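To illustrate the kind of dependency involved (this is a hypothetical sketch of the general pattern, not the operator's actual code): extracting the trailing StatefulSet ordinal from a pod name with `rev` and `cut` breaks as soon as `rev` is missing from the image:

```shell
#!/bin/sh
# Hypothetical sketch, not the operator's actual code.
# Reversing twice isolates the last '-'-separated field, i.e. the pod ordinal;
# this pipeline fails if the base image does not ship the `rev` utility.
pod_name="crate-data-hot-0"
ordinal=$(echo "$pod_name" | rev | cut -d- -f1 | rev)
echo "$ordinal"  # prints: 0
```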

Also, this is only about the node-naming part; DNS resolution for discovery via the service seems to be broken as well:

[2023-04-18T09:55:00,373][ERROR][i.c.d.SrvUnicastHostsProvider] [data-hot-] DNS lookup exception:
java.util.concurrent.ExecutionException: java.net.UnknownHostException: Failed to resolve '_cluster._tcp.crate-discovery-3f02d8a6-f712-4710-acd1-cf3c9a459208.251c42fa-12ba-4ce7-a17f-d8cb8ee8b795.svc.cluster.local.' [SRV(33)]

which makes me think that any new Kubernetes-based deployment, including manual StatefulSet creation or the Helm chart, is currently also broken (that would need to be verified!).

SStorm commented 1 year ago

discovery via the service seems also to be broken

It's a red herring - a timing issue due to it being a headless service in Kubernetes. The DNS resolution works eventually.

amotl commented 1 year ago

At first, I was also leaning towards keeping the smaller image and fixing the environment, as @mfussenegger suggested. At that time, I thought only missing userspace tools were involved.

But after further possible flaws were discovered, I am now leaning towards using the standard image again, as @SStorm and @WalBeh suggested, also because the net size savings would not be that large.

To conclude the decision, I am still waiting for a final verdict from @mfussenegger and @hammerhead. I will be happy to add a corresponding item to the release notes, as @mfussenegger suggested.

amotl commented 1 year ago

to add a corresponding item to the release notes

I was just looking at the release notes content and did not find any reference to AlmaLinux, or to switching the Docker/OCI images away from CentOS. So I am asking whether we should mention the adjustments there at all.

If we want to have it, here is a patch:

hammerhead commented 1 year ago

I'm fine with switching back to revert to a state known to work. On the other hand, it would be really great to document any known package dependencies explicitly. When we switched earlier from CentOS to AlmaLinux, we were apparently lucky that there weren't significant differences in the installed packages.

mfussenegger commented 1 year ago

But after more eventual flaws have been discovered

What flaws?

amotl commented 1 year ago

@mfussenegger asked:

What flaws?

Above, @proddata and @SStorm reported:

but the dns resolution for discovery via the service seems also to be broken

a timing issue due to it being a headless service in Kubernetes. The DNS resolution works eventually.

I think @WalBeh has been researching more details, but I don't know whether there has been an outcome on this specific point, other than confirming that the flaw is present. There is an internal report/discussion, "5.3.0 does not successfully start on k8s", whose gist is that CrateDB never recovers from a DNS resolution loop while zone entries are converging within the K8s SDN, which it did before 5.3.0.

Details

This is the DNS lookup from 5.3.0:

[2023-04-18T09:55:00,373][ERROR][i.c.d.SrvUnicastHostsProvider] [data-hot-] DNS lookup exception:
java.util.concurrent.ExecutionException: java.net.UnknownHostException: Failed to resolve '_cluster._tcp.crate-discovery-3f02d8a6-f712-4710-acd1-cf3c9a459208.251c42fa-12ba-4ce7-a17f-d8cb8ee8b795.svc.cluster.local.' [SRV(33)]

this subsequently fails. Does anyone have an idea about [SRV(33)]?
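For context: the number in brackets appears to be the DNS record type, and SRV records have type code 33 per RFC 2782. As a minimal sketch (with made-up example values, not data from the failing cluster), decoding the fixed-width SRV RDATA fields that such a lookup would return:

```python
import struct

DNS_TYPE_SRV = 33  # RFC 2782: SRV resource records have type code 33

def parse_srv_rdata(rdata: bytes):
    """Decode the fixed-width prefix of SRV RDATA: priority, weight and
    port are unsigned 16-bit big-endian integers; the target name follows."""
    priority, weight, port = struct.unpack("!HHH", rdata[:6])
    return priority, weight, port, rdata[6:]

# Made-up example: priority=0, weight=50, port=4300 (CrateDB transport port)
prio, weight, port, _target = parse_srv_rdata(struct.pack("!HHH", 0, 50, 4300))
print(prio, weight, port)  # prints: 0 50 4300
```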

On this detail, should I create a dedicated issue at crate/crate, as @seut already suggested the other day?

seut commented 1 year ago

On this detail, should I create a dedicated issue at crate/crate, as @seut already suggested the other day?

I'm not sure if we're on the same page here ;-) The problem with the DNS resolution doesn't sound to me like a CrateDB issue but rather a Kubernetes one. So this should be fixed elsewhere, and not treated as "yeah, it fails sometimes but eventually works"?

Do I misunderstand something here? Anyway, we may move the DNS discussion to a dedicated issue.

SStorm commented 1 year ago

This DNS "issue" (quirk?) is not related to CrateDB or the almalinux base image; it is internal to our setup in CrateDB Cloud, and we can stop making a big deal out of it ;-) It was erroneously attributed as being the problem in the beginning (probably by me, sorry!).

amotl commented 1 year ago

Thank you very much for the clarification, @SStorm!

SStorm commented 1 year ago

In case anyone's interested, the DNS "issue" happens because the discovery service is headless in k8s. Headless means that there is no load balancer or the magic iptables stuff that Kubernetes does for regular ClusterIP services. Instead, the service DNS name resolves to the IPs of the pods:

➜  ~ k get pods -o wide
NAME                                                              READY   STATUS      RESTARTS   AGE    IP             NODE                            NOMINATED NODE   READINESS GATES
backup-metrics-705a7012-3f89-441d-a10e-b3749d05e993-55fbbcprbbg   1/1     Running     0          2d4h   10.128.0.215   aks-np002-13778198-vmss00002u   <none>           <none>
crate-data-hot-705a7012-3f89-441d-a10e-b3749d05e993-0             2/2     Running     0          2d4h   10.128.1.111   aks-np002-13778198-vmss00002v   <none>           <none>
crate-data-hot-705a7012-3f89-441d-a10e-b3749d05e993-1             2/2     Running     0          2d4h   10.128.0.136   aks-np002-13778198-vmss00002z   <none>           <none>
crate-data-hot-705a7012-3f89-441d-a10e-b3749d05e993-2             2/2     Running     0          2d4h   10.128.0.219   aks-np002-13778198-vmss00002u   <none>           <none>
[root@crate-data-hot-705a7012-3f89-441d-a10e-b3749d05e993-2 /]# dig +short _cluster._tcp.crate-discovery-705a7012-3f89-441d-a10e-b3749d05e993.06410cd8-1e4c-462f-92ca-9a0a5864b6b0.svc.cluster.local
10.128.0.136
10.128.1.111
10.128.0.219

And Kubernetes does not resolve that hostname until the pod(s) is/are actually up. So you have that error pop up for a few times when the first pod is starting, and then it goes away. Mystery solved :)
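To make the mechanism concrete, here is a minimal sketch of what such a headless discovery Service could look like (names and port numbers are illustrative, not our actual manifests). `clusterIP: None` is what makes DNS return the pod IPs directly, and the named TCP port is what produces SRV records of the form `_cluster._tcp.<service>.<namespace>.svc.cluster.local`:

```yaml
# Illustrative sketch of a headless Service for discovery; not a real manifest.
apiVersion: v1
kind: Service
metadata:
  name: crate-discovery
spec:
  clusterIP: None        # headless: no virtual IP, no kube-proxy load balancing
  selector:
    app: crate
  ports:
    - name: cluster      # SRV owner name is derived from this: _cluster._tcp...
      port: 4300
      protocol: TCP
```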

mfussenegger commented 1 year ago

Is it possible to handle the removal of rev in the operator?

My POV:

Some background: I think mid to long-term we could change the JDK bundling to use jlink for a more minimal JRE, and then we could also switch the base image to a minimal glibc image. That would probably allow us to get to about ~100 MB in total
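As a rough sketch of that direction (the module list and base image below are assumptions for illustration, not a tested build), a multi-stage Dockerfile could produce a trimmed JRE via jlink and copy it onto a minimal glibc base:

```dockerfile
# Sketch only: the module list is a guess, not CrateDB's real requirements.
FROM eclipse-temurin:17 AS jre-build
RUN jlink \
      --add-modules java.base,java.logging,java.management,jdk.unsupported \
      --strip-debug --no-man-pages --no-header-files --compress=2 \
      --output /opt/jre

# A minimal glibc base keeps the final image small.
FROM almalinux:9-minimal
COPY --from=jre-build /opt/jre /opt/jre
ENV JAVA_HOME=/opt/jre
ENV PATH="/opt/jre/bin:${PATH}"
```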

SStorm commented 1 year ago

Is it possible to handle the removal of rev in the operator?

Yes. But I strongly reject the idea of doing it on an emergency basis like now.

If there was a 5.3.0-slim we could make the switch at some point in the future without causing disruption.

amotl commented 1 year ago

Hi again,

We can decide that the image size is not a problem and revert the base image change, and do nothing more about it. (But then why was it a problem a couple weeks ago?)

It was intended to reduce the wire transfer size of the image, and we thought we were looking at savings of around 100 MB. However, it turned out that we had been looking at the uncompressed sizes. After compression, the net savings are only around 30 MB, which I think does not justify any hassle at all.

We could also introduce -slim variants that only contain CrateDB. Some background: I think mid to long-term we could change the JDK bundling to use jlink for a more minimal JRE, and then we could also switch the base image to a minimal glibc image. That would probably allow us to get to about ~100 MB in total.

This is excellent. I am looking forward to it! And, yes, I also agree that this should probably converge into corresponding -slim images.

In either case, we should probably document what is provided by the image.

I've added an item about the switchover with https://github.com/crate/crate/pull/14027, but we could also introduce a changelog in this repository. If you see room for better documentation in this regard, about "what's inside", I will be happy to receive guidance.

With kind regards, Andreas.