Given that the issue was that cloud depended on some CLI tools, not on CrateDB being broken in some way, I'd tend to keep the change to use the minimal image.
But if we do, we should probably add something about it to the release notes.
Given that the issue was that cloud depended on some CLI tools
Technically, it is the operator https://github.com/crate/crate-operator/blob/3008d1a15b7211144e7b33100e81ab7e886aca83/crate/operator/create.py#L409 that depends on these tools, and it is also used elsewhere.
Also, this is only for the node naming part, but the DNS resolution for discovery via the service also seems to be broken:
[2023-04-18T09:55:00,373][ERROR][i.c.d.SrvUnicastHostsProvider] [data-hot-] DNS lookup exception:
java.util.concurrent.ExecutionException: java.net.UnknownHostException: Failed to resolve '_cluster._tcp.crate-discovery-3f02d8a6-f712-4710-acd1-cf3c9a459208.251c42fa-12ba-4ce7-a17f-d8cb8ee8b795.svc.cluster.local.' [SRV(33)]
This makes me think that any new Kubernetes-based deployment, including manual StatefulSet creation or the Helm chart, is currently also broken. (That would need to be verified!)
discovery via the service seems also to be broken
It's a red herring - a timing issue due to it being a headless service in Kubernetes. The DNS resolution works eventually.
At first, I was also more inclined to keep the smaller image and fix the environment, as @mfussenegger suggested. That was when I thought only missing userspace tools were involved.
But after further possible flaws were discovered, I am now leaning more towards using the standard image again, as @SStorm and @WalBeh suggested, also because the net size savings would not be that large.
To conclude the decision, I am still waiting for a final verdict from @mfussenegger and @hammerhead. I will be happy to add a corresponding item to the release notes, as @mfussenegger suggested.
to add a corresponding item to the release notes
I just looked at the release notes and did not find any reference to AlmaLinux, or to switching the Docker/OCI images away from CentOS. So I am asking whether we should mention the adjustments there at all?
If we want to have it, here is a patch:
I'm fine with switching back in order to revert to a known-good state. On the other hand, it would be really great to document any known package dependencies explicitly. When we switched earlier from CentOS to AlmaLinux, we apparently were lucky that there weren't significant differences in the installed packages.
But after further possible flaws were discovered
What flaws?
@mfussenegger asked:
What flaws?
Above, @proddata and @SStorm reported:
but the DNS resolution for discovery via the service also seems to be broken
a timing issue due to it being a headless service in Kubernetes. The DNS resolution works eventually.
I think @WalBeh has been researching more details, but I don't know whether there has been an outcome on this specific point, other than confirming that the flaw is present. There is an internal report/discussion, "5.3.0 does not successfully start on k8s", whose gist is that CrateDB never recovers from a DNS resolution loop while zone entries are converging within the K8s SDN, which it did before 5.3.0.
This is the DNS lookup from 5.3.0:
[2023-04-18T09:55:00,373][ERROR][i.c.d.SrvUnicastHostsProvider] [data-hot-] DNS lookup exception: java.util.concurrent.ExecutionException: java.net.UnknownHostException: Failed to resolve '_cluster._tcp.crate-discovery-3f02d8a6-f712-4710-acd1-cf3c9a459208.251c42fa-12ba-4ce7-a17f-d8cb8ee8b795.svc.cluster.local.' [SRV(33)]
This subsequently fails. Does anyone have an idea about [SRV(33)]?
On this detail, should I create a dedicated issue at crate/crate, as @seut already suggested the other day?
On this detail, should I create a dedicated issue at crate/crate, as @seut already suggested the other day?
I'm not sure if we're on the same page here ;-) The problem with the DNS resolution doesn't sound to me like a CrateDB issue but rather a Kubernetes one. So this should be fixed elsewhere and not treated as "yeah, it fails sometimes but eventually works"?
Do I misunderstand something here? Anyway, we may move the DNS discussion to a dedicated issue.
This DNS "issue" (quirk?) is not related to CrateDB or the almalinux base image, is internal to our setup in CrateDB Cloud, and we can stop making a big deal out of it ;-) It was erroneously attributed to being the problem in the beginning (Probably by me - sorry!).
Thank you very much for the clarification, @SStorm!
In case anyone's interested, the DNS "issue" happens because the discovery service is headless in k8s. Headless means that there is no load balancer or the magic iptables stuff that Kubernetes does for regular ClusterIP services. Instead, the service DNS name resolves to the IPs of the pods:
➜ ~ k get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
backup-metrics-705a7012-3f89-441d-a10e-b3749d05e993-55fbbcprbbg 1/1 Running 0 2d4h 10.128.0.215 aks-np002-13778198-vmss00002u <none> <none>
crate-data-hot-705a7012-3f89-441d-a10e-b3749d05e993-0 2/2 Running 0 2d4h 10.128.1.111 aks-np002-13778198-vmss00002v <none> <none>
crate-data-hot-705a7012-3f89-441d-a10e-b3749d05e993-1 2/2 Running 0 2d4h 10.128.0.136 aks-np002-13778198-vmss00002z <none> <none>
crate-data-hot-705a7012-3f89-441d-a10e-b3749d05e993-2 2/2 Running 0 2d4h 10.128.0.219 aks-np002-13778198-vmss00002u <none> <none>
[root@crate-data-hot-705a7012-3f89-441d-a10e-b3749d05e993-2 /]# dig +short _cluster._tcp.crate-discovery-705a7012-3f89-441d-a10e-b3749d05e993.06410cd8-1e4c-462f-92ca-9a0a5864b6b0.svc.cluster.local
10.128.0.136
10.128.1.111
10.128.0.219
And Kubernetes does not resolve that hostname until the pod(s) is/are actually up. So you see that error pop up a few times while the first pod is starting, and then it goes away. Mystery solved :)
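For anyone who wants to watch this behaviour outside of CrateDB, here is a minimal sketch in Python, assuming the dnspython package is installed; the record name is the one from the log above. It only illustrates the resolution behaviour, it is not what CrateDB's SrvUnicastHostsProvider actually does. As a side note, 33 is simply the numeric DNS record type for SRV, which is what the [SRV(33)] in the log refers to.
# Minimal sketch: poll the discovery SRV record until it resolves.
# Requires dnspython (pip install dnspython); hostname taken from the log above.
import time
import dns.resolver

NAME = ("_cluster._tcp.crate-discovery-3f02d8a6-f712-4710-acd1-cf3c9a459208."
        "251c42fa-12ba-4ce7-a17f-d8cb8ee8b795.svc.cluster.local.")

while True:
    try:
        # SRV is DNS record type 33, hence the "[SRV(33)]" in the log message.
        answers = dns.resolver.resolve(NAME, "SRV")
        for record in answers:
            print(record.priority, record.weight, record.port, record.target)
        break
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        # While the first pod is still starting, the headless service has no
        # published endpoints yet, so the lookup fails, just like in CrateDB's log.
        print("not resolvable yet, retrying ...")
        time.sleep(2)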
Is it possible to handle the removal of rev in the operator?
My POV:
We can decide that the image size is not a problem and revert the base image change, and do nothing more about it. (But then why was it a problem a couple weeks ago?)
We decide that the image size is a problem. Then we need to remove stuff. For that we need to identify what's needed. Ideally we'd only have CrateDB itself, but afaik we have crash for bootstrapping reasons too.
We could also introduce -slim variants that only contain CrateDB.
In either case, we should probably document what is provided by the image.
Some background: I think mid- to long-term we could change the JDK bundling to use jlink for a more minimal JRE, and then we could also switch the base image to a minimal glibc image. That would probably allow us to get to about ~100 MB in total.
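To make the jlink idea a bit more concrete, here is a hedged sketch of how a build script could assemble a trimmed runtime image. The module list and output path are guesses for illustration, not the actual CrateDB build; in practice the module set would come from running jdeps --print-module-deps against the CrateDB JARs.
# Hypothetical sketch: build a trimmed JRE with jlink from a Python build script.
# The module list below is a placeholder, not CrateDB's real requirements.
import subprocess

modules = "java.base,java.logging,java.management,java.sql,jdk.unsupported"

subprocess.run(
    [
        "jlink",
        "--add-modules", modules,
        "--strip-debug",        # drop debug symbols
        "--no-header-files",    # omit JNI header files
        "--no-man-pages",
        "--compress=2",         # zip-compress resources
        "--output", "build/jre-minimal",
    ],
    check=True,
)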
Is it possible to handle the removal of rev in the operator?
Yes. But I strongly reject the idea of doing it on an emergency basis like now.
If there were a 5.3.0-slim, we could make the switch at some point in the future without causing disruption.
Hi again,
We can decide that the image size is not a problem and revert the base image change, and do nothing more about it. (But then why was it a problem a couple weeks ago?)
The change was intended to reduce the wire transfer size of the image, and we thought we were looking at savings of around 100 MB. However, it turned out that we had been looking at the uncompressed sizes. After compression, the net savings are only around 30 MB, which I think does not justify the hassle at all.
We could also introduce -slim variants that only contain CrateDB. Some background: I think mid- to long-term we could change the JDK bundling to use jlink for a more minimal JRE, and then we could also switch the base image to a minimal glibc image. That would probably allow us to get to about ~100 MB in total.
This is excellent, I am looking forward to it! And yes, I also agree that this should probably converge into corresponding -slim images.
In either case, we should probably document what is provided by the image.
I've added an item with https://github.com/crate/crate/pull/14027 about the switchover, but we could also introduce a changelog in this repository. If you see room for better documentation in this regard, about "what's inside", I will be happy to receive guidance.
With kind regards, Andreas.
We discovered regressions when running the AlmaLinux minimal image on Kubernetes. As a safety measure, to avoid causing similar problems for users, we are going back to the AlmaLinux standard image, effectively reverting GH-221.
When looking at the size savings, we also found that the compressed image size delta between the standard and minimal images is only 27 MB.
/cc @SStorm, @WalBeh, @proddata, @seut