Closed gtrkiller closed 2 years ago
Contents of the juju crashdump: https://drive.google.com/file/d/1EFVbF9xyoxfhoyOgE0ROD6Ki01ap1E9H/view?usp=sharing
Transferred this to bundle-kubeflow, but I don't know if this is the right place either. This feels more general than kubeflow, but not sure where to file this
We believe this issue has been fixed on the image repo server side, so I'm closing this. But if this comes up again, please reopen the issue. Thanks!
Hello, I am using Ubuntu 20.04, MicroK8s 1.21 and the latest stable version of juju. I am installing everything on my local machine.
When I follow the Kubeflow quickstart tutorial (https://charmed-kubeflow.io/docs/quickstart), all the commands run without issues. However, when I watch juju status, some charms never get deployed because of an error pulling their images. I have tried tinkering with many things, but the one change that made a real difference was switching MicroK8s' DNS servers. Cloudflare DNS has given me the fewest errors so far, while Google DNS (for example) has been a lot more problematic. I am attaching screenshots of my machine's resolv.conf, the environment file, an example pod description (they all show the same error) and juju status with both Cloudflare and Google DNS, so you can see the difference. Every Kubeflow installation I tried also had different charms failing, even with the same configuration. For example, the first Cloudflare juju status screenshot shows 4 charms with errors, but the second Cloudflare screenshot (a separate installation from scratch) shows only two.
Resolv.conf and environment files:
Juju status (google DNS installation):
Juju status (1st cloudflare installation):
Juju status (2nd cloudflare installation):
Example describe pod screenshot:
All the juju status screenshots were taken once the installation had stalled, more than 120 minutes in. I also tried configuring a proxy, but that didn't help. Note that the image pull back-off errors in the screenshots are always triggered by a failed size verification (visible in the describe pod screenshot). I have tried several installations of all 3 Kubeflow bundles, and they all show the same problem for me.
Juju-crashdump does not detect any machines on the kubeflow model.
Result of mtr --report --tcp --port 443 registry.jujucharms.com command:
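For anyone who wants to repeat a rough version of the reachability check above without mtr, here is a minimal sketch, assuming Python 3 is available on the machine. The names `probe_tcp` and `summarize` are hypothetical helpers written for this example, not part of juju or MicroK8s. It only tests whether a TCP connection to port 443 can be opened; it does not reproduce the size verification failure itself.

```python
import socket
import time


def probe_tcp(host, port=443, attempts=5, timeout=3.0,
              connect=socket.create_connection):
    """Open a TCP connection `attempts` times.

    Returns a list of (ok, seconds) tuples, one per attempt.
    `connect` is injectable so the function can be tested offline.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers DNS failures, timeouts and refused connections.
            results.append((False, time.monotonic() - start))
        else:
            conn.close()
            results.append((True, time.monotonic() - start))
    return results


def summarize(results):
    """Count successful and failed attempts."""
    ok = sum(1 for success, _ in results if success)
    return {"attempts": len(results), "ok": ok, "failed": len(results) - ok}
```

Calling `summarize(probe_tcp("registry.jujucharms.com"))` once per DNS configuration would give a crude comparison of how often the registry is reachable under each setup.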