nobuto-m opened this issue 4 months ago
Hi, @nobuto-m, this is the expected Juju behaviour. See https://github.com/canonical/postgresql-operator/issues/354 and https://bugs.launchpad.net/juju/+bug/2060098 for more details.
I get that Juju doesn't support Juju storage for the MAAS/LXD scenario. The point is that introducing Juju storage was a conscious decision by the charm, and it broke a real-world use case.
In general I'd argue that using Juju storage is absolutely the right decision here, though I'm a little surprised Juju doesn't support it for LXD containers. I'll try to dig in a bit with the Juju team and understand why.
(One thing you could try in the meantime is to use the virt-type constraint to deploy an LXD VM instead? I don't know off the top of my head whether it'll work like that, but it might be worth exploring.)
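Roughly something like this (untested, and it assumes the MAAS provider accepts the virt-type constraint at all):
$ juju deploy postgresql --base ubuntu@22.04 --constraints "virt-type=virtual-machine"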
LXD and storage attributes of a charm should be supported (even when that LXD container is on MAAS), though it is possible that we got something mixed up.
Certainly it does work directly with LXD:
$ juju status
Model Controller Cloud/Region Version SLA Timestamp
controller lxd35 localhost/localhost 3.5.2 unsupported 16:04:27-04:00
App Version Status Scale Charm Channel Rev Exposed Message
controller active 1 juju-controller 3.5/stable 105 no
postgresql 14.11 active 1 postgresql 14/stable 429 no
Unit Workload Agent Machine Public address Ports Message
controller/0* active idle 0 10.139.162.41
postgresql/0* active idle 1 10.139.162.212 5432/tcp Primary
Machine State Address Inst id Base AZ Message
0 started 10.139.162.41 juju-7f8bd3-0 ubuntu@22.04 Running
1 started 10.139.162.212 juju-7f8bd3-1 ubuntu@22.04 Running
$ juju storage
Unit Storage ID Type Pool Size Status Message
postgresql/0 pgdata/0 filesystem rootfs 48 GiB attached
You can see that we are aware of PostgreSQL's storage request, but we are fulfilling that request from the "rootfs" pool, which just means using the root disk rather than mounting additional storage.
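For reference, that pool can also be requested explicitly at deploy time, along these lines (a sketch; pgdata is the storage name from the charm's metadata):
$ juju deploy postgresql --storage pgdata=rootfs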
What Juju doesn't support is pass-through (provisioning storage as an AWS EBS volume and getting that mounted into the container, or mounting host devices into the container).
It is plausible that Juju broke something with regard to storage provisioning. I see the same behavior when trying to reproduce this on AWS.
$ juju status
Model Controller Cloud/Region Version SLA Timestamp
pg-test jam-aws aws/us-east-1 3.5.2 unsupported 16:21:18-04:00
App Version Status Scale Charm Channel Rev Exposed Message
pg2 error 0/1 postgresql 14/stable 429 no cannot assign unit "pg2/0" to machine 1/lxd/0: adding storage to lxd container not supported
Juju should be fine to use rootfs storage on AWS for an LXD container (and same for MAAS). I'm trying to dig a bit more and see where we might have gone wrong.
$ juju status
Model Controller Cloud/Region Version SLA Timestamp
pg-test jam-aws aws/us-east-1 3.5.2 unsupported 16:35:56-04:00
App Version Status Scale Charm Channel Rev Exposed Message
pg1 waiting 1 postgresql 14/stable 429 no waiting to start PostgreSQL
Unit Workload Agent Machine Public address Ports Message
pg1/0* waiting executing 0 54.144.107.165 (leader-elected) waiting to start PostgreSQL
Machine State Address Inst id Base AZ Message
0 started 54.144.107.165 i-0934acabe2f83b8ed ubuntu@22.04 us-east-1c running
$ juju storage
Unit Storage ID Type Pool Size Status Message
pg1/0 pgdata/0 filesystem rootfs 7.6 GiB attached
That does, indeed, use filesystem storage from the rootfs pool when I deploy directly to a host without configuring storage.
However, it does fail immediately with a container:
$ juju add-unit pg1 --to lxd:0
ERROR acquiring machine to host unit "pg1/1": cannot assign unit "pg1/1" to machine 0/lxd/0: adding storage to lxd container not supported (not supported)
(But it absolutely worked with exactly that storage definition when deploying on the LXD provider.)
However, I did go back to a rather old Juju (2.9), and it operates exactly the same way:
$ juju status
Model Controller Cloud/Region Version SLA Timestamp
default jam-aws aws/us-east-1 2.9.50 unsupported 17:27:54-04:00
App Version Status Scale Charm Channel Rev Exposed Message
pg1 active 1/2 postgresql 14/stable 429 no
Unit Workload Agent Machine Public address Ports Message
pg1/0* active idle 0 44.222.140.206 5432/tcp Primary
pg1/1 waiting allocating waiting for machine
Machine State Address Inst id Series AZ Message
0 started 44.222.140.206 i-0cd17af46e0376b0e jammy us-east-1a running
0/lxd/0 pending pending jammy
jameinel@jammy:~
$ juju add-unit --to lxd:0 pg1
ERROR acquiring machine to host unit "pg1/2": cannot assign unit "pg1/2" to machine 0/lxd/1: adding storage to lxd container not supported (not supported)
As mentioned, it should be possible to support rootfs storage for applications deployed to containers within another provider, but it looks like we never implemented that support.
To be clear, I'm not questioning how useful Juju storage is. It's just a known issue that Juju storage support is missing for the MAAS/LXD scenario, and adopting Juju storage in a charm is known to be broken for this scenario in the machine charm world (k8s charms are straightforward, obviously).
My request is either:
I don't have a clear idea of how much effort is required for each action, so I'm leaving this here and will wait for a plan from the engineering teams.
@taurus-forever I'd be interested to see if the storage limit solves the problem here, at least temporarily.
@wallyworld has taken a look at implementing this and it doesn't seem like a huge amount of work - and could possibly land in 3.6 (even if not in 3.6.0), but let's see if the simple workaround above works before we shuffle Juju's cards around too much.
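For clarity, the "storage limit" workaround amounts to relaxing the storage requirement in the charm's metadata.yaml, roughly like this (a hypothetical sketch, not the charm's actual definition; a range of 0-1 makes the storage optional):
$ cat metadata.yaml
storage:
  pgdata:
    type: filesystem
    multiple:
      range: 0-1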
@jnsgruk, quickly tested --to lxd:

* psql-edge => error "not supported"
* multiple.range: 0-1 as psql-limit01 => stuck in "waiting for machine"
* multiple.range: 1 as psql-limit1 (as the charm needs 1 storage) => error "not supported"

juju status:
ubuntu@juju350:~$ juju status
Model Controller Cloud/Region Version SLA Timestamp
pg2404 lxd localhost/localhost 3.5.2 unsupported 11:57:18+02:00
App Version Status Scale Charm Channel Rev Exposed Message
psql-edge error 0/1 postgresql 14/edge 444 no cannot assign unit "psql-edge/0" to machine 4/lxd/0: adding storage to lxd container not supported
psql-limit1 error 0/1 postgresql 1 no cannot assign unit "psql-limit1/0" to machine 6/lxd/0: adding storage to lxd container not supported
psql-limit01 waiting 0/1 postgresql 0 no waiting for machine
Unit Workload Agent Machine Public address Ports Message
psql-edge/0 error lost cannot assign unit "psql-edge/0" to machine 4/lxd/0: adding storage to lxd container not supported
psql-limit1/0 error lost cannot assign unit "psql-limit1/0" to machine 6/lxd/0: adding storage to lxd container not supported
psql-limit01/0 waiting allocating 5/lxd/0 waiting for machine
Machine State Address Inst id Base AZ Message
4 started 10.142.152.170 juju-247c76-4 ubuntu@22.04 Running
4/lxd/0 pending juju-247c76-4-lxd-0 ubuntu@22.04 Container started
5 started 10.142.152.79 juju-247c76-5 ubuntu@22.04 Running
5/lxd/0 pending juju-247c76-5-lxd-0 ubuntu@22.04 Container started
6 started 10.142.152.156 juju-247c76-6 ubuntu@22.04 Running
6/lxd/0 pending juju-247c76-6-lxd-0 ubuntu@22.04 Container started
The "limit: 0-1" waiting freeze is constantly reproducible on my side. There is no useful information in the debug-log.
We are happy to test all other possible workarounds here.
IIRC, tweaking the storage directives also causes issues with refreshing. If we change the storage definition, we should double-check that upgrades still work.
@dragomirp is referring to https://bugs.launchpad.net/juju/+bug/1995074 mainly:
We are currently removing the description field to be able to refresh the charm
(see: https://github.com/canonical/postgresql-k8s-operator/pull/218).
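A quick sanity check after any storage-definition change would be something along these lines (a sketch; it assumes a deployed postgresql application and that the updated charm is published to 14/edge):
$ juju refresh postgresql --channel 14/edge
$ juju status --watch 5s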
To clarify: this is going to be an important issue eventually, but it is not blocking any field engagement as far as I'm concerned. Other issues could be prioritized if there are any blocking ones.
Thanks @nobuto-m - I think @wallyworld has this on the backlog, with a chance it could land in 3.6 so it's ready for the 3.x LTS.
Just to chime in - it's a 9-year-old TODO from the initial implementation of storage support and how storage works with containers. We can look at a fix for 3.6.
We will keep this ticket open for a while to re-test the 14/stable charm once Juju supports nested LXD. The Juju work will happen in https://bugs.launchpad.net/juju/+bug/2060098
This is the Juju fix: https://github.com/juju/juju/pull/17830
Note: I've raised a new bug (https://bugs.launchpad.net/juju/+bug/2074379) just for this specific fix. The other bug is for cloud-provided storage like EBS volumes, which is a much bigger scope.
@wallyworld what is the easiest way to test https://github.com/juju/juju/pull/17830? Wait for juju 3.6-beta2?
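(For anyone wanting to test ahead of the release: a pre-release client can be pulled from the snap channels, assuming the fix has landed in the published build.)
$ sudo snap refresh juju --channel=3.6/beta
$ juju version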
Hi @wallyworld,
I have tried to confirm the fix on Juju 3.6-beta2 (from 3.6/beta) and 3.6-beta3.1 (from 3.6/edge), but neither works for me. Was the fix included somewhere?
STR:
juju deploy postgresql --base ubuntu@22.04 --to lxd
Juju status:
ubuntu@juju360:~$ juju status -m test2
Model Controller Cloud/Region Version SLA Timestamp
test2 test localhost/localhost 3.6-beta3.1 unsupported 09:37:40+02:00
App Version Status Scale Charm Channel Rev Exposed Message
postgresql waiting 0/1 postgresql 14/stable 429 no waiting for machine
Unit Workload Agent Machine Public address Ports Message
postgresql/0 waiting allocating 0/lxd/0 waiting for machine
Machine State Address Inst id Base AZ Message
0 started 10.189.210.214 juju-d43457-0 ubuntu@22.04 Running
0/lxd/0 pending juju-d43457-0-lxd-0 ubuntu@22.04 Container started
ubuntu@juju360:~$
Debug-log:
machine-0: 23:11:14 INFO juju.worker.authenticationworker "machine-0" key updater worker started
machine-0: 23:11:14 INFO juju.worker.machiner "machine-0" started
machine-0: 23:11:17 INFO juju.worker.kvmprovisioner machine-0 does not support kvm container
machine-0: 23:11:17 INFO juju.packaging.manager Running: snap info lxd
machine-0: 23:11:18 INFO juju.container.lxd LXD snap is already installed (channel: 5.0/stable/ubuntu-22.04); skipping package installation
machine-0: 23:11:23 INFO juju.container.lxd Availability zone will be empty for this container manager
machine-0: 23:11:23 INFO juju.worker.lxdprovisioner entering provisioner task loop; using provisioner pool with 4 workers
machine-0: 23:11:23 INFO juju.worker.lxdprovisioner found machine pending provisioning id:0/lxd/0, details:0/lxd/0
machine-0: 23:11:23 WARNING juju.container.broker no name servers supplied by provider, using host's name servers.
machine-0: 23:11:23 WARNING juju.container.broker no search domains supplied by provider, using host's search domains.
machine-0: 23:11:23 WARNING juju.container.broker incomplete DNS config found, discovering host's DNS config
machine-0: 23:17:56 INFO juju.cloudconfig Fetching agent: curl -sSf --connect-timeout 20 --noproxy "*" --insecure -o $bin/tools.tar.gz https://10.189.210.169:17070/model/00974701-bde1-4940-8cd4-f94546d43457/tools/3.6-beta3.1-ubuntu-amd64
machine-0: 23:17:56 INFO juju.container.lxd starting new container "juju-d43457-0-lxd-0" (image "ubuntu-22.04-server-cloudimg-amd64-lxd.tar.xz")
machine-0: 23:18:00 INFO juju.worker.lxdprovisioner started machine 0/lxd/0 as instance juju-d43457-0-lxd-0 with hardware "arch=amd64", network config [], volumes [], volume attachments map[], subnets to zones [], lxd profiles []
machine-0: 23:18:00 INFO juju.worker.instancemutater.container no changes necessary to machine-0/lxd/0 lxd profiles ([default])
controller-0: 23:25:18 INFO juju.worker.instancepoller machine "0" (instance ID "juju-d43457-0") has new addresses: [local-cloud:10.189.210.214@alpha local-cloud:10.218.61.1@alpha local-cloud:fd42:e252:fc0c:db2d::1@alpha]
Tnx!
> Model Controller Cloud/Region Version SLA Timestamp
> test2 test localhost/localhost 3.6-beta3.1 unsupported 09:37:40+02:00
Is it actually the MAAS provider? It looks like localhost LXD, and if that's the case, isn't it a different issue?
> Is it actually the MAAS provider? It looks like localhost LXD, and if that's the case, isn't it a different issue?
It was a quick test without MAAS; I will repeat it on MAAS. Thanks for pointing that out!
Juju 3.6.0 has finally been released! Luckily, I had a fresh MAAS bootstrapped using this manual.
TL;DR:

* juju deploy postgresql works well there (as earlier)
* juju deploy postgresql --base ubuntu@22.04 --to lxd doesn't work for me

CC: @wallyworld. The originally reported error no longer exists, but the deployment gets stuck in the middle of nowhere. IMHO, it is identical to my test above with localhost LXD:
The only error I can find in the logs:
machine-0: 14:24:59 ERROR juju.worker.dependency "storage-provisioner" manifold worker returned unexpected error: getting life of filesystem 0/lxd/0/1: permission denied
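One way to pull more context around that error (a sketch; I believe both flags exist in Juju 3.x):
$ juju debug-log --replay --include-module juju.worker.storageprovisioner
$ juju storage --format yaml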
Versions:
juju 3.6.0 29130 3/stable canonical✓ -
maas 3.5.2-16329-g.a0861379f 37259 3.5/stable canonical✓ -
maas-test-db 14.2-34-g.f09c893 179 3.5/stable canonical✓ -
P.S. Juju storage is pending (maybe I'm missing some MAAS storage pool configuration):
> juju storage
Unit Storage ID Type Size Status Message
postgresql/0 pgdata/0 filesystem pending
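If a missing MAAS storage pool really is the cause, the general shape would be something like the following (hypothetical; "maas-disks" and the tag are placeholder values, and I haven't verified that this resolves the pending state):
$ juju create-storage-pool maas-disks maas tags=ssd
$ juju deploy postgresql --storage pgdata=maas-disks,16G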
This needs assistance from the Juju/MAAS team. Ping @wallyworld.
Using LXD is pretty common with the MAAS provider to save on the total number of machines, instead of assigning a bare-metal machine to each unit of a microservice.
For example, the official landscape-dense-maas bundle uses that architecture: https://ubuntu.com/landscape/docs/juju-installation#heading--landscape-dense-maas-bundle (the current bundle uses the "legacy" postgresql charm, so it works, but it will eventually move away from the legacy one).
Steps to reproduce
$ juju deploy postgresql --base ubuntu@22.04 --to lxd
Expected behavior
The deployment of the charm goes green (active).
Actual behavior
It errors out with:
cannot assign unit "postgresql/0" to machine 0/lxd/0: adding storage to lxd container not supported
Versions
Operating system: 22.04 LTS
Juju CLI: 3.5.2-genericlinux-amd64
Juju agent: 3.5.2
Charm revision: 14/stable 429
LXD: 5.0.3
Log output
Juju debug log:
Additional context