Open gabrik opened 6 years ago
Hi @gabrik
You're not misusing the API; unfortunately the version of pylxd you are using is not compatible with the version of lxd you are using. The AttributeError issue is fixed in pylxd 2.0.7 (for xenial) and 2.2.5+ (for bionic).
Which version of lxd are you using (lxd --version) and which version of pylxd (pylxd.__version__)?
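From Python, something like this should print the library version:
import pylxd
print(pylxd.__version__)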
Hi @ajkavanagh
I'm using lxd version 3.2 and pylxd 2.2.4 on Xenial. I think I have to update to fix this.
Yes, I'm pretty sure that 2.2.5 has a fix for this; the latest is 2.2.7, but it has an execute issue that I'm looking into, which became much more apparent with lxd 3.0.2. lxd 3.2 has less of an issue, but if you use lots of 'execute' statements from pylxd, then 2.2.6 might be better until I fix the issue and get 2.2.8 out.
Thanks @ajkavanagh. Now I see the migration starting, but then I get an error on the websocket:
Traceback (most recent call last):
  File "migration.py", line 13, in <module>
    cont.migrate(client_remote,wait=True)
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/container.py", line 318, in migrate
    self.generate_migration_data(), wait=wait)
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/container.py", line 184, in create
    client.operations.wait_for_operation(response.json()['operation'])
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/operation.py", line 32, in wait_for_operation
    operation.wait()
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/operation.py", line 63, in wait
    raise exceptions.LXDAPIException(response)
pylxd.exceptions.LXDAPIException: Error transferring container data: websocket: bad handshake
Both nodes have the other one in their list of remotes, but I don't think this is mandatory.
@gabrik The error is actually coming from LXD (i.e. it is being passed through pylxd directly from LXD). However, it could be because pylxd is sending the wrong data, or lxd is not happy chatting to the other lxd server.
Can you look at versions on the two servers? Are they mostly compatible?
Thanks @ajkavanagh. The version of LXD is the same on both servers (3.2), installed using snap, and the same goes for pylxd, which is version 2.2.6 on both. I can try using different versions of pylxd.
In /var/snap/lxd/common/lxd/lxd.log I see this when I issue the migration.
destination node:
ephemeral=false lvl=info msg="Creating container" name=testm t=2018-07-06T12:55:08+0000
ephemeral=false lvl=info msg="Created container" name=testm t=2018-07-06T12:55:08+0000
lvl=warn msg="Unable to update backup.yaml at this time" name=testm t=2018-07-06T12:55:08+0000
lvl=warn msg="Unable to update backup.yaml at this time" name=testm t=2018-07-06T12:55:08+0000
err="websocket: bad handshake" lvl=eror msg="Error during migration sink" t=2018-07-06T12:55:08+0000
created=2018-07-06T12:55:08+0000 ephemeral=false lvl=info msg="Deleting container" name=testm t=2018-07-06T12:55:08+0000 used=1970-01-01T00:00:00+0000
created=2018-07-06T12:55:08+0000 ephemeral=false lvl=info msg="Deleted container" name=testm t=2018-07-06T12:55:08+0000 used=1970-01-01T00:00:00+0000
Nothing is written on the source node.
Obviously, if I try to sniff the traffic between the two nodes to see the messages, I can't get anything because it is encrypted.
@ajkavanagh I have found a way to make this work: I have to use the correct IP address also when connecting to the source node, and that address should be reachable from the destination node. So I have updated my code accordingly:
from pylxd import Client

# client_local = Client()  # the local unix socket is not enough here;
# the source must be reachable over the network from the destination node.
client_local = Client(endpoint='https://192.168.1.104:8443',
                      cert=('lxd.crt', 'lxd.key'), verify=False)
client_local.authenticate('fog')

client_remote = Client(endpoint='https://192.168.1.106:8443',
                       cert=('lxd.crt', 'lxd.key'), verify=False)
print('Trusted {}'.format(client_remote.trusted))
client_remote.authenticate('fog')
print('Trusted {}'.format(client_remote.trusted))

cont = client_local.containers.get('testm')
cont.migrate(client_remote, wait=True)
However, I always get this:
Traceback (most recent call last):
  File "migration.py", line 13, in <module>
    cont.migrate(client_remote,wait=True)
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/container.py", line 426, in migrate
    self.generate_migration_data(), wait=wait)
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/container.py", line 265, in create
    client.operations.wait_for_operation(response.json()['operation'])
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/operation.py", line 32, in wait_for_operation
    operation.wait()
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/operation.py", line 63, in wait
    raise exceptions.LXDAPIException(response)
pylxd.exceptions.LXDAPIException: The container is already running
and the container on the source node remains stopped; I think this is because the migration method ends due to the exception above.
I think this should be well documented; can I add this example to the documentation?
@gabrik yes, please! Please do submit a PR for any changes/additions to the documentation that you think might be helpful!
@ajkavanagh
Even though the migration works, I get this error:
Traceback (most recent call last):
  File "migration.py", line 13, in <module>
    cont.migrate(client_remote,wait=True)
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/container.py", line 426, in migrate
    self.generate_migration_data(), wait=wait)
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/container.py", line 265, in create
    client.operations.wait_for_operation(response.json()['operation'])
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/operation.py", line 32, in wait_for_operation
    operation.wait()
  File "/usr/local/lib/python3.5/dist-packages/pylxd/models/operation.py", line 63, in wait
    raise exceptions.LXDAPIException(response)
pylxd.exceptions.LXDAPIException: The container is already running
I think that comes, for some reason, from here: https://github.com/lxc/pylxd/blob/master/pylxd/models/container.py#L425
From what I understand, if the container is migrated while it is running, the create tries to set it running again and so it returns this error. Am I correct?
Do you think that adding a try...except can solve this?
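Something along these lines, just as a rough idea (the check on the exception message is only my guess at what would be safe to ignore):
from pylxd import exceptions

# cont and client_remote are set up as in the snippet above.
try:
    cont.migrate(client_remote, wait=True)
except exceptions.LXDAPIException as e:
    # Guess: the container data has already been transferred and only the
    # final start on the target failed, so this particular error might be
    # tolerable; anything else is still raised.
    if 'already running' not in str(e):
        raise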
I'm closing this as I think it is largely answered/covered by #319. Please feel free to re-open it if there is still an issue NOT related to the actual migration code.
Hi, I'm still facing this issue with LXD 3.0.3 LTS and the latest version of pylxd, running on Ubuntu 18.04.
@Oussalvatore Please could you:
a) indicate the exact version of pylxd, and
b) paste the error message/traceback here?
Thanks very much.
@ajkavanagh Sorry for the late reply.
a) Since I thought that the new "live" parameter would solve the problem, I'm currently using pylxd==2.2.10.dev17; however, I had the same problem with version 2.2.9, and I've just tried the newly released version 2.2.10 but the problem is still there.
b) 1. The error message is:
Common start logic: The container is already running
2. The execution traceback is:
Traceback (most recent call last):
  File "/home/ubuntu/drones-controllers-manager/lxd_manager.py", line 137, in migrate_ca
    c.migrate(destination,live=True, wait=True)
  File "/usr/local/lib/python3.6/dist-packages/pylxd/models/container.py", line 525, in migrate
    raise e
  File "/usr/local/lib/python3.6/dist-packages/pylxd/models/container.py", line 519, in migrate
    self.generate_migration_data(live), wait=wait)
  File "/usr/local/lib/python3.6/dist-packages/pylxd/models/container.py", line 279, in create
    client.operations.wait_for_operation(response.json()['operation'])
  File "/usr/local/lib/python3.6/dist-packages/pylxd/models/operation.py", line 47, in wait_for_operation
    operation.wait()
  File "/usr/local/lib/python3.6/dist-packages/pylxd/models/operation.py", line 88, in wait
    raise exceptions.LXDAPIException(response)
pylxd.exceptions.LXDAPIException: Common start logic: The container is already running
Also, below you'll find the output of "lxc monitor" running on the destination host:
metadata:
context: {}
level: dbug
message: 'New event listener: 94ad2e05-f594-4fbf-840e-1242478114cf'
timestamp: "2019-06-05T08:47:29.999465304Z"
type: logging
metadata:
context:
k: "1"
level: dbug
message: Found cert
timestamp: "2019-06-05T08:47:47.734396336Z"
type: logging
metadata:
context:
ip: 195.148.125.92:46864
method: GET
url: /1.0
level: dbug
message: handling
timestamp: "2019-06-05T08:47:47.734477485Z"
type: logging
metadata:
context:
k: "1"
level: dbug
message: Found cert
timestamp: "2019-06-05T08:47:47.735176399Z"
type: logging
metadata:
context: {}
level: dbug
message: Responding to container create
timestamp: "2019-06-05T08:47:47.945866934Z"
type: logging
metadata:
context:
k: "1"
level: dbug
message: Found cert
timestamp: "2019-06-05T08:47:47.945806315Z"
type: logging
metadata:
context:
ip: 195.148.125.92:46874
method: POST
url: /1.0/containers
level: dbug
message: handling
timestamp: "2019-06-05T08:47:47.945851191Z"
type: logging
metadata:
context: {}
level: dbug
message: No valid storage pool in the container's local root disk device and profiles
found
timestamp: "2019-06-05T08:47:47.947067499Z"
type: logging
metadata:
context: {}
level: dbug
message: 'Database error: &errors.errorString{s:"sql: no rows in result set"}'
timestamp: "2019-06-05T08:47:47.947529002Z"
type: logging
metadata:
context: {}
level: dbug
message: 'Database error: &errors.errorString{s:"sql: no rows in result set"}'
timestamp: "2019-06-05T08:47:47.948167633Z"
type: logging
metadata:
context:
ephemeral: "false"
name: first
level: info
message: Creating container
timestamp: "2019-06-05T08:47:47.984940899Z"
type: logging
metadata:
action: container-updated
source: /1.0/containers/first
timestamp: "2019-06-05T08:47:48.102538242Z"
type: lifecycle
metadata:
action: container-updated
source: /1.0/containers/first
timestamp: "2019-06-05T08:47:48.202111093Z"
type: lifecycle
metadata:
context:
ephemeral: "false"
name: first
level: info
message: Created container
timestamp: "2019-06-05T08:47:48.205872819Z"
type: logging
metadata:
context: {}
level: dbug
message: Creating empty ZFS storage volume for container "first" on storage pool
"default"
timestamp: "2019-06-05T08:47:48.205899131Z"
type: logging
metadata:
action: container-created
source: /1.0/containers/first
timestamp: "2019-06-05T08:47:48.205888015Z"
type: lifecycle
metadata:
context: {}
level: dbug
message: Mounting ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:48.608795363Z"
type: logging
metadata:
context: {}
level: dbug
message: Created empty ZFS storage volume for container "first" on storage pool
"default"
timestamp: "2019-06-05T08:47:48.608738233Z"
type: logging
metadata:
context: {}
level: dbug
message: Mounted ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:48.61076933Z"
type: logging
metadata:
context: {}
level: dbug
message: Unmounting ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:48.719152991Z"
type: logging
metadata:
action: container-updated
source: /1.0/containers/first
timestamp: "2019-06-05T08:47:48.719140025Z"
type: lifecycle
metadata:
context:
name: first
level: warn
message: Unable to update backup.yaml at this time
timestamp: "2019-06-05T08:47:48.719118048Z"
type: logging
metadata:
context: {}
level: dbug
message: Mounting ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:48.739067287Z"
type: logging
metadata:
context: {}
level: dbug
message: Unmounted ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:48.739037258Z"
type: logging
metadata:
context: {}
level: dbug
message: Unmounting ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:48.741146888Z"
type: logging
metadata:
context: {}
level: dbug
message: Mounted ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:48.741026052Z"
type: logging
metadata:
context:
name: first
level: warn
message: Unable to update backup.yaml at this time
timestamp: "2019-06-05T08:47:48.741129721Z"
type: logging
metadata:
context: {}
level: dbug
message: Unmounted ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:48.763349767Z"
type: logging
metadata:
context: {}
level: dbug
message: 'Started task operation: 9c18a13b-31e4-4743-8a1b-498b7a53d88d'
timestamp: "2019-06-05T08:47:48.825045802Z"
type: logging
metadata:
class: task
created_at: "2019-06-05T08:47:48.801416962Z"
description: Creating container
err: ""
id: 9c18a13b-31e4-4743-8a1b-498b7a53d88d
may_cancel: false
metadata: null
resources:
containers:
- /1.0/containers/first
status: Running
status_code: 103
updated_at: "2019-06-05T08:47:48.801416962Z"
timestamp: "2019-06-05T08:47:48.825054065Z"
type: operation
metadata:
class: task
created_at: "2019-06-05T08:47:48.801416962Z"
description: Creating container
err: ""
id: 9c18a13b-31e4-4743-8a1b-498b7a53d88d
may_cancel: false
metadata: null
resources:
containers:
- /1.0/containers/first
status: Pending
status_code: 105
updated_at: "2019-06-05T08:47:48.801416962Z"
timestamp: "2019-06-05T08:47:48.825018853Z"
type: operation
metadata:
context: {}
level: dbug
message: 'New task operation: 9c18a13b-31e4-4743-8a1b-498b7a53d88d'
timestamp: "2019-06-05T08:47:48.824995306Z"
type: logging
metadata:
context:
k: "1"
level: dbug
message: Found cert
timestamp: "2019-06-05T08:47:48.857158296Z"
type: logging
metadata:
context:
ip: 195.148.125.92:46876
method: GET
url: /1.0/operations/9c18a13b-31e4-4743-8a1b-498b7a53d88d
level: dbug
message: handling
timestamp: "2019-06-05T08:47:48.857176342Z"
type: logging
metadata:
context:
ip: 195.148.125.92:46878
method: GET
url: /1.0/operations/9c18a13b-31e4-4743-8a1b-498b7a53d88d/wait
level: dbug
message: handling
timestamp: "2019-06-05T08:47:48.889217113Z"
type: logging
metadata:
context:
k: "1"
level: dbug
message: Found cert
timestamp: "2019-06-05T08:47:48.889199496Z"
type: logging
metadata:
context: {}
level: dbug
message: About to receive rsync
timestamp: "2019-06-05T08:47:48.908377944Z"
type: logging
metadata:
context: {}
level: dbug
message: got message barrier
timestamp: "2019-06-05T08:47:49.337505622Z"
type: logging
metadata:
context: {}
level: dbug
message: sending write barrier
timestamp: "2019-06-05T08:47:49.506205757Z"
type: logging
metadata:
context: {}
level: dbug
message: Got message barrier, resetting stream
timestamp: "2019-06-05T08:47:49.506939152Z"
type: logging
metadata:
context: {}
level: dbug
message: Done receiving from rsync
timestamp: "2019-06-05T08:47:49.506976482Z"
type: logging
metadata:
context: {}
level: dbug
message: About to receive header
timestamp: "2019-06-05T08:47:49.507187591Z"
type: logging
metadata:
context: {}
level: dbug
message: About to receive rsync
timestamp: "2019-06-05T08:47:49.508017479Z"
type: logging
metadata:
context: {}
level: dbug
message: sending write barrier
timestamp: "2019-06-05T08:47:49.645647764Z"
type: logging
metadata:
context: {}
level: dbug
message: Done receiving from rsync
timestamp: "2019-06-05T08:47:49.646216976Z"
type: logging
metadata:
context: {}
level: dbug
message: Got message barrier, resetting stream
timestamp: "2019-06-05T08:47:49.646177408Z"
type: logging
metadata:
context: {}
level: dbug
message: About to receive header
timestamp: "2019-06-05T08:47:49.64640338Z"
type: logging
metadata:
context: {}
level: dbug
message: sending write barrier
timestamp: "2019-06-05T08:47:51.091683528Z"
type: logging
metadata:
context: {}
level: dbug
message: Got message barrier, resetting stream
timestamp: "2019-06-05T08:47:51.092271262Z"
type: logging
metadata:
context: {}
level: dbug
message: got message barrier
timestamp: "2019-06-05T08:47:51.203550825Z"
type: logging
metadata:
context:
actionscript: "false"
created: 2019-06-05 08:47:47 +0000 UTC
ephemeral: "false"
features: "0"
name: first
predumpdir: ""
statedir: /tmp/lxd_restore_918693111
stop: "false"
used: 1970-01-01 00:00:00 +0000 UTC
level: info
message: Migrating container
timestamp: "2019-06-05T08:47:51.638953976Z"
type: logging
metadata:
action: container-updated
source: /1.0/containers/first
timestamp: "2019-06-05T08:47:51.677889197Z"
type: lifecycle
metadata:
context: {}
level: dbug
message: Mounted ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.679747414Z"
type: logging
metadata:
context: {}
level: dbug
message: Mounting ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.679553975Z"
type: logging
metadata:
context: {}
level: dbug
message: Unmounting ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.682401055Z"
type: logging
metadata:
context: {}
level: dbug
message: Unmounted ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.798056689Z"
type: logging
metadata:
context: {}
level: dbug
message: Mounting ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.831314572Z"
type: logging
metadata:
context: {}
level: dbug
message: Mounted ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.833476654Z"
type: logging
metadata:
context: {}
level: dbug
message: Unmounting ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.83733693Z"
type: logging
metadata:
context: {}
level: dbug
message: Unmounted ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.930825879Z"
type: logging
metadata:
context:
ip: '@'
method: GET
url: /1.0
level: dbug
message: handling
timestamp: "2019-06-05T08:47:51.951167443Z"
type: logging
metadata:
context:
ip: '@'
method: GET
url: /internal/containers/78/onstart
level: dbug
message: handling
timestamp: "2019-06-05T08:47:51.953908613Z"
type: logging
metadata:
context: {}
level: dbug
message: Mounting ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.957812509Z"
type: logging
metadata:
context: {}
level: dbug
message: Mounted ZFS storage volume for container "first" on storage pool "default"
timestamp: "2019-06-05T08:47:51.96214717Z"
type: logging
metadata:
context: {}
level: dbug
message: 'Scheduler: container first started: re-balancing'
timestamp: "2019-06-05T08:47:52.122993177Z"
type: logging
metadata:
context:
actionscript: "false"
created: 2019-06-05 08:47:47 +0000 UTC
ephemeral: "false"
features: "0"
name: first
predumpdir: ""
statedir: /tmp/lxd_restore_918693111
stop: "false"
used: 1970-01-01 00:00:00 +0000 UTC
level: info
message: Migrated container
timestamp: "2019-06-05T08:47:52.708049284Z"
type: logging
metadata:
action: container-updated
source: /1.0/containers/first
timestamp: "2019-06-05T08:47:52.801985601Z"
type: lifecycle
metadata:
context: {}
level: dbug
message: 'Failure for task operation: 9c18a13b-31e4-4743-8a1b-498b7a53d88d: Common
start logic: The container is already running'
timestamp: "2019-06-05T08:47:52.802075959Z"
type: logging
metadata:
class: task
created_at: "2019-06-05T08:47:48.801416962Z"
description: Creating container
err: 'Common start logic: The container is already running'
id: 9c18a13b-31e4-4743-8a1b-498b7a53d88d
may_cancel: false
metadata: null
resources:
containers:
- /1.0/containers/first
status: Failure
status_code: 400
updated_at: "2019-06-05T08:47:48.801416962Z"
timestamp: "2019-06-05T08:47:52.802094625Z"
type: operation
Thank you :)
I've checked the code, and it's not obvious what the error might be. Which version of LXD are you using (on both source and target)? Are they clustered (i.e. via LXD clustering) or standalone?
The trace seems to indicate that the container was already started on the target. Is it a naming issue (i.e. could there already have been a container with the same name on the target)?
@ajkavanagh I'm using LXD 3.0.3 LTS in standalone mode. I'm sure that the container doesn't exist on the target host. It would be great if you could reproduce the error with the LTS version of LXD.
@Oussalvatore thanks for the additional info. I'll have to sort out a lab for testing this (it's re-purposed at present), so any additional debug you can offer would be useful, e.g. does it work with straight lxc commands at the CLI, etc.?
Hi everyone,
I'm trying to live-migrate a very simple container (an empty alpine/edge one) between two different machines. If I use the LXD CLI the migration works fine, but if I try the pylxd API I get some errors. The code is pretty simple (roughly the sketch at the end of this post). I'm using an endpoint also for localhost, otherwise I get an error; the error I end up with is an AttributeError from pylxd. Am I misusing the API?
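Roughly, the code looks like this (addresses, certificate paths and the container name are placeholders):
from pylxd import Client

# Source node (the machine the container currently lives on).
client_local = Client(endpoint='https://192.168.1.104:8443',
                      cert=('lxd.crt', 'lxd.key'), verify=False)
client_local.authenticate('fog')

# Destination node (the machine to migrate to).
client_remote = Client(endpoint='https://192.168.1.106:8443',
                       cert=('lxd.crt', 'lxd.key'), verify=False)
client_remote.authenticate('fog')

cont = client_local.containers.get('testm')
cont.migrate(client_remote, wait=True)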