canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

lxc copy remote:c1 c2 ends up with "error: Error transferring container data: FOREIGN KEY constraint failed" #2293

Closed bigjools closed 8 years ago

bigjools commented 8 years ago

Required information

apicompat: 0
auth: trusted
environment:
  addresses:
  - 192.168.1.111:8443
  - 10.0.3.1:8443
  - 172.17.0.1:8443
  - 192.168.122.1:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIF4DCCA8igAwIBAgIRAOHuSadwLJV5JOBB+DE81VQwDQYJKoZIhvcNAQELBQAw
    NzEcMBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEXMBUGA1UEAwwOcm9vdEBj
    YXJib25wYWQwHhcNMTYwNTIwMDg1NDQwWhcNMjYwNTE4MDg1NDQwWjA3MRwwGgYD
    VQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRcwFQYDVQQDDA5yb290QGNhcmJvbnBh
    ZDCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAN0ts1V+ObaHlM6CxnMQ
    3yTth2tkZ53zYnBSfDdGtLLJQzGPiJU0okzvTwDnfx+lHCY2Y90JI1i+tsTBqIsB
    2nabE7rVmw3UIFyU2fqbWlsugS7lYkZWPba7icBSqmLtc5DnyhtkIyHGQMpOpdvs
    sOJMITpiCkHl9S7MMtVzlHoTjad8GqbYVJT+ryCmtXMmSm2ApLMKmTmYHUdTRPAf
    ETCScbVQHTkQPcIYRsJ0iZhn3x7FR6j2Fh2ewYEHSCjI4ebLm9gy4mGEUDH72s1K
    NQE1zoMb/5YHRhYtV2Y0fl90bJzznuH9uDHj+X0Gk+yELYYF4rAHdeapiJfCSU1i
    p8lC/BpLChlje0FQ0YtWG+t8sKqqioV5ZyL2k++GFUJqQW0r+cmx7NxeFUOwQlnD
    p8+dQY1p3MZ3+sUCIejYrnJkujYjWCROfN8gsXn9pbln1xYkHmboHBAsDp92IeGL
    mT5jfM8zSDkeLyVcG7jc1O75Ze0YemRunKKORjN6o7ZsPiwELWwKafpEfR9hW0TX
    hT4/S3YhVgJyodgC4K3+1Bls+fT7c2o3GgNlioKs3GRvH5ddstxMa2Rj6/YPl0RH
    pYJsCYClaPmSOwv+X5/oQ0b5MoP+x4P9aVj3HT952GcfFqxHF9uipPYFhCCojh6T
    TcsGeQSDDRMVrDzviZvVkcm5AgMBAAGjgeYwgeMwDgYDVR0PAQH/BAQDAgWgMBMG
    A1UdJQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwga0GA1UdEQSBpTCBooIJ
    Y2FyYm9ucGFkghAxOTIuMTY4LjEuMTExLzI0ghxmZTgwOjo4NjNhOjRiZmY6ZmVj
    ZDo0ZGRjLzY0ggsxMC4wLjMuMS8yNIIcZmU4MDo6NzBjNTpiMWZmOmZlYTg6MjE0
    YS82NIIQMTkyLjE2OC4xMjIuMS8yNIIcZmU4MDo6MTA0Njo4NGZmOmZlNjg6OWEy
    Yy82NIIKZmU4MDo6MS82NDANBgkqhkiG9w0BAQsFAAOCAgEAXjQwX0gzhbnsXNLm
    3JRR4IKZvP99GGArxddl8gQs37SY0TNEj6NLMxIwlj4Mb69WTs7zN0OcVtcR/9/g
    IUR0qDPuUyUBk1THkOBWymvEYAa5lOFzvgPfVa0xr9m+zCKsz9gmfghN9YoqF6jp
    ZVrizmJsDVQYVYzgnGbuqZhEBFdMtB9VjG172YLa5urcorkDM3wqcary0Q/TJM55
    CxHyoIGu9/52ur5n0t/jsyCfc/bgytejzurfvB8LTHVochbnjn/G29DU77LxB8nV
    SEHTOyuOQkTA8li+0RGqaTcrQNkYFnxBTmPNxCXPKzdmJYZmr5ePLUs6PAfuiPCt
    5jShxYi53L7E8d3n5zfPTSLLIC/k8yC3NxFmp7rJQit3H+AhsAiHoslBU2zL1D6E
    4Htr1VXDlZJZRNEA0X2XLDTpOZLF6wnMumacvQJVeg2iazJKPXJ1FM+uj+ii94pM
    Wqm3dmbofYUPPTlEERmgsLzu7w4zZ5tGANdRafYy/nQpAszHa0KUtUzJxucfy/oO
    dj0EE3U3IUDC8jOAj4wzKEQaUy4Mjt2hl4XVIjqat3DNR9+Um6cPK3uhQmZHXcR9
    P39dwiaSab4udF9Wj4Mt9V/gCBkdaINhC3NKnVnVENxv5xWDBKbbrbvp9Ui3pGBI
    TdFJGngDf973PkaB+9oyf+NLv9I=
    -----END CERTIFICATE-----
  certificatefingerprint: 5c40398b33778fa4b4b934760df88d2ae0aafd385fa7046859acdf406ee12dff
  driver: lxc
  driverversion: 2.0.3
  kernel: Linux
  kernelarchitecture: x86_64
  kernelversion: 4.4.0-34-generic
  server: lxd
  serverpid: 8814
  serverversion: 2.0.3
  storage: dir
  storageversion: ""
config:
  core.https_address: '[::]:8443'
public: false

Issue description

As per the title

Steps to reproduce

Just copy a container from one remote to another.

Information to attach

LXD log shows: t=2016-08-18T13:09:48+1000 lvl=eror msg="Error during migration sink" err="FOREIGN KEY constraint failed"

And that is all you see, apart from comms logging with the remote LXD from a few minutes prior.

stgraber commented 8 years ago

That is, hmm, special. It sure isn't a general problem, as that very thing is actively tested in our test suite and by our users...

The error suggests some kind of collision in the database of the target server.

sqlite3 /var/lib/lxd/lxd.db .dump

Would probably provide enough information to figure out what the problem is and what to do to make it happy again. Figuring out how it got to that state is a whole different problem though...

We used to have a couple of issues around database cleanup that were leading to that sort of problem, but that far predates LXD 2.0, and we've had a CI check in place since then to ensure that the database is clean after every test run.
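
For readers unfamiliar with the error text, here is a minimal sketch of how SQLite produces "FOREIGN KEY constraint failed". The two tables and the add_config helper are hypothetical and only for illustration; they are not claimed to match the real LXD schema or code, and the sketch does not reproduce the exact collision in this issue.

```python
import sqlite3

# Hypothetical two-table layout: a parent row per container and child rows
# that reference it. This is an illustration, not the actual LXD schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.execute("CREATE TABLE containers (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
conn.execute(
    "CREATE TABLE containers_config ("
    " container_id INTEGER NOT NULL REFERENCES containers(id),"
    " key TEXT NOT NULL,"
    " value TEXT)"
)


def add_config(container_id, key, value):
    """Insert a child row; return 'ok' or the SQLite error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO containers_config VALUES (?, ?, ?)",
                (container_id, key, value),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


conn.execute("INSERT INTO containers (id, name) VALUES (1, 'c2')")
conn.commit()

ok_result = add_config(1, "user.note", "copied")        # parent row exists
missing_result = add_config(42, "user.note", "orphan")  # no such parent row
print(ok_result, "|", missing_result)
```

The second insert fails because it points at a container row that does not exist, which is the general shape of problem the error message describes.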

bigjools commented 8 years ago

Ok, I think I know how I got into this state, because I just did a successful copy with a new target name.

The key - I hit ctrl-c part way through, and then retried. The retry fails with the error.

If you lxc list after ctrl-c, you can see the container listed, and it should not be. This looks like a transactional error.
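
To illustrate what "transactional" would mean here, the following is a simplified sketch in which the database record for a new container only becomes visible if the whole operation succeeds. The table, the Interrupted exception and the create_container helper are assumptions made up for this example; this is not how LXD implements copies, and a real transfer would not necessarily be held inside a single database transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE containers (name TEXT PRIMARY KEY)")


class Interrupted(Exception):
    """Stands in for the transfer being aborted part-way through."""


def create_container(name, fail_during_transfer=False):
    # "with conn" commits if the block finishes and rolls back if it raises,
    # so an aborted transfer leaves no record behind.
    with conn:
        conn.execute("INSERT INTO containers (name) VALUES (?)", (name,))
        if fail_during_transfer:
            raise Interrupted(name)


def list_containers():
    return [row[0] for row in conn.execute("SELECT name FROM containers ORDER BY name")]


try:
    create_container("c2", fail_during_transfer=True)
except Interrupted:
    pass
after_abort = list_containers()   # the aborted attempt left nothing behind

create_container("c2")            # retrying with the same name now works
after_retry = list_containers()
print(after_abort, after_retry)
```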

stgraber commented 8 years ago

Ok, so you actually attempted to initiate a second copy while the first was still going, leading to the collision in the DB. I guess that makes sense.

Right now, pressing ctrl-c at any point in LXD will never cancel the server-side operation after it has started. That's partly because most server-side operations aren't cancelable (goroutines can't be killed...) and also because we don't actually catch ctrl-c in the client in the first place...
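
As a rough sketch of what a cancelable operation involves: since a worker cannot be killed from outside, it has to check a cancellation flag itself at safe points. The example below uses Python threads, which, like goroutines, offer no supported way to be killed externally. The names copy_chunks and run_copy and the cancel_after parameter are hypothetical, and nothing here reflects LXD's actual code.

```python
import threading


def copy_chunks(chunks, cancel, sink):
    """Copy chunks into sink, checking the cancel flag before each chunk."""
    for chunk in chunks:
        if cancel.is_set():
            return False  # stopped early; the caller should clean up partial state
        sink.append(chunk)
    return True


def run_copy(chunks, cancel_after=None):
    """Run the copy in a worker thread and return (completed, copied_chunks).

    cancel_after is an index at which the cancel flag is set, standing in for
    a ctrl-c from the client arriving while the transfer is in progress.
    """
    cancel = threading.Event()
    sink = []
    outcome = {}

    def source():
        for index, chunk in enumerate(chunks):
            if index == cancel_after:
                cancel.set()
            yield chunk

    worker = threading.Thread(
        target=lambda: outcome.update(completed=copy_chunks(source(), cancel, sink))
    )
    worker.start()
    worker.join()
    return outcome["completed"], sink


print(run_copy([b"a", b"b", b"c", b"d"], cancel_after=2))
print(run_copy([b"a", b"b", b"c", b"d"]))
```

The worker only stops because it cooperates by polling the flag; an operation that never checks it would run to completion regardless of what the client does.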

bigjools commented 8 years ago

Fair enough. It's a surprising UI scenario though, and apart from dealing with ctrl-c, you should be able to cancel this as it's copying hundreds of megs around, and if you get the wrong one by mistake you can end up eating up bandwidth until it finishes.

stgraber commented 8 years ago

Can you try recycling the name that caused the DB error just to confirm that you didn't actually end up with a corrupted DB in the end?

If it looks like it's fine now, then I guess we can close this bug and track the ctrl-c experience in the bug I just filed.

bigjools commented 8 years ago

Yep, I can create new instances with the names that were failing.

Cheers.
