TritonDataCenter / smartos-live

For more information, please see http://smartos.org/ For any questions that aren't answered there, please join the SmartOS discussion list: https://smartos.topicbox.com/groups/smartos-discuss

vmadm.delete_snapshot fails when checkpoints directory does not exist #943

Closed jzinkweg closed 4 years ago

jzinkweg commented 4 years ago

The delete_snapshot command fails when the snapshot is not mounted and the corresponding mountpoint does not exist:

  {
    "name": "vmadm",
    "req_id": "cc49c151-dae9-4e66-b0a0-39f7bced91e3",
    "hostname": "ps-cn-08",
    "pid": 781683,
    "action": "delete_snapshot",
    "vm": "2e0cd896-c79c-45ff-989b-a3e838dd41c7",
    "stack": "vmadm.delete-snapshot",
    "level": 50,
    "err": {
      "message": "Command failed: umount: warning: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 not in mnttab\numount: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 no such file or directory\n",
      "name": "Error",
      "stack": "Error: Command failed: umount: warning: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 not in mnttab\numount: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 no such file or directory\n\n    at ChildProcess.exithandler (child_process.js:637:15)\n    at ChildProcess.EventEmitter.emit (events.js:98:17)\n    at maybeClose (child_process.js:743:16)\n    at Socket.<anonymous> (child_process.js:956:11)\n    at Socket.EventEmitter.emit (events.js:95:17)\n    at Pipe.close (net.js:468:12)",
      "code": 1,
      "signal": null
    },
    "msg": "There was an error while unmounting the snapshot: Command failed: umount: warning: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 not in mnttab\numount: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 no such file or directory\n",
    "time": "2020-06-16T07:54:12.660Z",
    "v": 0
  }
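The log above shows `umount` failing only because there was nothing to unmount. One possible way to treat that case as non-fatal is sketched below. This is a minimal illustration and not vmadm's actual code: the helper name `umount_failure_is_benign` and the pattern list are hypothetical, and the matched strings are taken only from the messages quoted in this issue.

```python
import re

# Hypothetical patterns: stderr fragments meaning "nothing was mounted here".
# Copied from the log messages in this issue, not from vmadm's source.
BENIGN_UMOUNT_PATTERNS = (
    re.compile(r"no such file or directory", re.IGNORECASE),
    re.compile(r"not mounted", re.IGNORECASE),
)


def umount_failure_is_benign(stderr: str) -> bool:
    """Return True if every non-warning line says there was nothing to unmount."""
    lines = [ln.strip() for ln in stderr.splitlines() if ln.strip()]
    # Skip "umount: warning: ... not in mnttab" lines; they only accompany
    # the real error line that follows.
    errors = [ln for ln in lines if "umount: warning:" not in ln]
    if not errors:
        # No error line at all: do not guess, treat the failure as real.
        return False
    return all(
        any(pattern.search(ln) for pattern in BENIGN_UMOUNT_PATTERNS)
        for ln in errors
    )
```

With a check like this, a caller could log the failed unmount and continue with the snapshot deletion instead of aborting. Any other `umount` error would still be treated as fatal.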

I suspect this situation was created by auto-snapshotting a zone that had reached its quota, which caused the creation of the /root//checkpoints/1591567260004 directory to fail. The corresponding job failed, but it happened too long ago for proper logs to still be available:

    {
      "result": "",
      "error": "vmadm.create_snapshot error: vmadm exited with code: 1 signal: null",
      "name": "cnapi.wait_task",
      "started_at": "2020-06-07T22:01:02.398Z",
      "finished_at": "2020-06-07T22:01:03.130Z"
    }

[delete_snapshot_failed.txt](https://github.com/joyent/smartos-live/files/4786433/delete_snapshot_failed.txt)
[delete_snapshot_success.txt](https://github.com/joyent/smartos-live/files/4786434/delete_snapshot_success.txt)

As a workaround, I created the missing directory. I was then able to delete the snapshot, and the error message changed to "not mounted":

{
  "name": "vmadm",
  "req_id": "c512c86d-8fd4-4032-bed6-645f866315af",
  "hostname": "ps-cn-08",
  "pid": 837060,
  "action": "delete_snapshot",
  "vm": "2e0cd896-c79c-45ff-989b-a3e838dd41c7",
  "stack": "vmadm.delete-snapshot",
  "level": 50,
  "err": {
    "message": "Command failed: umount: warning: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 not in mnttab\numount: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 not mounted\n",
    "name": "Error",
    "stack": "Error: Command failed: umount: warning: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 not in mnttab\numount: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 not mounted\n\n    at ChildProcess.exithandler (child_process.js:637:15)\n    at ChildProcess.EventEmitter.emit (events.js:98:17)\n    at maybeClose (child_process.js:743:16)\n    at Socket.<anonymous> (child_process.js:956:11)\n    at Socket.EventEmitter.emit (events.js:95:17)\n    at Pipe.close (net.js:468:12)",
    "code": 1,
    "signal": null
  },
  "msg": "There was an error while unmounting the snapshot: Command failed: umount: warning: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 not in mnttab\numount: /zones/2e0cd896-c79c-45ff-989b-a3e838dd41c7/root//checkpoints/1591567260004 not mounted\n",
  "time": "2020-06-16T09:41:40.670Z",
  "v": 0
}
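
The workaround described above (creating the missing mountpoint before retrying the delete) can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the `zones_root` parameter, and the idea that the last path component is the checkpoint name are all hypothetical, and the path layout is inferred solely from the `umount` messages in the logs.

```python
import os


def checkpoint_mountpoint(zone_uuid: str, checkpoint_name: str,
                          zones_root: str = "/zones") -> str:
    """Build the mountpoint path in the shape seen in the umount errors.

    The logs show ".../root//checkpoints/..."; the doubled slash resolves
    to the same directory as the single slash produced here.
    """
    return os.path.join(zones_root, zone_uuid, "root", "checkpoints",
                        checkpoint_name)


def ensure_mountpoint(path: str) -> bool:
    """Create the directory if it is missing; return True if it was created."""
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True
```

After `ensure_mountpoint` has created the directory, retrying the snapshot deletion should reach the "not mounted" case shown in the second log rather than "no such file or directory".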

jzinkweg commented 4 years ago

Logs: delete_snapshot_failed.txt delete_snapshot_success.txt