firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0

Can't resume vm that runs docker in it #4685

Closed BasToTheMax closed 1 week ago

BasToTheMax commented 1 month ago

Hello.

I am having issues while trying to resume my vm.

What I am doing:

Please note: I am running Docker inside my VM.

Snapshot command:

./snapshot-editor edit-memory rebase \
     --memory-path ./snap/mem1 \
     --diff-path ./snap/mem2
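For reference, the rebase step above can be wrapped in a small guard that refuses to run when either memory file is missing or empty. This is only a sketch: the function name `rebase_memory` is hypothetical, and it assumes `snapshot-editor` sits in the current directory as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge a diff memory file onto a base memory file,
# but only after checking that both inputs exist and are non-empty.
rebase_memory() {
    local base=$1 diff=$2 f
    for f in "$base" "$diff"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty memory file: $f" >&2
            return 1
        fi
    done
    # Same flags as the command in this report.
    ./snapshot-editor edit-memory rebase \
        --memory-path "$base" \
        --diff-path "$diff"
}
```

Calling `rebase_memory ./snap/mem1 ./snap/mem2` would then reproduce the command above, with a clearer failure if a file was never written.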

Firecracker logs:

2023-12-25T17:24:28.369578016 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:163] The request was executed successfully. Status code: 204 No Content.
2023-12-25T17:24:28.387220678 [anonymous-instance:fc_api:INFO:src/api_server/src/parsed_request.rs:70] The API server received a Put request on "/snapshot/load" with body "{\n            \"snapshot_path\": \"./snap/snap1\",\n            \"mem_file_path\": \"./snap/mem1\",\n            \"enable_diff_snapshots\": true,\n            \"resume_vm\": true\n    }".
2023-12-25T17:24:28.387596935 [anonymous-instance:main:WARN:src/vmm/src/logger/mod.rs:33] [DevPreview] Virtual machine snapshots is in development preview.
2023-12-25T17:24:28.387873552 [anonymous-instance:main:INFO:src/vmm/src/persist.rs:314] Host CPU vendor ID: [71, 101, 110, 117, 105, 110, 101, 73, 110, 116, 101, 108]
2023-12-25T17:24:28.387891803 [anonymous-instance:main:INFO:src/vmm/src/persist.rs:315] Snapshot CPU vendor ID: [71, 101, 110, 117, 105, 110, 101, 73, 110, 116, 101, 108]
2023-12-25T17:24:28.413620267 [anonymous-instance:main:ERROR:src/vmm/src/devices/virtio/queue.rs:296] virtio queue number of available descriptors 4097 is greater than queue max size 256
2023-12-25T17:24:28.413716156 [anonymous-instance:main:INFO:src/vmm/src/lib.rs:818] Vmm is stopping.
2023-12-25T17:24:28.481140691 [anonymous-instance:fc_api:ERROR:src/api_server/src/parsed_request.rs:190] Received Error. Status code: 400 Bad Request. Message: Load snapshot error: Failed to restore from snapshot: Failed to build microVM from snapshot: Failed to restore MMIO device: Cannot restore devices: VirtioBlock(Persist(InvalidInput))
2023-12-25T17:24:28.481173674 [anonymous-instance:fc_api:WARN:src/api_server/src/lib.rs:139] PUT /snapshot/load: mem_file_path field is deprecated.
2023-12-25T17:24:28.481367990 [anonymous-instance:main:ERROR:src/firecracker/src/main.rs:94] RunWithApiError error: Failed to build MicroVM: Loading snapshot failed..
2023-12-25T17:24:28.481410903 [anonymous-instance:main:ERROR:src/firecracker/src/main.rs:97] Firecracker exiting with error. exit_code=1
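The log above also warns that `mem_file_path` is deprecated. The sketch below builds an equivalent `/snapshot/load` body using the `mem_backend` object instead. It addresses only that warning and is not expected to fix the restore failure. The function name `load_snapshot_body` and the API socket path in the usage comment are assumptions, not taken from this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a /snapshot/load request body that uses
# mem_backend (File) instead of the deprecated mem_file_path field.
load_snapshot_body() {
    local snapshot_path=$1 mem_path=$2
    cat <<EOF
{
  "snapshot_path": "${snapshot_path}",
  "mem_backend": {
    "backend_type": "File",
    "backend_path": "${mem_path}"
  },
  "enable_diff_snapshots": true,
  "resume_vm": true
}
EOF
}

# Usage sketch (the socket path is an assumption):
# load_snapshot_body ./snap/snap1 ./snap/mem1 |
#   curl --unix-socket /tmp/firecracker.socket -X PUT \
#        -H 'Content-Type: application/json' \
#        -d @- http://localhost/snapshot/load
```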

Host kernel (uname -a): Linux bttm 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Guest kernel: vmlinux-5.10.bin

Download script used:

# Detect the host architecture
ARCH="$(uname -m)"

# Resolve the tag of the latest Firecracker release
release_url="https://github.com/firecracker-microvm/firecracker/releases"
latest="$(basename "$(curl -fsSLI -o /dev/null -w '%{url_effective}' "${release_url}/latest")")"

# Download and unpack the release archive
curl -L "${release_url}/download/${latest}/firecracker-${latest}-${ARCH}.tgz" \
| tar -xz

mv "release-${latest}-${ARCH}/firecracker-${latest}-${ARCH}" firecracker
mv "release-${latest}-${ARCH}/snapshot-editor-${latest}-${ARCH}" snapshot-editor

rm -r "release-${latest}-${ARCH}"

# Fetch the guest kernel; the downloaded file is named vmlinux-5.10.bin
wget "https://s3.amazonaws.com/spec.ccfc.min/img/quickstart_guide/${ARCH}/kernels/vmlinux-5.10.bin"
mv vmlinux-5.10.bin kernel

chmod +x ./firecracker
chmod +x ./snapshot-editor

To give more context:

If you need more details, feel free to ask :wink:.

I hope someone can help me fix the issue. I will probably also ask on the Slack server.

Originally posted by @BasToTheMax in https://github.com/firecracker-microvm/firecracker/issues/2888#issuecomment-1869049388

I'm currently on vacation and won't be able to do tests.

kalyazin commented 1 month ago

Hi @BasToTheMax ! Thanks for reporting the issue.

From our initial analysis, it looks like the block device fails to restore, because the device layout in memory is not correct.
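To make the error message in the log concrete, here is a rough sketch of the kind of sanity check that rejects the restored queue. It assumes the number of available descriptors is the 16-bit wrapping difference between the guest's avail index and the device's saved counter. The function names and the index values in the usage note are illustrative and not taken from Firecracker's source.

```shell
#!/usr/bin/env bash
# Assumption: available descriptors = (avail_idx - next_avail) mod 2^16,
# because virtio ring indices are 16-bit counters that wrap around.
avail_len() {
    local avail_idx=$1 next_avail=$2
    echo $(( (avail_idx - next_avail) & 0xFFFF ))
}

# Return non-zero when the computed count exceeds the queue size,
# mirroring the condition reported in the log.
queue_state_ok() {
    local avail_idx=$1 next_avail=$2 queue_size=$3
    local len
    len=$(avail_len "$avail_idx" "$next_avail")
    if (( len > queue_size )); then
        echo "available descriptors $len exceed queue size $queue_size" >&2
        return 1
    fi
}
```

With made-up indices, `queue_state_ok 4097 0 256` fails just like the log line, while `queue_state_ok 10 4 256` passes. A count far above the queue size suggests the ring data read from the restored memory does not match the saved device state.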

Could you provide a reproducible test that demonstrates the issue including the following if possible:

Alternatively, we have a test that exercises differential snapshots: https://github.com/firecracker-microvm/firecracker/blob/641b37573c66ff524b73997471da3849a630b634/tests/integration_tests/functional/test_snapshot_basic.py#L128 . You could modify it to be closer to your setup and see whether it starts failing (see the testing readme).

Additionally, is running Docker inside the VM an essential part of the reproduction steps? Does the same sequence succeed without Docker inside the VM?

pb8o commented 3 weeks ago

Hi @BasToTheMax, were you able to solve your issue? If not, can you provide a series of commands as mentioned in @kalyazin's comment?

kalyazin commented 1 week ago

Closing this for now. @BasToTheMax please feel free to reopen and post your discoveries if you still experience the issue.