firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.
http://firecracker-microvm.io
Apache License 2.0
25.09k stars 1.76k forks source link

[Bug] microVM cleanup not triggered in the CI tests using logger thread #2602

Closed dianpopa closed 3 years ago

dianpopa commented 3 years ago

Describe the bug

By default, the instantiation of a microVM object will also create and start a thread that receives as reference the microVM object. This results in the microVM destructor never being called and thus cleanup is skipped.

To Reproduce

./tools/devtool shell -p && cd tests/`
# check the contents of the /srv/ directory (default location for test resources)
du -sh /srv/
pytest integration_tests/functional/test_api.py::test_negative_api_lifecycle
du -sh /srv/*
# 387M  /srv/jailer
# the jailer directory has not gone through the cleanup process

Expected behaviour

The /srv/jailer/firecracker exists but does not contain any microvm leftover resources.

Environment

No specific environment needed.

Additional context

Move the logging thread at the microVM level or one level above it so that we do not need to reference it in the function running the logging thread. This would also help us with being able to check the microVM logs after the destruction of the microVM.

Checks

alindima commented 3 years ago

I've taken a quick look over this. Indeed it appears that the microvm destructors in the test framework are not called, because firecracker pids still show up in ps after the test finished executing.

However, the microvm destructor doesn't have any logic for cleaning up the resources in /srv/jailer/firecracker/. Maybe this should be done, but I think the bigger problem is that the destructors are not being called.

alindima commented 3 years ago

I don't think this has anything to do with the logger thread. I managed to pinpoint the fact that this only happens when we're not using pytest fixtures. In the tests where we leverage fixtures for microvms, the destructor gets called.

In the test outlined above, integration_tests/functional/test_api.py::test_negative_api_lifecycle, we're using a MicrovmBuilder. We're using it in some more tests, especially the performance and snapshot tests.

I'll try to see why the destructor doesn't get called in this case.

dianpopa commented 3 years ago

I've taken a quick look over this. Indeed it appears that the microvm destructors in the test framework are not called, because firecracker pids still show up in ps after the test finished executing.

Unfortunately I omitted this detail from the description of the issue. Indeed, the problem does not replicate for the tests using fixtures since for fixtures we explicitly call vm.kill() which calls stop on the logging thread.

However, the microvm destructor doesn't have any logic for cleaning up the resources in /srv/jailer/firecracker/. Maybe this should be done, but I think the bigger problem is that the destructors are not being called.

The /srv/jailer folder should get cleaned in the destructor of the jailer which should get called when there is no reference to the microVM. You can try and test this by calling spawn with a create_logger=False and then re-run the test_negative_api_lifecycle test.