Closed wangtianxia-sjtu closed 1 year ago
Python uses the `getrandom` syscall to obtain random numbers in `os.urandom`. I have written a simple C program to reproduce the problem.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/random.h>

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "Usage: %s num_bytes\n", argv[0]);
            return 1;
        }
        int num_bytes = atoi(argv[1]);
        unsigned char *buf = malloc(num_bytes);
        if (buf == NULL) {
            perror("Failed to allocate memory");
            return 1;
        }
        if (getrandom(buf, num_bytes, 0) != num_bytes) {
            perror("getrandom failed");
            return 1;
        }
        printf("Random bytes: ");
        for (int i = 0; i < num_bytes; i++) {
            printf("%02x", buf[i]);
        }
        printf("\n");
        free(buf);
        return 0;
    }

Compile the program to `a.out` and run:

    ./a.out 5

to generate 5 random bytes.

The first execution of the command in the Firecracker virtual machine blocks for about 2 seconds. Subsequent executions take only about 1 millisecond.
Thanks for reaching out @wangtianxia-sjtu.
I suspect that what you see is expected behaviour. Quoting `man 2 getrandom`:

> If the urandom source has not yet been initialized, then getrandom() will block, unless GRND_NONBLOCK is specified in flags.

In order to validate my hypothesis, could you:

1. Check `/proc/sys/kernel/random/entropy_avail` immediately after booting?
2. Add the `GRND_NONBLOCK` flag to `getrandom` and see if it still blocks on the first call?

Some context: At the moment Firecracker does not emulate any entropy device, e.g. virtio-rng. As a result, it can sometimes happen that the guest OS has not collected enough entropy during boot to initialize its PRNGs.
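
The first check can also be done from a program instead of with `cat`. This is a rough sketch under the assumption that the file holds a single decimal number; the helper name `read_entropy_avail` is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from `path`, e.g.
 * /proc/sys/kernel/random/entropy_avail. Returns -1 if the file cannot
 * be opened or does not start with a number. */
long read_entropy_avail(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1) {
        value = -1;
    }
    fclose(f);
    return value;
}
```

Logging `read_entropy_avail("/proc/sys/kernel/random/entropy_avail")` immediately before the first `getrandom` call would show how much entropy the guest had collected at that point.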
Take a look as well at past related issues:
1. `/proc/sys/kernel/random/entropy_avail`

    # Boot logs omitted
    root@ubuntu-fc-uvm:~# cat /proc/sys/kernel/random/entropy_avail
    30
    root@ubuntu-fc-uvm:~# time ./a.out 5 # generate 5 random bytes first time
    Random bytes: f4f23f8915

    real 0m2.241s
    user 0m0.000s
    sys 0m1.217s
    root@ubuntu-fc-uvm:~# cat /proc/sys/kernel/random/entropy_avail
    2
    root@ubuntu-fc-uvm:~# time ./a.out 5 # generate 5 random bytes second time
    Random bytes: 028ff90463

    real 0m0.001s
    user 0m0.000s
    sys 0m0.001s

2. Add the `GRND_NONBLOCK` flag to the `getrandom` call. The call never blocks but always returns an error.

    root@ubuntu-fc-uvm:~# time ./a.out 5
    getrandom failed: Resource temporarily unavailable

    real 0m0.002s
    user 0m0.002s
    sys 0m0.000s
    root@ubuntu-fc-uvm:~# time ./a.out 5
    getrandom failed: Resource temporarily unavailable

    real 0m0.002s
    user 0m0.001s
    sys 0m0.000s

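
"Resource temporarily unavailable" is the message for `EAGAIN`, which `getrandom` reports when `GRND_NONBLOCK` is set and the entropy pool is not yet initialized. A caller can tell that case apart from other failures. The sketch below is illustrative only; the function name and the `RNG_*` result codes are assumptions, not part of the reproduction program.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Illustrative result codes (not from any library). */
enum { RNG_OK = 0, RNG_NOT_READY = 1, RNG_ERROR = -1 };

/* Hypothetical helper: fill `buf` with `len` random bytes without ever
 * blocking. Returns RNG_NOT_READY if the kernel's entropy pool is not
 * initialized yet (errno == EAGAIN), RNG_ERROR on any other failure. */
int fill_random_nonblocking(unsigned char *buf, size_t len) {
    size_t filled = 0;

    while (filled < len) {
        ssize_t n = getrandom(buf + filled, len - filled, GRND_NONBLOCK);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: retry */
            }
            if (errno == EAGAIN) {
                return RNG_NOT_READY;
            }
            return RNG_ERROR;
        }
        filled += (size_t)n;
    }
    return RNG_OK;
}
```

In the guest described here, such a helper would be expected to return `RNG_NOT_READY` on each call until the pool is initialized, which matches the repeated error above.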
PS: Is there a workaround for this? During `import numpy`, Python depends on a random number from `os.urandom`, so this issue causes a long latency.
Ok, so that indeed is the problem.
At the moment, on x86 you can tell the guest kernel to trust the host's RDRAND: https://github.com/firecracker-microvm/firecracker/issues/663#issuecomment-486174174
Another solution could be to start the rngd daemon early in the boot process: https://github.com/firecracker-microvm/firecracker/issues/663#issuecomment-481849971
Thanks, adding `random.trust_cpu=on` to the boot parameters will work.
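
To confirm from inside the guest that the parameter actually reached the kernel, the command line in `/proc/cmdline` can be inspected. The following is a minimal sketch with made-up helper names (`has_boot_param`, `running_kernel_has_param`); it only checks that the token is present, not that the kernel acted on it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if `param` appears in `cmdline` as a whole
 * whitespace-delimited token (so "xrandom.trust_cpu=on" does not match). */
bool has_boot_param(const char *cmdline, const char *param) {
    size_t len = strlen(param);
    if (len == 0) {
        return false;
    }
    const char *p = cmdline;
    while ((p = strstr(p, param)) != NULL) {
        bool starts_token = (p == cmdline) || p[-1] == ' ' || p[-1] == '\t';
        char next = p[len];
        bool ends_token = next == '\0' || next == ' ' ||
                          next == '\t' || next == '\n';
        if (starts_token && ends_token) {
            return true;
        }
        p += 1; /* keep searching after this partial match */
    }
    return false;
}

/* Returns 1 if the running kernel was booted with `param`, 0 if not,
 * and -1 if /proc/cmdline cannot be read. */
int running_kernel_has_param(const char *param) {
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    if (f == NULL) {
        return -1;
    }
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return has_boot_param(buf, param) ? 1 : 0;
}
```

Inside the microVM, `running_kernel_has_param("random.trust_cpu=on")` should return 1 once the boot parameters have been updated.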
Describe the bug
When I start a Firecracker virtual machine and run my Python code, the first call to the os.urandom function causes a significant delay. However, subsequent calls to the function have a lower delay.
To Reproduce
Start the firecracker
Expected behaviour
The delay of the invocation of `os.urandom` should not be so high....
Environment
Additional context
`import numpy` will use a random number from `os.urandom`. This will have a high delay due to the issue.
Checks