Closed by pradeepkumars 10 years ago
One thing to note is that in the scheduler (see sched.hh, sched.cc) we keep a bitmask, "incoming_wakeups_mask", which is a long (8 bytes), so it can only accommodate 64 CPUs and will not work correctly with more than 64. The same 64-CPU limit (see max_cpus in sched.hh) is then carried over to other places in the code (e.g., mempool.cc uses max_cpus).
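To make the limit concrete, here is a minimal sketch of how a 64-bit mask runs out of bits. It is not OSv's actual code: mask_bits, fits_in_mask() and mark_wakeup() are hypothetical names used only for illustration, and the real incoming_wakeups_mask may be declared and updated differently.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a per-CPU wakeup mask; not taken from sched.hh.
constexpr std::size_t mask_bits = sizeof(std::uint64_t) * 8; // 64 bits

// True if this CPU id has a bit of its own in a single 64-bit mask.
constexpr bool fits_in_mask(unsigned cpu)
{
    return cpu < mask_bits;
}

// Sets the bit for `cpu`. Returns false for ids that have no bit,
// because shifting a 64-bit value by 64 or more is undefined behavior.
inline bool mark_wakeup(std::uint64_t& mask, unsigned cpu)
{
    if (!fits_in_mask(cpu)) {
        return false;
    }
    mask |= std::uint64_t{1} << cpu;
    return true;
}
```

With CPU ids numbered from 0, a 64-CPU guest uses ids 0 through 63, all of which fit; only id 64 (the 65th CPU) has no bit. So the mask width alone would not be expected to break a 64-CPU boot.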
However, this doesn't explain why 64 CPUs don't work: 65 should have been the first count to break. There is probably another bug.
By the way, if OSv is known not to support more cpus than some number, we should print a message when starting on too many CPUs - instead of silently hanging.
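A check along these lines could look roughly like the sketch below. It assumes a limit of 64 for max_cpus, matching the discussion above; check_cpu_count() and report_cpu_count() are hypothetical helpers, not existing OSv functions, and where such a check would hook into the boot path is left open.

```cpp
#include <cstdio>
#include <string>

// Assumed value; the thread only says that max_cpus is defined in sched.hh.
constexpr unsigned max_cpus = 64;

// Returns an empty string if the CPU count is supported,
// otherwise a human-readable message describing the problem.
std::string check_cpu_count(unsigned detected)
{
    if (detected <= max_cpus) {
        return {};
    }
    return "Detected " + std::to_string(detected) + " CPUs, but at most "
         + std::to_string(max_cpus) + " are supported";
}

// Hypothetical boot-time hook: prints the message, if any, and tells the
// caller whether it is safe to continue bringing up CPUs.
bool report_cpu_count(unsigned detected)
{
    const std::string msg = check_cpu_count(detected);
    if (msg.empty()) {
        return true;
    }
    std::fprintf(stderr, "%s\n", msg.c_str());
    return false;
}
```

Whether to abort or to continue with only the first max_cpus CPUs is a policy choice that the sketch leaves to the caller.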
I closed it by mistake.
In the case of 64 vcpus, OSv always hangs at this function call in libc/pthread.cc, called by ramdisk_init: auto t = new pthread(start_routine, arg, sigset, from_libc(attr)); I can't understand why. The first three parameters should be the same as in the case of 63 vcpus, and the fourth is NULL.
What I'm seeing is a livelock in a different place: ap_bringup() for the 64th CPU calls c->idle_thread->start(); This start() calls wake() which calls schedule(). This calls handle_incoming_wakeup() and goes into an infinite loop. I'll produce a patch.
I tried different vcpu configurations. The VM fails to boot with 64 vcpus: it doesn't drop to the shell prompt.
[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 64
OSv v0.05-469-gf99ea60
[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 50
OSv v0.05-469-gf99ea60
[/]% ^C
[/]% ^D
[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 64
OSv v0.05-469-gf99ea60
[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 52
OSv v0.05-469-gf99ea60
[/]% ^D
[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 58
OSv v0.05-469-gf99ea60
[/]% ^D
[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 63
OSv v0.05-469-gf99ea60
[/]%
I have yet to debug this further.