cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io

VM doesn't boot with 64 vCPUs #234

Closed pradeepkumars closed 10 years ago

pradeepkumars commented 10 years ago

I tried different vCPU configurations. The VM fails to boot with 64 vCPUs: it never drops to the shell prompt.

[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 64
OSv v0.05-469-gf99ea60

[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 50
OSv v0.05-469-gf99ea60
[/]% ^C
[/]% ^D

[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 64
OSv v0.05-469-gf99ea60

[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 52
OSv v0.05-469-gf99ea60
[/]% ^D

[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 58
OSv v0.05-469-gf99ea60
[/]% ^D

[sp@harrypotter osv]$ ./scripts/run.py -p kvm -c 63
OSv v0.05-469-gf99ea60
[/]%

I have yet to debug this further.

nyh commented 10 years ago

One thing to note is that in the scheduler (see sched.hh, sched.cc) we keep a bitmask, "incoming_wakeups_mask", which is a long (8 bytes, i.e. 64 bits), so it can only accommodate 64 CPUs and will not work correctly with more than 64. The same 64-CPU limit (see max_cpus in sched.hh) is then carried over to other places in the code (e.g., mempool.cc uses max_cpus).

However, this doesn't explain why 64 cpus don't work - 65 should have been the first one to break. Probably another bug.
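To illustrate the limit described above, here is a minimal C++ sketch of a one-bit-per-CPU mask. It is not OSv's actual scheduler code: `wakeup_mask`, `set` and `drain` are hypothetical names, and `max_cpus = 64` is simply the value mentioned in this thread. The point is that CPU indices 0 through 63 (64 CPUs in total) all fit in a 64-bit word, so the mask width by itself should only break on the 65th CPU.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed limit, taken from the max_cpus value mentioned in this thread.
constexpr unsigned max_cpus = 64;

// Hypothetical stand-in for a per-CPU wakeup bitmask: one bit per CPU
// in a single 64-bit word. This is an illustration, not OSv's code.
struct wakeup_mask {
    std::uint64_t bits = 0;

    // Mark that `cpu` has a pending wakeup.
    void set(unsigned cpu) {
        assert(cpu < max_cpus);  // there is no bit 64 in a 64-bit word
        // The shifted constant must be 64 bits wide: shifting a plain
        // int 1 by 32 or more is undefined behavior.
        bits |= std::uint64_t{1} << cpu;
    }

    // Return the indices of all CPUs whose bit is set, then clear the mask.
    std::vector<unsigned> drain() {
        std::vector<unsigned> cpus;
        for (unsigned cpu = 0; cpu < max_cpus; ++cpu) {
            if (bits & (std::uint64_t{1} << cpu)) {
                cpus.push_back(cpu);
            }
        }
        bits = 0;
        return cpus;
    }
};
```

With this layout, setting the bits for CPU 0 and CPU 63 and then draining yields both indices, which is why a 64-vCPU guest is still within what the mask can represent.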

By the way, if OSv is known not to support more cpus than some number, we should print a message when starting on too many CPUs - instead of silently hanging.
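A check along these lines could look like the sketch below. The helper name `cpu_count_supported`, the message text, and where it would be called from are all assumptions made for illustration; only the 64-CPU limit comes from this thread.

```cpp
#include <cstdio>

// Assumed limit, taken from the max_cpus value mentioned in this thread.
constexpr unsigned max_cpus = 64;

// Hypothetical helper: returns true when the CPU count is supported.
// Otherwise it prints a diagnostic to stderr and returns false, so the
// caller can abort the boot instead of hanging silently.
bool cpu_count_supported(unsigned ncpus) {
    if (ncpus <= max_cpus) {
        return true;
    }
    std::fprintf(stderr,
                 "OSv: started with %u CPUs, but at most %u are supported\n",
                 ncpus, max_cpus);
    return false;
}
```

Early boot code could call such a helper once the number of CPUs is known and stop with a clear error when it returns false.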

pradeepkumars commented 10 years ago

I closed it by mistake.

jaspal-dhillon commented 10 years ago

In the case of 64 vCPUs, OSv always hangs at this call in libc/pthread.cc, reached from ramdisk_init: auto t = new pthread(start_routine, arg, sigset, from_libc(attr)); I can't understand why. The first three parameters should be the same as in the 63-vCPU case, and the fourth is NULL.

nyh commented 10 years ago

What I'm seeing is a livelock in a different place: ap_bringup() for the 64th CPU calls c->idle_thread->start(). This start() calls wake(), which calls schedule(), which in turn calls handle_incoming_wakeup() and goes into an infinite loop. I'll produce a patch.