cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io

Do not use FS segment register in kernel #1256

Open wkozaczuk opened 1 year ago

wkozaczuk commented 1 year ago

This is a controversial proposal and we may never do it, but I raise it here because it would help us better solve one old problem and another new one.

The old problem is support for the so-called "local-exec" thread-local mode, where the executable assumes that it is the first loaded object and that the offset of its TLS block (relative to fs:0) is therefore known at compile time. The kernel makes the same assumption, so the TLS blocks of the executable and the kernel overlap. The issue is described here and has been solved by having the kernel leave a "reservation" for the app at the beginning of the TLS block. This is not ideal, because the default reservation may not be enough (see here) and then has to be adjusted by passing the build parameter app_local_exec_tls_size.
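To make the "offset relative to fs:0" point concrete, here is a minimal user-space sketch, assuming Linux on x86_64 with GCC or Clang and a normal executable (not a shared library). It is not OSv code: `app_counter`, `read_fs0` and `app_counter_tls_offset` are hypothetical names made up for illustration. It forces the local-exec model on one variable and measures where that variable sits relative to the thread pointer.

```cpp
#include <cstdint>

// Hypothetical variable standing in for an executable's thread-local data.
// The tls_model attribute forces the local-exec model, so the compiler
// addresses it as a fixed offset from the thread pointer.
static __thread long app_counter __attribute__((tls_model("local-exec"))) = 42;

// On x86_64 Linux the word stored at fs:0 is the thread pointer itself
// (the thread control block starts with a pointer to itself).
static inline std::uintptr_t read_fs0() {
    std::uintptr_t tp;
    asm volatile("movq %%fs:0, %0" : "=r"(tp));
    return tp;
}

// Distance from the thread pointer to app_counter. With the x86_64 TLS
// layout the executable's TLS block sits just below the thread pointer,
// so the result is a negative constant fixed when the program is built.
std::intptr_t app_counter_tls_offset() {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&app_counter);
    return static_cast<std::intptr_t>(addr - read_fs0());
}

// Plain read through the compiler-generated fs-relative access.
long read_app_counter() {
    return app_counter;
}
```

Because both the executable and the kernel would hard-code offsets of this kind against the same thread pointer, their blocks collide unless one of them reserves space for the other.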

The new problem has to do with supporting statically linked executables and dynamically linked ones launched by ld-linux. Such apps use TLS, including the local-exec model, and OSv has no real control over the bootstrapping mechanism (where the memory is allocated, how much, etc.). In the x86_64 case the app calls the arch_prctl syscall to request that the FS register be set to a specific value. OSv then has to implement extra logic to juggle the kernel and app values of the FS register on every syscall, interrupt, and page fault. This is all doable, as I did in my experimental branch, but it is pretty painful and adds extra cost.
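For reference, the request such an app makes looks roughly like the sketch below, assuming Linux on x86_64. The helper names `get_fs_base`, `set_fs_base` and `fs_self_pointer` are hypothetical. To stay safe in a running process, the usage shown after the block only re-installs the value FS already has; a real libc start-up path would pass the address of a freshly allocated thread control block instead.

```cpp
#include <asm/prctl.h>    // ARCH_GET_FS, ARCH_SET_FS
#include <sys/syscall.h>  // SYS_arch_prctl
#include <unistd.h>       // syscall
#include <cstdint>

// Ask the kernel which base address the FS register currently has.
unsigned long get_fs_base() {
    unsigned long base = 0;
    syscall(SYS_arch_prctl, ARCH_GET_FS, &base);
    return base;
}

// Ask the kernel to point FS at `base`. This is the call OSv would have
// to honor for statically linked apps. Returns 0 on success.
long set_fs_base(unsigned long base) {
    return syscall(SYS_arch_prctl, ARCH_SET_FS, base);
}

// The word at fs:0 is the thread control block's self pointer, so it
// should match what ARCH_GET_FS reports.
unsigned long fs_self_pointer() {
    unsigned long tp;
    asm volatile("movq %%fs:0, %0" : "=r"(tp));
    return tp;
}
```

Calling `set_fs_base(get_fs_base())` is a harmless round trip that exercises both requests. The cost described above comes from the kernel side: once the app owns the FS value, a kernel that also relies on FS has to save and restore it at every entry and exit.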

So what if we make the kernel not use the FS register at all? But how? We could use the GS register (like the Linux kernel does), but then what about all the __thread variables that the kernel scheduler and many other parts depend on, and whose reads and writes the compiler handles automagically?
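One way to picture the GS alternative is an explicit per-thread block reached through gs-relative loads and stores instead of compiler-managed __thread variables. The sketch below is only an illustration under stated assumptions: it runs in Linux x86_64 user space, uses arch_prctl(ARCH_SET_GS) as a stand-in for what a kernel would do directly, and uses a single static block where a real design would allocate one per thread. `kernel_tls`, `demo_block` and the accessor functions are hypothetical names, not OSv APIs.

```cpp
#include <asm/prctl.h>    // ARCH_SET_GS
#include <sys/syscall.h>  // SYS_arch_prctl
#include <unistd.h>       // syscall
#include <cstddef>        // offsetof
#include <cstdint>

// Hypothetical block holding what are __thread variables today.
struct kernel_tls {
    std::uint64_t preempt_counter;
    std::uint64_t need_reschedule;
};

// One static block for the demo; a real design needs one per thread.
static kernel_tls demo_block;

// Point the GS base at the block. Returns 0 on success.
long install_kernel_tls() {
    return syscall(SYS_arch_prctl, ARCH_SET_GS,
                   reinterpret_cast<unsigned long>(&demo_block));
}

// Read preempt_counter with an explicit gs-relative load.
std::uint64_t read_preempt_counter() {
    std::uint64_t v;
    asm volatile("movq %%gs:%c1, %0"
                 : "=r"(v)
                 : "i"(offsetof(kernel_tls, preempt_counter))
                 : "memory");
    return v;
}

// Increment preempt_counter in place through GS.
void preempt_disable_sketch() {
    asm volatile("incq %%gs:%c0"
                 :
                 : "i"(offsetof(kernel_tls, preempt_counter))
                 : "memory", "cc");
}
```

The trade-off is visible even in this toy: every access needs a hand-written accessor (or a macro layer), whereas with __thread the compiler emits the fs-relative access on its own.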

Here are all the thread-local variables the kernel seems to be using (based on readelf -W -s build/release/loader-stripped.elf | grep TLS):

current_interrupt_frame
errno
memory::alloc_tracker::in_tracker
memory::emergency_alloc_level
osv::override_current_app
osv::thread_pending_signal
osv::thread_signal_mask
percpu_base
pthread_private::cancel_state
pthread_private::cancel_type
pthread_private::current_pthread
pthread_private::tsd
sched::current_cpu
sched::exception_depth
sched::need_reschedule
sched::preempt_counter
sched::s_current
std::__once_call
std::__once_callable