Stress testing the MMUv6 support (repeated fork/execve/exit) triggers a machine check. This only happens with a CONFIG_PREEMPT kernel.

The issue is a race condition in the exit code path:
```
do_exit
    exit_mm
        exit_mm_release(current->mm)
            mm_release
                deactivate_mm      <-- RTP0 set to the fallback swapped pgd
                                       (since the task's page tables, including
                                       the kernel mapping, will be freed later)

            --> IRQ taken
                preempt_schedule_irq
                    context_switch
                        switch_mm  <-- reprograms RTP0 to the task's pgd
                                       (losing the fallback pgd)
                        switch_to
            <-- IRQ resumes in exit_mm (the context switch appears to resume
                in the same task, which is a mystery)

        mmput
            __mmput
                exit_mmap(old_mm)
                    arch_exit_mmap(old_mm)
                    unmap_vmas
                    free_pgtables
                        free_pgd_range  <-- the in-use task pgd table tree is
                                            freed, including the kernel mapping;
                                            this is not OK, but stale TLB entries
                                            keep things going
                    tlb_finish_mmu
                        tlb_flush
                            tlb_flush_mm  <-- nail in the coffin: TLB entries
                                              are flushed, and the kernel can't
                                              execute anymore
```
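To make the window concrete, here is a minimal sketch of the two operations that race. The `write_rtp0()` helper and the `fallback_pg_dir` symbol are hypothetical stand-ins for illustration, not the actual ARCv3 MMUv6 code:

```c
#include <linux/mm.h>
#include <linux/sched.h>

/* Hypothetical helpers, for illustration only */
extern pgd_t fallback_pg_dir[];          /* assumed fallback pgd */
extern void write_rtp0(unsigned long pa); /* assumed RTP0 accessor */

/*
 * Exit path, called from mm_release(): point RTP0 at a fallback pgd,
 * because the task's own page tables (which also carry the kernel
 * mapping) are about to be freed by exit_mmap().
 */
static inline void deactivate_mm(struct task_struct *tsk,
				 struct mm_struct *mm)
{
	write_rtp0(__pa(fallback_pg_dir));
}

/*
 * If an IRQ lands right after deactivate_mm() and CONFIG_PREEMPT
 * triggers a reschedule, the eventual switch back runs switch_mm(),
 * which unconditionally loads RTP0 from the task's mm -- silently
 * undoing the fallback programmed above.
 */
static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
			     struct task_struct *tsk)
{
	write_rtp0(__pa(next->pgd));	/* the fallback pgd is lost here */
}
```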
To plug the race, we now do the fallback pgd switch back in arch_exit_mmap(); this time we detect whether this is the execve code path or exit (mm == NULL) and only do the switch for exit.
Fix pushed: `ARCv3: mm: machine check with CONFIG_PREEMPT: switch back to arch_exit...`
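A minimal sketch of that check, using the same hypothetical `write_rtp0()`/`fallback_pg_dir` helpers as above (an illustration of the idea, not the pushed patch). The "(mm == NULL)" check presumably refers to `current->mm`, since the mm being torn down is what gets passed in as the argument:

```c
/*
 * exit_mmap() calls arch_exit_mmap() on both teardown paths:
 *  - execve: exec_mmap() tears down the old mm while current->mm
 *    already points at the new one (non-NULL);
 *  - exit:   exit_mm() has set current->mm = NULL before mmput().
 * So a NULL current->mm identifies the exit path.
 */
static void arch_exit_mmap(struct mm_struct *mm)
{
	if (!current->mm)	/* exit path, not execve */
		write_rtp0(__pa(fallback_pg_dir));
}
```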
A small race window still exists; a fully robust solution will require tinkering with USER_PGTABLES_CEILING.
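For reference, the ceiling works like this: exit_mmap() calls free_pgtables() with USER_PGTABLES_CEILING as the ceiling, and free_pgd_range() will not free page-table levels above it. The asm-generic default is 0 (free everything up to and including the top level); ARM with LPAE hit a similar class of problem (a pgd entry shared between user space and kernel modules being freed) and raises the ceiling to TASK_SIZE. An untested sketch of the equivalent arch override:

```c
/*
 * Possible direction (untested): raise the page-table freeing ceiling
 * in the arch pgtable header so free_pgd_range() leaves the top-level
 * entries that also map the kernel intact during exit_mmap().
 */
#define USER_PGTABLES_CEILING	TASK_SIZE
```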