CharlyCst / miralis

Miralis is an experimental system that virtualises firmware
https://miralis-firmware.github.io/
MIT License

Linux loop issue with sifive-u54 cpu after booting. #130

Closed NoeTerrier closed 2 months ago

NoeTerrier commented 2 months ago

When sifive-u54 is used as the virtual CPU for QEMU, Linux boots but randomly falls into what appears to be an infinite loop.

To reproduce: check out commit b7f014fe04fd78294af1ad965ec3a2a7b2dc9fad, add .arg("-cpu").arg("sifive-u54") to the runner's QEMU arguments, and run the command just run linux (with the info log level and a sufficient number of exits).
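For readers unfamiliar with the builder pattern behind that change, here is a minimal sketch of how a -cpu argument is appended with std::process::Command. The function names, the qemu-system-riscv64 binary name, and the base arguments are assumptions made for illustration; they are not taken from the Miralis runner.

```rust
use std::process::Command;

/// Build a QEMU command line with an explicit CPU model.
/// `base_args` stands in for whatever the runner already passes
/// (hypothetical here), and the binary name is an assumption.
fn qemu_command(base_args: &[&str], cpu: &str) -> Command {
    let mut cmd = Command::new("qemu-system-riscv64");
    cmd.args(base_args);
    // The two extra arguments described in the reproduction steps.
    cmd.arg("-cpu").arg(cpu);
    cmd
}

/// Collect the arguments as strings so they can be inspected or logged
/// before the command is spawned.
fn args_of(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}
```

Calling qemu_command(&["-machine", "virt"], "sifive-u54") and then args_of on the result shows the -cpu pair appended after the base arguments, without starting QEMU.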

Result: after a few tries, it eventually ends up in a loop and terminates once the maximum number of exits is reached:

...
[    0.355441] 9pnet: Installing 9P2000 support
[    0.355751] Key type dns_resolver registered
[    0.382373] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
[    0.389111] clk: Disabling unused clocks
[    0.389492] PM: genpd: Disabling unused power domains
[    0.389709] ALSA device list:
[    0.389829]   No soundcards found.
[    0.431572] Freeing unused kernel image (initmem) memory: 3032K
[    0.432497] Run /init as init process

Hello from Linux!

[Error | miralis] Reached maximum number of exits: 99999
error: Recipe `run` failed on line 48 with exit code 1

Expected: Linux boots and powers down gracefully:

...
[    0.328417] 9pnet: Installing 9P2000 support
[    0.328751] Key type dns_resolver registered
[    0.352291] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
[    0.358122] clk: Disabling unused clocks
[    0.358497] PM: genpd: Disabling unused power domains
[    0.358718] ALSA device list:
[    0.358846]   No soundcards found.
[    0.395570] Freeing unused kernel image (initmem) memory: 3032K
[    0.396239] Run /init as init process

Hello from Linux!

[    0.450809] reboot: Power down

NoeTerrier commented 2 months ago

I found a workaround with either one of these solutions:

So if the SEIP bit is reset, it works, but I don't know why.

NoeTerrier commented 2 months ago

After further investigation, it seems that the SEIP bit of mip is not cleared. Typing on the keyboard seems to produce an external interrupt, which Linux correctly identifies (exception code 9). Linux traps, and the trap handler eventually calls plic_handle_irq. After the interrupt has been handled, the SEIP bit is still set.
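To make the bits involved concrete, here is a small sketch of the checks described above, assuming RV64 and the standard privileged-spec layout (SEIP is bit 9 of mip, and the top bit of scause marks an interrupt). The constant and function names are illustrative and are not taken from Miralis or Linux.

```rust
/// Supervisor external interrupt pending: bit 9 of mip (and sip).
const SEIP: u64 = 1 << 9;
/// On RV64, the most significant bit of scause is set for interrupts.
const SCAUSE_INTERRUPT: u64 = 1 << 63;
/// Exception code of a supervisor external interrupt.
const SUPERVISOR_EXTERNAL: u64 = 9;

/// True if `scause` describes a supervisor external interrupt.
fn is_supervisor_external_interrupt(scause: u64) -> bool {
    (scause & SCAUSE_INTERRUPT) != 0
        && (scause & !SCAUSE_INTERRUPT) == SUPERVISOR_EXTERNAL
}

/// True if the SEIP bit is set in a mip value.
fn seip_pending(mip: u64) -> bool {
    (mip & SEIP) != 0
}

/// Return `mip` with SEIP cleared and every other bit untouched.
fn clear_seip(mip: u64) -> u64 {
    mip & !SEIP
}
```

With these helpers, the symptom reported here corresponds to seip_pending(mip) still returning true after the handler has run.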

Here is the code of plic_handle_irq in linux/drivers/irqchip/irq-sifive-plic.c.

/*
 * Handling an interrupt is a two-step process: first you claim the interrupt
 * by reading the claim register, then you complete the interrupt by writing
 * that source ID back to the same claim register.  This automatically enables
 * and disables the interrupt, so there's nothing else to do.
 */
static void plic_handle_irq(struct irq_desc *desc)
{
    struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
    struct irq_chip *chip = irq_desc_get_chip(desc);
    void __iomem *claim = handler->hart_base + CONTEXT_CLAIM;
    irq_hw_number_t hwirq;

    WARN_ON_ONCE(!handler->present);

    chained_irq_enter(chip, desc);

    while ((hwirq = readl(claim))) {
        int err = generic_handle_domain_irq(handler->priv->irqdomain,
                            hwirq);
        if (unlikely(err)) {
            dev_warn_ratelimited(handler->priv->dev,
                         "can't find mapping for hwirq %lu\n", hwirq);
        }
    }

    chained_irq_exit(chip, desc);
}
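
The claim/complete protocol described in that comment can be sketched with a toy model. This is a deliberate simplification written for illustration, not the PLIC implementation of QEMU, Miralis, or Linux: it ignores priorities, thresholds, and enable bits, supports only sources 1 to 31, and all names are hypothetical. Completion is folded into the loop here, whereas the kernel driver performs it in a separate callback.

```rust
/// Toy PLIC context: one bit per interrupt source (source 0 is reserved).
#[derive(Default)]
struct ToyPlic {
    pending: u32,    // sources waiting to be claimed
    in_service: u32, // sources claimed but not yet completed
}

impl ToyPlic {
    /// A device raises `source`; the gateway forwards it only if that
    /// source is not currently being serviced.
    fn raise(&mut self, source: u32) {
        if source == 0 || source >= 32 {
            return;
        }
        if (self.in_service & (1 << source)) == 0 {
            self.pending |= 1 << source;
        }
    }

    /// Level of the external interrupt line seen by the hart.
    fn external_interrupt_pending(&self) -> bool {
        self.pending != 0
    }

    /// Read of the claim register: returns the lowest pending source ID
    /// and clears its pending bit, or returns 0 if nothing is pending.
    fn claim(&mut self) -> u32 {
        if self.pending == 0 {
            return 0;
        }
        let source = self.pending.trailing_zeros();
        self.pending &= !(1 << source);
        self.in_service |= 1 << source;
        source
    }

    /// Write of the source ID back to the claim register.
    fn complete(&mut self, source: u32) {
        if source < 32 {
            self.in_service &= !(1 << source);
        }
    }
}

/// Mirrors the shape of the loop in plic_handle_irq: claim until 0.
fn handle_irq(plic: &mut ToyPlic) -> Vec<u32> {
    let mut handled = Vec::new();
    loop {
        let hwirq = plic.claim();
        if hwirq == 0 {
            break;
        }
        handled.push(hwirq);
        plic.complete(hwirq);
    }
    handled
}
```

Under this simplified model, the external interrupt line drops once every pending source has been claimed, which is the behaviour one would expect SEIP to reflect after the handler returns.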

Using gdb to stop in the trap handler at the point when the loop occurs, and then manually clearing SEIP, allows the shell to function a bit longer before it falls into the same loop again.