ARM-software / bsa-acs

Arm SystemReady : BSA Architecture Compliance Suite
Apache License 2.0
16 stars 42 forks source link

System hang occurs during SMMU initialization #12

Closed sunnywang-arm closed 2 years ago

sunnywang-arm commented 2 years ago

@semihalf-bernacki-grzegorz confirmed that this issue can be worked around by the patch smmu.patch.txt from @ndhillonm. In the patch, the timeout-related change is needed for preventing system hang. However, we're still having no idea why adding a __sync_synchronize() call can work around this issue. Can we get a BSA expert to check the patch?

The following is the log showing where system hang occurs.

FS4:\EFI\BOOT\bsa\> Bsa.efi

 BSA Architecture Compliance Suite 
    Version 1.0  

 Starting tests with Print level is  3

 Creating Platform Information Tables 
 PE_INFO: Number of PE detected       :   24 
 GIC_INFO: Number of GICD             :    1 
 GIC_INFO: Number of ITS              :    1 
 TIMER_INFO: Number of system timers  :    0 
 WATCHDOG_INFO: Number of Watchdogs   :    1 
 PCIE_INFO: Number of ECAM regions    :    7 
 PCIE_INFO: Number of RCiEP           :   13 
 PCIE_INFO: Number of RCEC            :    0 
 PCIE_INFO: Number of EP              :   41 
 PCIE_INFO: Number of RP              :   25 
 PCIE_INFO: Number of iEP_EP          :    0 
 PCIE_INFO: Number of iEP_RP          :    0 
 SMMU_INFO: Number of SMMU CTRL       :    1 
chetan-rathore commented 2 years ago

Thanks @semihalf-bernacki-grzegorz and @sunnywang-arm,

We will review the patch and update soon.

ndhillonm commented 2 years ago

The first issue is an error in below function. While loop is based on 'timeout' which is not getting decremented. I think this should be a trivial fix that we can agree on.

static void smmu_cmdq_poll_until_consumed(smmu_dev_t smmu) { uint32_t timeout = SMMU_CMDQ_POLL_TIMEOUT; smmu_cmd_queue_t cmdq = &smmu->cmdq; smmu_queue_t queue = { .log2nent = smmu->cmdq.queue.log2nent, .prod = val_mmio_read((uint64_t)smmu->cmdq.prod_reg), .cons = val_mmio_read((uint64_t)smmu->cmdq.cons_reg) };

while (timeout > 0) {
    if (smmu_queue_empty(&queue))
        break;
    queue.cons = val_mmio_read((uint64_t)cmdq->cons_reg);
}

if (!timeout) {
    val_print(ACS_PRINT_ERR, "\n       CMDQ poll timeout at 0x%08x", queue.prod);
    val_print(ACS_PRINT_ERR, "\n       prod_reg = 0x%08x,",

val_mmio_read((uint64_t)smmu->cmdq.prod_reg)); val_print(ACS_PRINT_ERR, "\n cons_reg = 0x%08x", val_mmio_read((uint64_t)smmu->cmdq.cons_reg)); val_print(ACS_PRINT_ERR, "\n gerror = 0x%08x ", val_mmio_read(smmu->base + SMMU_GERROR_OFFSET)); } }

ndhillonm commented 2 years ago

Second issue is the implementation of 'smmu_cmdq_write_cmd' function. If you look at the SMMU spec,

3.5.2 Queue entry visibility semantics Any producer (whether the SMMU or software) must ensure that if an update to the PROD index value is observable by the consumer, all new queue entries are observable by the consumer

This clearly implies a need for a barrier so that queue entries are visible when PROD register is updated. This can also be seen in implementation of 'arm_smmu_cmdq_issue_cmdlist' function in Linux kernel SMMUv3 driver.

chetan-rathore commented 2 years ago

Hi @semihalf-bernacki-grzegorz,

Thanks for the patch.

We have integrated the timeout change, and for barrier will use DMB instruction.

Can you help us in verifying the fix with below binary https://github.com/chetan-rathore/bsa-acs/blob/main/prebuilt_images/Bsa_ES_Fix.efi

Changes for this issue + #11 -> https://github.com/chetan-rathore/bsa-acs/commit/bd24ef61c168aea663958b96ec852992bb5a4757

timeout change + some debug prints -> https://github.com/chetan-rathore/bsa-acs/commit/a6e8f3bdcd183eb6062c4d8b10d71ff1bc9563b4

Thanks, ACS Team

semihalf-bernacki-grzegorz commented 2 years ago

Hi @chetan-rathore, here is the log from Bsa_ES_Fix.efi: Bsa_ES_Fix.log

chetan-rathore commented 2 years ago

Thanks @semihalf-bernacki-grzegorz,

We will merge the changes into the mainline.

chetan-rathore commented 2 years ago

Hi @semihalf-bernacki-grzegorz @sunnywang-arm ,

The changes are updated as part of #15

sunnywang-arm commented 2 years ago

Closing it as the fix got merged and verified. Thanks, @semihalf-bernacki-grzegor, @ndhillonm, and @chetan-rathore.