Closed sunnywang-arm closed 2 years ago
Thanks @semihalf-bernacki-grzegorz and @sunnywang-arm,
We will review the patch and update soon.
The first issue is an error in below function. While loop is based on 'timeout' which is not getting decremented. I think this should be a trivial fix that we can agree on.
static void smmu_cmdq_poll_until_consumed(smmu_dev_t smmu) { uint32_t timeout = SMMU_CMDQ_POLL_TIMEOUT; smmu_cmd_queue_t cmdq = &smmu->cmdq; smmu_queue_t queue = { .log2nent = smmu->cmdq.queue.log2nent, .prod = val_mmio_read((uint64_t)smmu->cmdq.prod_reg), .cons = val_mmio_read((uint64_t)smmu->cmdq.cons_reg) };
while (timeout > 0) {
if (smmu_queue_empty(&queue))
break;
queue.cons = val_mmio_read((uint64_t)cmdq->cons_reg);
}
if (!timeout) {
val_print(ACS_PRINT_ERR, "\n CMDQ poll timeout at 0x%08x", queue.prod);
val_print(ACS_PRINT_ERR, "\n prod_reg = 0x%08x,",
val_mmio_read((uint64_t)smmu->cmdq.prod_reg)); val_print(ACS_PRINT_ERR, "\n cons_reg = 0x%08x", val_mmio_read((uint64_t)smmu->cmdq.cons_reg)); val_print(ACS_PRINT_ERR, "\n gerror = 0x%08x ", val_mmio_read(smmu->base + SMMU_GERROR_OFFSET)); } }
Second issue is the implementation of 'smmu_cmdq_write_cmd' function. If you look at the SMMU spec,
3.5.2 Queue entry visibility semantics Any producer (whether the SMMU or software) must ensure that if an update to the PROD index value is observable by the consumer, all new queue entries are observable by the consumer
This clearly implies a need for a barrier so that queue entries are visible when PROD register is updated. This can also be seen in implementation of 'arm_smmu_cmdq_issue_cmdlist' function in Linux kernel SMMUv3 driver.
Hi @semihalf-bernacki-grzegorz,
Thanks for the patch.
We have integrated the timeout change, and for barrier will use DMB instruction.
Can you help us in verifying the fix with below binary https://github.com/chetan-rathore/bsa-acs/blob/main/prebuilt_images/Bsa_ES_Fix.efi
Changes for this issue + #11 -> https://github.com/chetan-rathore/bsa-acs/commit/bd24ef61c168aea663958b96ec852992bb5a4757
timeout change + some debug prints -> https://github.com/chetan-rathore/bsa-acs/commit/a6e8f3bdcd183eb6062c4d8b10d71ff1bc9563b4
Thanks, ACS Team
Hi @chetan-rathore, here is the log from Bsa_ES_Fix.efi: Bsa_ES_Fix.log
Thanks @semihalf-bernacki-grzegorz,
We will merge the changes into the mainline.
Hi @semihalf-bernacki-grzegorz @sunnywang-arm ,
The changes are updated as part of #15
Closing it as the fix got merged and verified. Thanks, @semihalf-bernacki-grzegor, @ndhillonm, and @chetan-rathore.
@semihalf-bernacki-grzegorz confirmed that this issue can be worked around by the patch smmu.patch.txt from @ndhillonm. In the patch, the timeout-related change is needed for preventing system hang. However, we're still having no idea why adding a __sync_synchronize() call can work around this issue. Can we get a BSA expert to check the patch?
The following is the log showing where system hang occurs.