[Note] Running cuDLA-samples on 6.0.6.0

zerollzeng commented 1 year ago

This help users running cuDLA-samples on some old Drive-OS release.

Problem Description

If you see following error when import a external semaphore to cuDLA on 6.0.6.0.

    // Import semaphore to cudla
    memset(&m_WaitEventContext.cudla_ext_sema_mem_desc, 0, sizeof(m_WaitEventContext.cudla_ext_sema_mem_desc));
    m_WaitEventContext.cudla_ext_sema_mem_desc.extSyncObject = m_WaitEventContext.sync_obj;
    m_cudla_err = cudlaImportExternalSemaphore(m_DevHandle, &(m_WaitEventContext.cudla_ext_sema_mem_desc),
                                               &m_WaitEventContext.nvsci_sync_obj_reg_ptr, 0);
    CHECK_CUDLA_ERR(m_cudla_err, "cudla import external semaphore");

Root Cause

cuDLA doesn't support regular system semaphores. cuDLA supports sync point and deterministic semaphores. In 6.0.8, CUDA has changed the priority to prioritize NvSciSyncInternalAttrValPrimitiveType_Syncpoint over NvSciSyncInternalAttrValPrimitiveType_SysmemSemaphore. Hence, NvSciSync object of syncpoint type will be created when CUDA and cuDLA are participating entities as both prefer NvSciSyncInternalAttrValPrimitiveType_Syncpoint over NvSciSyncInternalAttrValPrimitiveType_SysmemSemaphore. But we didn't port the priority change to 6.0.6(previous releases may have the same issue but we didn't test) so it use NvSciSyncInternalAttrValPrimitiveType_SysmemSemaphore by default and cause the error.

Fix info:

NvSciSyncAttrKey_RequireDeterministicFences attribute will create NvSciSync object of determinitic semaphore, please apply the following patch on 6.0.6 manually.

diff --git a/src/cudla_context_standalone.cpp b/src/cudla_context_standalone.cpp
index 0252dac..ceaf7fb 100644
--- a/src/cudla_context_standalone.cpp
+++ b/src/cudla_context_standalone.cpp
@@ -283,6 +283,13 @@ bool cuDLAContextStandalone::initialize()
     CHECK_CUDLA_ERR(m_cudla_err, "get NvSci waiter sync attributes");
     m_cuda_err = cudaDeviceGetNvSciSyncAttributes(m_WaitEventContext.signaler_attr_list, 0, cudaNvSciSyncAttrSignal);
     CHECK_CUDA_ERR(m_cuda_err, "cuda get NvSci signal list");
+
+    memset(m_KeyValue, 0, sizeof(m_KeyValue));
+    keyValue[0].attrKey = NvSciSyncAttrKey_RequireDeterministicFences;
+    keyValue[0].value = (void*)&m_RequireDeterFences;
+    keyValue[0].len = sizeof(m_RequireDeterFences);
+    NvSciSyncAttrListSetAttrs(m_WaitEventContext.signaler_attr_list, keyValue, 1);
+
     NvSciSyncAttrList wait_event_attrs[2] = {m_WaitEventContext.signaler_attr_list, m_WaitEventContext.waiter_attr_list};
     m_nvsci_err = NvSciSyncAttrListReconcile(
         wait_event_attrs,
@@ -330,8 +337,10 @@ bool cuDLAContextStandalone::initialize()
         CUDLA_NVSCISYNC_ATTR_SIGNAL);
     CHECK_CUDLA_ERR(m_cudla_err, "get NvSci sync attributes");
     m_cuda_err = cudaDeviceGetNvSciSyncAttributes(m_SignalEventContext.waiter_attr_list, 0, cudaNvSciSyncAttrWait);
+    NvSciSyncAttrListSetAttrs(m_SignalEventContext.signaler_attr_list, keyValue, 1);
     CHECK_CUDA_ERR(m_cuda_err, "cuda get NvSci wait attribute list");
     NvSciSyncAttrList signal_event_attrs[2] = {m_SignalEventContext.signaler_attr_list, m_SignalEventContext.waiter_attr_list};
+
     m_nvsci_err = NvSciSyncAttrListReconcile(
         signal_event_attrs,
         2,
diff --git a/src/cudla_context_standalone.h b/src/cudla_context_standalone.h
index a002de5..5a9c343 100644
--- a/src/cudla_context_standalone.h
+++ b/src/cudla_context_standalone.h
@@ -123,6 +123,9 @@ class cuDLAContextStandalone
     NvSciSyncContext m_SignalEventContext;
     cudaExternalSemaphore_t m_SignalSem;
     cudaExternalSemaphoreSignalParams m_SignalParams;
+    
+    bool m_RequireDeterFences = true;
+    NvSciSyncAttrKeyValuePair m_KeyValue[1];

     cudlaTask m_Task;

zerollzeng commented 1 year ago

The above patch still have some problems and we are trying to fix it.

yaoshw commented 1 year ago

The above patch still have some problems and we are trying to fix it.

@zerollzeng hello, has the problem with this patch been resolved?

zerollzeng commented 1 year ago

Yes the patch is under review, I'll try to public ASAP.

zerollzeng commented 1 year ago

The patch has been merge in https://github.com/NVIDIA-AI-IOT/cuDLA-samples/commit/a2d645b61920fead0cf70c79506518b0a159463c

Please pull the latest code and rebuild.

NVIDIA-AI-IOT / cuDLA-samples