shiwenloong opened this issue 5 months ago
Hi @shiwenloong,
Thanks for reporting this issue.
I have added an experimental branch, issues/75_and_78, that prints the value returned by cudaGetLastError().
Please run:
CUDNN_LOGLEVEL_DBG=3 CUDNN_LOGDEST_DBG=backend_api.log CUDNN_FRONTEND_LOG_FILE=stdout CUDNN_FRONTEND_LOG_INFO=1 ./build/bin/samples "MatMul"
and attach both backend_api.log and the frontend log so we can help debug.
Thanks
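To confirm that the four logging variables actually reach the process, and to look for the backend log in the right place, the command above could be wrapped in a small Python script. This is a minimal sketch: `build_env` and `run_samples` are hypothetical helpers, and the meanings in the comments are our reading of the cuDNN and cudnn-frontend documentation, which may differ between versions. One detail worth checking: `backend_api.log` is a relative path, so (assuming cuDNN opens it as given) it would be created in the working directory of the samples process, not next to the binary.

```python
import os
import subprocess
from pathlib import Path

# Logging variables from the command above. The comments are our reading of
# the cuDNN / cudnn-frontend docs (an assumption); values may vary by version.
LOG_ENV = {
    "CUDNN_LOGLEVEL_DBG": "3",               # backend: 3 = errors, warnings and info
    "CUDNN_LOGDEST_DBG": "backend_api.log",  # backend: stdout, stderr or a file path
    "CUDNN_FRONTEND_LOG_INFO": "1",          # frontend: 1 enables logging
    "CUDNN_FRONTEND_LOG_FILE": "stdout",     # frontend: stdout, stderr or a file path
}


def build_env(base=None):
    """Return a copy of `base` (default: os.environ) with the logging variables set."""
    env = dict(os.environ if base is None else base)
    env.update(LOG_ENV)
    return env


def run_samples(binary="./build/bin/samples", test_filter="MatMul", cwd="."):
    """Hypothetical helper: run the samples binary with logging enabled.

    Returns (exit code, whether the backend log exists, captured stdout).
    The relative log path is resolved against `cwd`, the child's working directory.
    """
    result = subprocess.run(
        [binary, test_filter],
        env=build_env(),
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    log_path = Path(cwd) / LOG_ENV["CUDNN_LOGDEST_DBG"]
    return result.returncode, log_path.exists(), result.stdout
```

For example, `code, found, out = run_samples()` from the repository root. If `found` is False while `out` contains the frontend log, a reasonable next step would be to check which cuDNN version the binary loads, since the backend logging variables have changed across cuDNN releases.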
I ran
CUDNN_LOGLEVEL_DBG=3 CUDNN_LOGDEST_DBG=backend_api.log CUDNN_FRONTEND_LOG_FILE=stdout CUDNN_FRONTEND_LOG_INFO=1 ./build/bin/samples "MatMul"
but I can't find the backend_api.log file. This is the frontend log:
Filters: "MatMul"
Randomness seeded to: 3739293787
[cudnn_frontend] INFO: Validating matmul node GEMM...
[cudnn_frontend] INFO: Inferrencing properties for matmul node GEMM...
[cudnn_frontend] INFO: Creating cudnn tensors for node named 'GEMM':
[cudnn_frontend] INFO: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: ["BFLOAT16"] Id: 2 nDims 3 VectorCount: 1 vectorDimension -1 Dim [ 16,32,128 ] Str [ 4096,128,1 ] isVirtual: 0 isByValue: 0 Alignment: 16 reorder_type: ["NONE"]
[cudnn_frontend] INFO: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: ["BFLOAT16"] Id: 3 nDims 3 VectorCount: 1 vectorDimension -1 Dim [ 16,128,64 ] Str [ 8192,64,1 ] isVirtual: 0 isByValue: 0 Alignment: 16 reorder_type: ["NONE"]
[cudnn_frontend] INFO: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: ["FLOAT"] Id: 4 nDims 3 VectorCount: 1 vectorDimension -1 Dim [ 16,32,64 ] Str [ 2048,64,1 ] isVirtual: 0 isByValue: 0 Alignment: 16 reorder_type: ["NONE"]
[cudnn_frontend] INFO: Building MatmulNode operations GEMM...
[cudnn_frontend] CUDNN_BACKEND_MATMUL_DESCRIPTOR : Math precision ["FLOAT"]
[cudnn_frontend] CUDNN_BACKEND_OPERATIONGRAPH_DESCRIPTOR has 1operations.
Tag: Matmul_
[cudnn_frontend] INFO: Getting plan from heuristics for Matmul_ ...
[cudnn_frontend] CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR :
Heuristic Mode 3 has 6 configurations
[cudnn_frontend] INFO: get_heuristics_list statuses: CUDNN_STATUS_SUCCESS
[cudnn_frontend] INFO: config list has 6 configurations.
[cudnn_frontend] INFO: config list has 6 good configurations.
[cudnn_frontend] INFO: Extracting engine configs.
[cudnn_frontend] INFO: Querying engine config properties
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 0 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 1 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 2 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 3 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 4 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 5 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: plans.check_support(h) at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/graph_interface.h:260
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
samples is a Catch2 v3.3.2 host application.
Run with -? for options
-------------------------------------------------------------------------------
Matmul
-------------------------------------------------------------------------------
/nvme2/medsam/cuda-mode/cudnn-frontend/samples/cpp/matmuls.cpp:31
...............................................................................
/nvme2/medsam/cuda-mode/cudnn-frontend/samples/cpp/matmuls.cpp:80: FAILED:
REQUIRE( graph.check_support(handle).is_good() )
with expansion:
false
===============================================================================
test cases: 1 | 0 passed | 1 failed
assertions: 11 | 10 passed | 1 failed
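In the log above, heuristics return 6 engine configurations, every plan build (indices 0 to 5) fails in cudnnFinalize with CUDNN_STATUS_EXECUTION_FAILED, and the subsequent plans.check_support call fails, which makes `REQUIRE( graph.check_support(handle).is_good() )` fail. The `[cudaGetLastError] ERROR: 0` printed after each failure is cudaSuccess, so the CUDA runtime reported no pending error at those points. A minimal Python sketch for summarizing such a log; the regular expressions are fitted to the lines in this issue and are an assumption, not a documented log format.

```python
import re
from collections import Counter

# Patterns fitted to the frontend log lines in this issue (an assumption; the
# log format is not a stable interface and may change between releases).
CONFIGS_RE = re.compile(r"config list has (\d+) configurations")
PLAN_RE = re.compile(
    r'Building plan at index (\d+) gave \["(\w+)"\] with message: .*?cudnn_status: (\w+)'
)
CUDA_ERR_RE = re.compile(r"\[cudaGetLastError\] ERROR: (\d+)")


def summarize(log_text):
    """Summarize heuristic configs, per-plan failures and cudaGetLastError values."""
    configs = CONFIGS_RE.search(log_text)
    plans = {
        int(index): (code, status)
        for index, code, status in PLAN_RE.findall(log_text)
    }
    return {
        "configs": int(configs.group(1)) if configs else None,
        "plans": plans,  # plan index -> (frontend error code, cudnn_status)
        "statuses": Counter(status for _, status in plans.values()),
        # 0 is cudaSuccess, i.e. no pending CUDA runtime error.
        "cuda_errors": [int(e) for e in CUDA_ERR_RE.findall(log_text)],
    }
```

Applied to the full frontend log in this issue (for example saved to a hypothetical `frontend.log` and read with `summarize(open("frontend.log").read())`), it would report 6 configurations, plans 0 to 5 all failing with CUDNN_STATUS_EXECUTION_FAILED, and six cudaGetLastError values of 0.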
I encountered a test failure after building and running the tests. Here are the details:
I followed the build instructions in the README.
The output is:
Then I ran the matmul test.
The output is: