Closed oleid closed 3 years ago
Correction: 2D convolutions no longer work, for an unknown reason. Dense networks, however, run fine (output with AMD_LOG_LEVEL=1):
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 28, 28, 128) 256
_________________________________________________________________
dense_1 (Dense) (None, 28, 28, 128) 16512
_________________________________________________________________
dense_2 (Dense) (None, 28, 28, 1) 129
=================================================================
Total params: 16,897
Trainable params: 16,897
Non-trainable params: 0
_________________________________________________________________
Epoch 1/30
2020-09-10 11:33:05.511470: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library librocblas.so
:1:hip_code_object.cpp :187 : 11961159632 us: Cannot find the function: Cijk_Ailk_Bljk_SB_MT48x72x32_SE_1LDSB0_APM1_AF0EM1_AF1EM1_AMAS3_ASBE01_ASEM1_BL1_DTL0_DVO0_EPS1_FL0_GRVW2_GSU2_ISA803_IU1_K1_KLA_LBSPP0_LPA0_LPB0_LDL1_LRVW2_NLCA1_NLCB1_ONLL1_PBD0_PK0_PGR1_PLR1_RK0_SIA1_SU32_SUM0_SUS256_SRVW0_SVW4_SNLL0_TT6_6_TLDS0_USFGROn1_VAW1_VSn1_VW2_WSGRA0_WSGRB0_WG8_12_2_WGM1
:1:hip_code_object.cpp :187 : 11961355426 us: Cannot find the function: Cijk_Alik_Bljk_SB_MT24x24x16_SE_1LDSB0_APM1_AF0EM1_AF1EM1_AMAS0_ASBE01_ASEM1_BL1_DTL0_DVO0_EPS1_FL0_GRVW1_GSU4_ISA803_IU1_K1_KLA_LBSPP0_LPA0_LPB0_LDL1_LRVW1_NLCA1_NLCB1_ONLL1_PBD0_PK0_PGR1_PLR1_RK0_SIA1_SU32_SUM0_SUS256_SRVW0_SVW4_SNLL0_TT3_4_TLDS0_USFGROn1_VAW1_VSn1_VW1_WSGRA0_WSGRB0_WG8_6_4_WGM1
:1:hip_code_object.cpp :187 : 11961357147 us: Cannot find the function: Cijk_Ailk_Bjlk_SB_MT128x128x8_SE_1LDSB0_APM1_AF0EM1_AF1EM1_AMAS3_ASBE01_ASEM1_BL1_DTL0_DVO0_EPS1_FL0_GRVW4_GSU1_ISA803_IU1_K1_KLA_LBSPP0_LPA0_LPB0_LDL1_LRVW4_NLCA1_NLCB1_ONLL1_PBD0_PK0_PGR1_PLR1_RK0_SIA1_SU32_SUM0_SUS256_SRVW0_SVW4_SNLL0_TT8_8_TLDS0_USFGROn1_VAW1_VSn1_VW4_WSGRA0_WSGRB0_WG16_16_1_WGM1
1/3750 [..............................] - ETA: 39s - loss: 0.6983 - psnr: 6.3381WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.
3748/3750 [============================>.] - ETA: 0s - loss: 0.0817 - psnr: 27.3319WARNING:tensorflow:Callbacks method `on_test_batch_end` is slow compared to the batch time (batch time: 0.0
3750/3750 [==============================] - 72s 19ms/step - loss: 0.0817 - psnr: 27.3320 - val_loss: 0.0715 - val_psnr: 27.7953
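As a sanity check on the model summary in the log above: the parameter counts are consistent with Dense layers acting on the last axis of a (28, 28, 1) input. The input shape is not stated in the log; it is inferred here from the 256 parameters of the first layer. A minimal sketch in plain Python (the helper name `dense_params` is hypothetical) that reproduces the arithmetic:

```python
def dense_params(in_features: int, units: int) -> int:
    """Weights (in_features * units) plus one bias per unit."""
    return in_features * units + units

# Assumed last-axis sizes: 1 -> 128 -> 128 -> 1.
# The leading 1 is inferred from the summary, not given in the log.
layer_sizes = [1, 128, 128, 1]

counts = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts, total)
```

If the inferred input size is right, the per-layer counts and the total should match the `Param #` column and the `Total params` line of the summary.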
Here are the results of the MIOpen test suite (judging by rocm-smi and the screen flickering, the GPU is being used :)):
$ make check
[...]
Scanning dependencies of target check
Test project /home/oleid/.cache/rua/build/miopen-hip/src/build/test
Start 11: test_conv2d
1/55 Test #11: test_conv2d ...................... Passed 35.59 sec
Start 17: test_dropout
2/55 Test #17: test_dropout ..................... Passed 3.28 sec
Start 27: test_lrn_test
3/55 Test #27: test_lrn_test .................... Passed 1.20 sec
Start 35: test_pooling2d
4/55 Test #35: test_pooling2d ................... Passed 1.09 sec
Start 36: test_pooling3d
5/55 Test #36: test_pooling3d ................... Passed 0.09 sec
Start 40: test_soft_max
6/55 Test #40: test_soft_max .................... Passed 1.09 sec
Start 55: test_conv_igemm_dynamic_small
7/55 Test #55: test_conv_igemm_dynamic_small .... Passed 1.83 sec
Start 1: test_activation
8/55 Test #1: test_activation .................. Passed 0.85 sec
Start 2: test_bn_3d_peract_test
9/55 Test #2: test_bn_3d_peract_test ........... Passed 2.15 sec
Start 3: test_bn_3d_spatial_test
10/55 Test #3: test_bn_3d_spatial_test .......... Passed 2.22 sec
Start 4: test_bn_aux
11/55 Test #4: test_bn_aux ...................... Passed 0.07 sec
Start 5: test_bn_peract_test
12/55 Test #5: test_bn_peract_test .............. Passed 2.04 sec
Start 6: test_bn_spatial_test
13/55 Test #6: test_bn_spatial_test ............. Passed 2.27 sec
Start 7: test_cache
14/55 Test #7: test_cache ....................... Passed 0.09 sec
Start 8: test_cba_inference
15/55 Test #8: test_cba_inference ............... Passed 1.42 sec
Start 9: test_cbna_inference
16/55 Test #9: test_cbna_inference .............. Passed 1.41 sec
Start 10: test_check_numerics_test
17/55 Test #10: test_check_numerics_test ......... Passed 0.74 sec
Start 12: test_conv2d_bias
18/55 Test #12: test_conv2d_bias ................. Passed 0.74 sec
Start 13: test_conv3d
19/55 Test #13: test_conv3d ...................... Passed 35.10 sec
Start 14: test_conv3d_bias
20/55 Test #14: test_conv3d_bias ................. Passed 0.75 sec
Start 15: test_ctc
21/55 Test #15: test_ctc ......................... Passed 1.20 sec
Start 16: test_custom_allocator
22/55 Test #16: test_custom_allocator ............ Passed 0.38 sec
Start 18: test_find_db
23/55 Test #18: test_find_db .....................***Failed 23.48 sec
MIOpen(HIP): Info [Handle] stream: 0, device_id: 0
Starting forward find-db test.
MIOpen(HIP): Info [ForwardGetWorkSpaceSize]
MIOpen(HIP): Info2 [HipCompilerVersionImpl] Read version information from HIP package...
MIOpen(HIP): Info [HipCompilerVersionImpl] 3.7.20364
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.6.0.0
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Running: '/opt/rocm/llvm/bin/clang --version'
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] clang version 11.0.0
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Target: x86_64-pc-linux-gnu
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Thread model: posix
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] InstalledDir: /opt/rocm/llvm/bin
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl]
MIOpen(HIP): Info2 [GetFindModeValueImpl] MIOPEN_FIND_MODE = HYBRID(3)
MIOpen(HIP): Info [GetForwardSolutions]
MIOpen(HIP): Info [Measure] Db::Prefetch time: 0.01879 ms
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 192-28-28-5x5-32-26-26-16-1x1-1x1-1x1-0-NCHW-FP32-F in file /tmp/miopen-tmp-f51f-0c93-6672-1e9e/miopen.test.find_db
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.06925 ms
MIOpen(HIP): Info [GetForwardSolutionsFallback] Fallback path, GEMM
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvAsm3x3U: Not applicable
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvAsm1x1U: Not applicable
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvAsm1x1UV2: Not applicable
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvAsm5x10u2v2f1: Not applicable
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvAsm7x7c3h224w224k64u2v2p3q3f1: Not applicable
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvAsm5x10u2v2b1: Not applicable
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvOclDirectFwd11x11: Not applicable
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvOclDirectFwdGen: Not applicable
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvOclDirectFwd3x3: Not applicable
MIOpen(HIP): Info2 [GetWorkspaceSize] ConvOclDirectFwd1x1: Not applicable
MIOpen(HIP): Info2 [GetPerformanceConfig] Returns: 16,16,32,32,2,2,8,2,1
MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file /opt/rocm/miopen/share/miopen/db/miopen.db
MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file /home/oleid/.config/miopen/miopen_1.0.0.udb
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmForwardV4R4Xdlops: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmV4R4GenXdlopsFwdFp32: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmV4R4GenFwdXdlops: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmBwdDataV1R1Xdlops: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmBwdDataV4R1Xdlops: Not applicable
MIOpen(HIP): Info [FindSolutionImpl] ConvHipImplicitGemmV4R1Fwd (db access disabled)
MIOpen(HIP): Info [EuristicInit] 16,16,4,2,2,2,2,4,2,4,4,1,16,1,4,16
MIOpen(HIP): Info [GetPerformanceConfigBase] 16,16,4,2,2,2,2,4,2,4,4,1,16,1,4,16
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmV4R1Fwd: Success.
MIOpen(HIP): Info2 [ForwardGetWorkSpaceSize] 12979200
MIOpen(HIP): Info [FindConvFwdAlgorithm] requestAlgoCount = 1, workspace = 12979200
MIOpen(HIP): Info [GetForwardSolutions]
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 192-28-28-5x5-32-26-26-16-1x1-1x1-1x1-0-NCHW-FP32-F in file /tmp/miopen-tmp-f51f-0c93-6672-1e9e/miopen.test.find_db
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.0325 ms
MIOpen(HIP): Info [GetForwardSolutionsFallback] Fallback path, GEMM
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 192-28-28-5x5-32-26-26-16-1x1-1x1-1x1-0-NCHW-FP32-F in file /tmp/miopen-tmp-f51f-0c93-6672-1e9e/miopen.test.find_db
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.0107 ms
MIOpen(HIP): Info [TryLoad] Find-db regenerating.
MIOpen(HIP): Info2 [GetKernels] 0 kernels for key: miopenIm2d2Col "c192i28_28w5_5p1_1s1_1d1_1t1"
MIOpen(HIP): Info2 [AddKernel] Key: miopenIm2Col "c192i28_28w5_5p1_1s1_1d1_1t1"
MIOpen(HIP): Info2 [AddKernelDumpKernelParams] runcl MIOpenIm2d2Col.cl -k Im2d2Col -dumpilisa -r 10 if#0: if#0: if#0: iv#0 196608,1,1/256,1,1 -DNUM_CH_PER_WG=1 -DNUM_IM_BLKS_X=1 -DNUM_IM_BLKS=4 -DLOCAL_MEM_SIZE=432 -DSTRIDE_GT_1=0 -DTILE_SZ_X=32 -DTILE_SZ_Y=8 -DUSE_IM_OFF_GUARD=1 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1
MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file
MIOpen(HIP): Info [KernDb] database not present
MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file /home/oleid/.cache/miopen/2.6.0.0/gfx803_32.ukdb
MIOpen(HIP): Info2 [KernDb] Database created successfully
MIOpen(HIP): Info2 [LoadBinary] Loading binary for: MIOpenIm2d2Col.cl ;args: -DNUM_CH_PER_WG=1 -DNUM_IM_BLKS_X=1 -DNUM_IM_BLKS=4 -DLOCAL_MEM_SIZE=432 -DSTRIDE_GT_1=0 -DTILE_SZ_X=32 -DTILE_SZ_Y=8 -DUSE_IM_OFF_GUARD=1 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -mcpu=gfx803
MIOpen(HIP): Info2 [Prepare] SELECT kernel_blob, kernel_hash, uncompressed_size FROM kern_db WHERE (kernel_name = 'MIOpenIm2d2Col.cl.o') AND (kernel_args = ' -DNUM_CH_PER_WG=1 -DNUM_IM_BLKS_X=1 -DNUM_IM_BLKS=4 -DLOCAL_MEM_SIZE=432 -DSTRIDE_GT_1=0 -DTILE_SZ_X=32 -DTILE_SZ_Y=8 -DUSE_IM_OFF_GUARD=1 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -mcpu=gfx803');
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.037309 ms
MIOpen(HIP): Info2 [SaveBinary] Saving binary for: MIOpenIm2d2Col.cl ;args: -DNUM_CH_PER_WG=1 -DNUM_IM_BLKS_X=1 -DNUM_IM_BLKS=4 -DLOCAL_MEM_SIZE=432 -DSTRIDE_GT_1=0 -DTILE_SZ_X=32 -DTILE_SZ_Y=8 -DUSE_IM_OFF_GUARD=1 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -mcpu=gfx803
MIOpen(HIP): Info2 [Prepare] INSERT OR REPLACE INTO kern_db(kernel_name, kernel_args, kernel_blob, kernel_hash, uncompressed_size) VALUES(?, ?, ?, ?, ?);
MIOpen(HIP): Info2 [Measure] Db::StoreRecord time: 11.8474 ms
MIOpen(HIP): Info2 [CallGemm] gemm_desc: {isColMajor 0, transA 0, transB 0, m 32, n 676, k 4800, lda 4800, ldb 676, ldc 676, batch_count 1, strideA 0, strideB 0, strideC 0, alpha 1, beta 0, dataType 1}
MIOpen(HIP): Info2 [CallGemm] gemm_desc: {isColMajor 0, transA 0, transB 0, m 32, n 676, k 4800, lda 4800, ldb 676, ldc 676, batch_count 1, strideA 0, strideB 0, strideC 0, alpha 1, beta 0, dataType 1}
MIOpen(HIP): Info2 [dummy_memset] dummy gpu memset
MIOpen(HIP): Info [SetValues] 192-28-28-5x5-32-26-26-16-1x1-1x1-1x1-0-NCHW-FP32-F, content inserted: miopenConvolutionFwdAlgoGEMM:gemm,6.37189,12979200,rocBlas,<unused>
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvBinWinograd3x3U: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvBinWinogradRxSf3x2: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvBinWinogradRxSf2x3: Not applicable
MIOpen(HIP): Info [FindSolutionImpl] ConvBinWinogradRxS (not searchable)
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvBinWinogradRxS: Success.
MIOpen(HIP): Info2 [LoadBinary] Loading binary for: conv_3x3_wheel_alpha_v9_0_15.s ;args: -Wa,-defsym,ROCM_METADATA_VERSION=5 -mcpu=gfx803
MIOpen(HIP): Info2 [Prepare] SELECT kernel_blob, kernel_hash, uncompressed_size FROM kern_db WHERE (kernel_name = 'conv_3x3_wheel_alpha_v9_0_15.s.o') AND (kernel_args = '-Wa,-defsym,ROCM_METADATA_VERSION=5 -mcpu=gfx803');
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.742803 ms
MIOpen(HIP): Info2 [PrepareInvoker] Preparing kernel: miopenSp3AsmConvRxSU
MIOpen(HIP): Info2 [GetSolution] N=16 C=192 H=28 W=28 K=32 n_groups=32 flags=0 R=5 S=5 pad_H=1 pad_W=1 out_H=26 out_W=26
MIOpen(HIP): Info [EvaluateInvokers] ConvBinWinogradRxS: miopenSp3AsmConvRxSU: 0.684966 < 3.40282e+38
MIOpen(HIP): Info [EvaluateInvokers] Selected: ConvBinWinogradRxS: miopenSp3AsmConvRxSU: 0.684966, workspce_sz = 0
MIOpen(HIP): Info [SetValues] 192-28-28-5x5-32-26-26-16-1x1-1x1-1x1-0-NCHW-FP32-F, content inserted: miopenConvolutionFwdAlgoWinograd:ConvBinWinogradRxS,0.684966,0,miopenConvolutionFwdAlgoWinograd,<unused>
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvAsm3x3U: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvAsm1x1U: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvAsm1x1UV2: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvAsm5x10u2v2f1: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvAsm7x7c3h224w224k64u2v2p3q3f1: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvAsm5x10u2v2b1: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvOclDirectFwd11x11: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvOclDirectFwdGen: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvOclDirectFwd3x3: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvOclDirectFwd1x1: Not applicable
MIOpen(HIP): Info2 [GetPerformanceConfig] Returns: 16,16,32,32,2,2,8,2,1
MIOpen(HIP): Info [FindSolutionImpl] ConvOclDirectFwd
MIOpen(HIP): Info2 [Prepare] SELECT solver, params FROM perf_db INNER JOIN config ON perf_db.config = config.id WHERE ( (layout = ? ) AND (data_type = ? ) AND (direction = ? ) AND (spatial_dim = ? ) AND (in_channels = ? ) AND (in_h = ? ) AND (in_w = ? ) AND (in_d = ? ) AND (fil_h = ? ) AND (fil_w = ? ) AND (fil_d = ? ) AND (out_channels = ? ) AND (batchsize = ? ) AND (pad_h = ? ) AND (pad_w = ? ) AND (pad_d = ? ) AND (conv_stride_h = ? ) AND (conv_stride_w = ? ) AND (conv_stride_d = ? ) AND (dilation_h = ? ) AND (dilation_w = ? ) AND (dilation_d = ? ) AND (bias = ? ) AND (group_count = ? ) )AND (arch = 'gfx803' ) AND (num_cu = '32');
MIOpen(HIP): Info2 [impl] [NCHW,FP32,F,2,192,28,28,1,5,5,1,32,16,1,1,0,1,1,0,1,1,0,0,1]
MIOpen(HIP): Info2 [Prepare] SELECT solver, params FROM perf_db INNER JOIN config ON perf_db.config = config.id WHERE ( (layout = ? ) AND (data_type = ? ) AND (direction = ? ) AND (spatial_dim = ? ) AND (in_channels = ? ) AND (in_h = ? ) AND (in_w = ? ) AND (in_d = ? ) AND (fil_h = ? ) AND (fil_w = ? ) AND (fil_d = ? ) AND (out_channels = ? ) AND (batchsize = ? ) AND (pad_h = ? ) AND (pad_w = ? ) AND (pad_d = ? ) AND (conv_stride_h = ? ) AND (conv_stride_w = ? ) AND (conv_stride_d = ? ) AND (dilation_h = ? ) AND (dilation_w = ? ) AND (dilation_d = ? ) AND (bias = ? ) AND (group_count = ? ) )AND (arch = 'gfx803' ) AND (num_cu = '32');
MIOpen(HIP): Info2 [impl] [NCHW,FP32,F,2,192,28,28,1,5,5,1,32,16,1,1,0,1,1,0,1,1,0,0,1]
MIOpen(HIP): Info2 [Measure] Db::Load time: 0.230188 ms
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvOclDirectFwd
MIOpen(HIP): Info2 [GetPerformanceConfig] Returns: 16,16,32,32,2,2,8,2,1
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvOclDirectFwd: Success.
MIOpen(HIP): Info2 [LoadBinary] Loading binary for: MIOpenConvDirUni.cl ;args: -DMLO_HW_WAVE_SZ=64 -DMLO_DIR_FORWARD=1 -DMLO_FILTER_SIZE0=5 -DMLO_FILTER_SIZE1=5 -DMLO_FILTER_PAD0=1 -DMLO_FILTER_PAD1=1 -DMLO_FILTER_STRIDE0=1 -DMLO_FILTER_STRIDE1=1 -DMLO_N_OUTPUTS=32 -DMLO_N_INPUTS=192 -DMLO_BATCH_SZ=16 -DMLO_OUT_WIDTH=26 -DMLO_OUT_HEIGHT=26 -DMLO_OUT_BATCH_STRIDE=21632 -DMLO_OUT_CHANNEL_STRIDE=676 -DMLO_OUT_STRIDE=26 -DMLO_IN_WIDTH=28 -DMLO_IN_HEIGHT=28 -DMLO_IN_BATCH_STRIDE=150528 -DMLO_IN_CHANNEL_STRIDE=784 -DMLO_IN_STRIDE=28 -DMLO_IN_TILE0=32 -DMLO_IN_TILE1=32 -DMLO_GRP_TILE0=16 -DMLO_GRP_TILE1=16 -DMLO_OUT_TILE0=2 -DMLO_OUT_TILE1=2 -DMLO_N_STACKS=1 -DMLO_N_OUT_TILES=8 -DMLO_N_OUT_TILES_PERSTACK=8 -DMLO_N_IN_TILES_PERSTACK=2 -DMLO_N_READ_PROCS=256 -DMLO_ALU_VTILE0=16 -DMLO_ALU_VTILE1=16 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DMLO_CONV_BIAS=0 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -mcpu=gfx803
MIOpen(HIP): Info2 [Prepare] SELECT kernel_blob, kernel_hash, uncompressed_size FROM kern_db WHERE (kernel_name = 'MIOpenConvDirUni.cl.o') AND (kernel_args = ' -DMLO_HW_WAVE_SZ=64 -DMLO_DIR_FORWARD=1 -DMLO_FILTER_SIZE0=5 -DMLO_FILTER_SIZE1=5 -DMLO_FILTER_PAD0=1 -DMLO_FILTER_PAD1=1 -DMLO_FILTER_STRIDE0=1 -DMLO_FILTER_STRIDE1=1 -DMLO_N_OUTPUTS=32 -DMLO_N_INPUTS=192 -DMLO_BATCH_SZ=16 -DMLO_OUT_WIDTH=26 -DMLO_OUT_HEIGHT=26 -DMLO_OUT_BATCH_STRIDE=21632 -DMLO_OUT_CHANNEL_STRIDE=676 -DMLO_OUT_STRIDE=26 -DMLO_IN_WIDTH=28 -DMLO_IN_HEIGHT=28 -DMLO_IN_BATCH_STRIDE=150528 -DMLO_IN_CHANNEL_STRIDE=784 -DMLO_IN_STRIDE=28 -DMLO_IN_TILE0=32 -DMLO_IN_TILE1=32 -DMLO_GRP_TILE0=16 -DMLO_GRP_TILE1=16 -DMLO_OUT_TILE0=2 -DMLO_OUT_TILE1=2 -DMLO_N_STACKS=1 -DMLO_N_OUT_TILES=8 -DMLO_N_OUT_TILES_PERSTACK=8 -DMLO_N_IN_TILES_PERSTACK=2 -DMLO_N_READ_PROCS=256 -DMLO_ALU_VTILE0=16 -DMLO_ALU_VTILE1=16 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DMLO_CONV_BIAS=0 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -mcpu=gfx803');
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.040689 ms
MIOpen(HIP): Info2 [SaveBinary] Saving binary for: MIOpenConvDirUni.cl ;args: -DMLO_HW_WAVE_SZ=64 -DMLO_DIR_FORWARD=1 -DMLO_FILTER_SIZE0=5 -DMLO_FILTER_SIZE1=5 -DMLO_FILTER_PAD0=1 -DMLO_FILTER_PAD1=1 -DMLO_FILTER_STRIDE0=1 -DMLO_FILTER_STRIDE1=1 -DMLO_N_OUTPUTS=32 -DMLO_N_INPUTS=192 -DMLO_BATCH_SZ=16 -DMLO_OUT_WIDTH=26 -DMLO_OUT_HEIGHT=26 -DMLO_OUT_BATCH_STRIDE=21632 -DMLO_OUT_CHANNEL_STRIDE=676 -DMLO_OUT_STRIDE=26 -DMLO_IN_WIDTH=28 -DMLO_IN_HEIGHT=28 -DMLO_IN_BATCH_STRIDE=150528 -DMLO_IN_CHANNEL_STRIDE=784 -DMLO_IN_STRIDE=28 -DMLO_IN_TILE0=32 -DMLO_IN_TILE1=32 -DMLO_GRP_TILE0=16 -DMLO_GRP_TILE1=16 -DMLO_OUT_TILE0=2 -DMLO_OUT_TILE1=2 -DMLO_N_STACKS=1 -DMLO_N_OUT_TILES=8 -DMLO_N_OUT_TILES_PERSTACK=8 -DMLO_N_IN_TILES_PERSTACK=2 -DMLO_N_READ_PROCS=256 -DMLO_ALU_VTILE0=16 -DMLO_ALU_VTILE1=16 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DMLO_CONV_BIAS=0 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -mcpu=gfx803
MIOpen(HIP): Info2 [Prepare] INSERT OR REPLACE INTO kern_db(kernel_name, kernel_args, kernel_blob, kernel_hash, uncompressed_size) VALUES(?, ?, ?, ?, ?);
MIOpen(HIP): Info2 [Measure] Db::StoreRecord time: 13.6306 ms
MIOpen(HIP): Info2 [PrepareInvoker] Preparing kernel: MIOpenConvUni
MIOpen(HIP): Info [EvaluateInvokers] ConvOclDirectFwd: MIOpenConvUni: 2.30946 < 3.40282e+38
MIOpen(HIP): Info [EvaluateInvokers] Selected: ConvOclDirectFwd: MIOpenConvUni: 2.30946, workspce_sz = 0
MIOpen(HIP): Info [SetValues] 192-28-28-5x5-32-26-26-16-1x1-1x1-1x1-0-NCHW-FP32-F, content inserted: miopenConvolutionFwdAlgoDirect:ConvOclDirectFwd,2.30946,0,miopenConvolutionFwdAlgoDirect,<unused>
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmForwardV4R4Xdlops: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmV4R4GenXdlopsFwdFp32: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmV4R4GenFwdXdlops: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmBwdDataV1R1Xdlops: Not applicable
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmBwdDataV4R1Xdlops: Not applicable
MIOpen(HIP): Info [FindSolutionImpl] ConvHipImplicitGemmV4R1Fwd
MIOpen(HIP): Info2 [Prepare] SELECT solver, params FROM perf_db INNER JOIN config ON perf_db.config = config.id WHERE ( (layout = ? ) AND (data_type = ? ) AND (direction = ? ) AND (spatial_dim = ? ) AND (in_channels = ? ) AND (in_h = ? ) AND (in_w = ? ) AND (in_d = ? ) AND (fil_h = ? ) AND (fil_w = ? ) AND (fil_d = ? ) AND (out_channels = ? ) AND (batchsize = ? ) AND (pad_h = ? ) AND (pad_w = ? ) AND (pad_d = ? ) AND (conv_stride_h = ? ) AND (conv_stride_w = ? ) AND (conv_stride_d = ? ) AND (dilation_h = ? ) AND (dilation_w = ? ) AND (dilation_d = ? ) AND (bias = ? ) AND (group_count = ? ) )AND (arch = 'gfx803' ) AND (num_cu = '32');
MIOpen(HIP): Info2 [impl] [NCHW,FP32,F,2,192,28,28,1,5,5,1,32,16,1,1,0,1,1,0,1,1,0,0,1]
MIOpen(HIP): Info2 [Prepare] SELECT solver, params FROM perf_db INNER JOIN config ON perf_db.config = config.id WHERE ( (layout = ? ) AND (data_type = ? ) AND (direction = ? ) AND (spatial_dim = ? ) AND (in_channels = ? ) AND (in_h = ? ) AND (in_w = ? ) AND (in_d = ? ) AND (fil_h = ? ) AND (fil_w = ? ) AND (fil_d = ? ) AND (out_channels = ? ) AND (batchsize = ? ) AND (pad_h = ? ) AND (pad_w = ? ) AND (pad_d = ? ) AND (conv_stride_h = ? ) AND (conv_stride_w = ? ) AND (conv_stride_d = ? ) AND (dilation_h = ? ) AND (dilation_w = ? ) AND (dilation_d = ? ) AND (bias = ? ) AND (group_count = ? ) )AND (arch = 'gfx803' ) AND (num_cu = '32');
MIOpen(HIP): Info2 [impl] [NCHW,FP32,F,2,192,28,28,1,5,5,1,32,16,1,1,0,1,1,0,1,1,0,0,1]
MIOpen(HIP): Info2 [Measure] Db::Load time: 0.233658 ms
MIOpen(HIP): Info [FindSolutionImpl] Perf Db: record not found for: ConvHipImplicitGemmV4R1Fwd
MIOpen(HIP): Info [EuristicInit] 16,16,4,2,2,2,2,4,2,4,4,1,16,1,4,16
MIOpen(HIP): Info [GetPerformanceConfigBase] 16,16,4,2,2,2,2,4,2,4,4,1,16,1,4,16
MIOpen(HIP): Info2 [SearchForAllSolutions] ConvHipImplicitGemmV4R1Fwd: Success.
MIOpen(HIP): Info2 [LoadBinary] Loading binary for: gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp ;args: -std=c++14 -DCK_PARAM_PROBLEM_N=16 -DCK_PARAM_PROBLEM_K=32 -DCK_PARAM_PROBLEM_C=192 -DCK_PARAM_PROBLEM_HI=28 -DCK_PARAM_PROBLEM_WI=28 -DCK_PARAM_PROBLEM_HO=26 -DCK_PARAM_PROBLEM_WO=26 -DCK_PARAM_PROBLEM_Y=5 -DCK_PARAM_PROBLEM_X=5 -DCK_PARAM_PROBLEM_CONV_STRIDE_H=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_W=1 -DCK_PARAM_PROBLEM_CONV_DILATION_H=1 -DCK_PARAM_PROBLEM_CONV_DILATION_W=1 -DCK_PARAM_PROBLEM_LEFT_PAD_H=1 -DCK_PARAM_PROBLEM_LEFT_PAD_W=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_H=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_W=1 -DCK_PARAM_PROBLEM_CONV_GROUP_COUNTS=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_FORWARD=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_DATA=0 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_WEIGHT=0 -DCK_PARAM_TUNABLE_BLOCK_SIZE=64 -DCK_PARAM_TUNABLE_B_PER_BLOCK=16 -DCK_PARAM_TUNABLE_K_PER_BLOCK=16 -DCK_PARAM_TUNABLE_E_PER_BLOCK=4 -DCK_PARAM_DEPENDENT_GRID_SIZE=338 -DCK_PARAM_GEMM_N_REPEAT=2 -DCK_PARAM_GEMM_M_PER_THREAD_SUB_C=2 -DCK_PARAM_GEMM_N_PER_THREAD_SUB_C=2 -DCK_PARAM_GEMM_M_LEVEL0_CLUSTER=2 -DCK_PARAM_GEMM_N_LEVEL0_CLUSTER=4 -DCK_PARAM_GEMM_M_LEVEL1_CLUSTER=2 -DCK_PARAM_GEMM_N_LEVEL1_CLUSTER=4 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_E=4 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N1=1 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_B=16 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N2=1 -DCK_PARAM_IN_BLOCK_COPY_SRC_DATA_PER_READ_B=1 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_E=4 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_K=16 -DCK_PARAM_WEI_BLOCK_COPY_SRC_DATA_PER_READ_E=1 -DCK_PARAM_EPACK_LENGTH=1 -DCK_THREADWISE_GEMM_USE_AMD_INLINE_ASM=0 -DCK_USE_AMD_INLINE_ASM=0 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DCK_PARAM_IN_BLOCK_COPY_DST_DATA_PER_WRITE_N2=2 -DCK_PARAM_WEI_BLOCK_COPY_DST_DATA_PER_WRITE_K=1 -mcpu=gfx803
MIOpen(HIP): Info2 [Prepare] SELECT kernel_blob, kernel_hash, uncompressed_size FROM kern_db WHERE (kernel_name = 'gridwise_convolution_implicit_gemm_v4r1_nchw_kcyx_nkhw_lds_double_buffer.cpp.o') AND (kernel_args = ' -std=c++14 -DCK_PARAM_PROBLEM_N=16 -DCK_PARAM_PROBLEM_K=32 -DCK_PARAM_PROBLEM_C=192 -DCK_PARAM_PROBLEM_HI=28 -DCK_PARAM_PROBLEM_WI=28 -DCK_PARAM_PROBLEM_HO=26 -DCK_PARAM_PROBLEM_WO=26 -DCK_PARAM_PROBLEM_Y=5 -DCK_PARAM_PROBLEM_X=5 -DCK_PARAM_PROBLEM_CONV_STRIDE_H=1 -DCK_PARAM_PROBLEM_CONV_STRIDE_W=1 -DCK_PARAM_PROBLEM_CONV_DILATION_H=1 -DCK_PARAM_PROBLEM_CONV_DILATION_W=1 -DCK_PARAM_PROBLEM_LEFT_PAD_H=1 -DCK_PARAM_PROBLEM_LEFT_PAD_W=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_H=1 -DCK_PARAM_PROBLEM_RIGHT_PAD_W=1 -DCK_PARAM_PROBLEM_CONV_GROUP_COUNTS=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_FORWARD=1 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_DATA=0 -DCK_PARAM_PROBLEM_CONV_DIRECTION_BACKWARD_WEIGHT=0 -DCK_PARAM_TUNABLE_BLOCK_SIZE=64 -DCK_PARAM_TUNABLE_B_PER_BLOCK=16 -DCK_PARAM_TUNABLE_K_PER_BLOCK=16 -DCK_PARAM_TUNABLE_E_PER_BLOCK=4 -DCK_PARAM_DEPENDENT_GRID_SIZE=338 -DCK_PARAM_GEMM_N_REPEAT=2 -DCK_PARAM_GEMM_M_PER_THREAD_SUB_C=2 -DCK_PARAM_GEMM_N_PER_THREAD_SUB_C=2 -DCK_PARAM_GEMM_M_LEVEL0_CLUSTER=2 -DCK_PARAM_GEMM_N_LEVEL0_CLUSTER=4 -DCK_PARAM_GEMM_M_LEVEL1_CLUSTER=2 -DCK_PARAM_GEMM_N_LEVEL1_CLUSTER=4 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_E=4 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N1=1 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_B=16 -DCK_PARAM_IN_BLOCK_COPY_CLUSTER_LENGTHS_N2=1 -DCK_PARAM_IN_BLOCK_COPY_SRC_DATA_PER_READ_B=1 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_E=4 -DCK_PARAM_WEI_BLOCK_COPY_CLUSTER_LENGTHS_K=16 -DCK_PARAM_WEI_BLOCK_COPY_SRC_DATA_PER_READ_E=1 -DCK_PARAM_EPACK_LENGTH=1 -DCK_THREADWISE_GEMM_USE_AMD_INLINE_ASM=0 -DCK_USE_AMD_INLINE_ASM=0 -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DCK_PARAM_IN_BLOCK_COPY_DST_DATA_PER_WRITE_N2=2 
-DCK_PARAM_WEI_BLOCK_COPY_DST_DATA_PER_WRITE_K=1 -mcpu=gfx803');
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.044059 ms
terminate called after throwing an instance of 'miopen::Exception'
what(): /home/oleid/.cache/rua/build/miopen-hip/src/MIOpen-rocm-3.7.0/src/hipoc/hipoc_program.cpp:94: Failed creating module hipErrorSharedObjectInitFailed
CMake Error at test_test_find_db.cmake:7 (message):
Test failed
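The shapes in the test_find_db log above can be cross-checked by hand. The sketch below takes the problem sizes from the logged key `192-28-28-5x5-32-26-26-16-...` and the `GetSolution` line, then derives the output size and the GEMM dimensions reported in `gemm_desc`. Treating the workspace as a single FP32 im2col buffer is an assumption on my part; it happens to match the logged 12979200 bytes, but it is not confirmed from the MIOpen source.

```python
# Values read from the log (GetSolution line and the find-db key).
N, C, H, W = 16, 192, 28, 28      # batch, input channels, input height/width
K, R, S = 32, 5, 5                # output channels, filter height/width
pad, stride, dilation = 1, 1, 1

def conv_out(size: int, filt: int, pad: int, stride: int, dilation: int) -> int:
    """Standard convolution output-size formula."""
    return (size + 2 * pad - dilation * (filt - 1) - 1) // stride + 1

out_h = conv_out(H, R, pad, stride, dilation)
out_w = conv_out(W, S, pad, stride, dilation)

# im2col + GEMM view of one image: (K x C*R*S) times (C*R*S x out_h*out_w).
gemm_m, gemm_k, gemm_n = K, C * R * S, out_h * out_w

# Assumption: the workspace is one FP32 (4-byte) im2col buffer of gemm_k x gemm_n.
workspace_bytes = gemm_k * gemm_n * 4

print(out_h, out_w, gemm_m, gemm_n, gemm_k, workspace_bytes)
```

So the problem description itself looks consistent; the failure occurs later, when the implicit-GEMM kernel module is loaded (`hipErrorSharedObjectInitFailed`).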
Start 19: test_fusion_aux
24/55 Test #19: test_fusion_aux .................. Passed 0.08 sec
Start 20: test_gru
25/55 Test #20: test_gru .........................***Failed Error regular expression found in output. Regex=[FAILED] 11.58 sec
Empty batch sequence. Filling uniformly with batch size: 17
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_gru --float --batch-size 17 --seq-len 2 --vector-len 13 --hidden-size 67 --num-layers 1 --use-dropout 0 --in-mode 0 --bias-mode 0 --dir-mode 0 --batch-seq 17 17
FAILED: 0.0320774
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17 -m gru -k 2 -H 67 -W 13 -l 1 -F 0 -r 0 -b 0 -p 0
inputMode: 0 biasMode: 0 dirMode: 0
hz: 67 batch_n: 34 seqLength: 2 inputLen: 13 numLayers: 1 useDropout: 0
Backward Data GRU:
Output dx failed verification.
Max diff: 0.017266
Mismatch at 1: -0.00755944 != -0.00755943
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_gru --float --batch-size 17 --seq-len 2 --vector-len 13 --hidden-size 67 --num-layers 1 --use-dropout 0 --in-mode 0 --bias-mode 0 --dir-mode 0 --batch-seq 17 17
FAILED: 0.00856887
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17 -m gru -k 2 -H 67 -W 13 -l 1 -F 0 -r 0 -b 0 -p 0
inputMode: 0 biasMode: 0 dirMode: 0
hz: 67 batch_n: 34 seqLength: 2 inputLen: 13 numLayers: 1 useDropout: 0
Backward Data GRU:
Max diff: 0.0487628
Mismatch at 2: 0.00140701 != 0.00140701
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_gru --float --batch-size 17 --seq-len 2 --vector-len 13 --hidden-size 67 --num-layers 1 --use-dropout 0 --in-mode 0 --bias-mode 0 --dir-mode 0 --batch-seq 17 17
FAILED: 0.0545153
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17 -m gru -k 2 -H 67 -W 13 -l 1 -F 0 -r 0 -b 0 -p 0
inputMode: 0 biasMode: 0 dirMode: 0
hz: 67 batch_n: 34 seqLength: 2 inputLen: 13 numLayers: 1 useDropout: 0
Backward Weights GRU:
Max diff: 0.507131
Mismatch at 48: 0.00114732 != 0.50561
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_gru --float --batch-size 17 --seq-len 2 --vector-len 13 --hidden-size 67 --num-layers 1 --use-dropout 0 --in-mode 0 --bias-mode 0 --dir-mode 0 --batch-seq 17 17
FAILED: 0.107456
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17 -m gru -k 2 -H 67 -W 13 -l 1 -F 0 -r 0 -b 0 -p 0
inputMode: 0 biasMode: 0 dirMode: 0
hz: 67 batch_n: 34 seqLength: 2 inputLen: 13 numLayers: 1
Forward Inference GRU:
Output tensor output failed verification.
Max diff: 0.0694702
Mismatch at 0: 0.00283208 != 0.0151487
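Some of the mismatch lines above print two numbers that look identical (for example `0.00140701 != 0.00140701`). One plausible explanation, which I have not confirmed from the MIOpen test source, is that the values differ only beyond the six significant digits shown. A small sketch with hypothetical values (not taken from the log) illustrates the effect for single-precision floats:

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical values chosen to differ only in the 7th significant digit.
a = to_f32(0.001407012)
b = to_f32(0.001407008)

# %g prints 6 significant digits, the same default precision as C++ iostreams.
shown_a = "%g" % a
shown_b = "%g" % b

print(shown_a, "!=", shown_b, "->", a != b)
```

This only accounts for the near-equal pairs. Lines such as `0.00114732 != 0.50561` show genuinely large differences.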
Start 21: test_gru_dropout
26/55 Test #21: test_gru_dropout .................***Failed Error regular expression found in output. Regex=[FAILED] 10.71 sec
Empty batch sequence. Filling uniformly with batch size: 17
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_gru_dropout --float --batch-size 17 --seq-len 23 --vector-len 13 --hidden-size 67 --num-layers 3 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 0 --dir-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5
FAILED: 0.00454052
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5 -m gru -k 23 -H 67 -W 13 -l 3 -F 0 -r 0 -b 0 -p 0
inputMode: 0 biasMode: 0 dirMode: 0
hz: 67 batch_n: 260 seqLength: 23 inputLen: 13 numLayers: 3 useDropout: 1
Backward Data GRU:
Output dx failed verification.
Max diff: 0.00322931
Mismatch at 0: 0.000931848 != 0.00093185
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_gru_dropout --float --batch-size 17 --seq-len 23 --vector-len 13 --hidden-size 67 --num-layers 3 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 0 --dir-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5
FAILED: 4.29788e-05
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5 -m gru -k 23 -H 67 -W 13 -l 3 -F 0 -r 0 -b 0 -p 0
inputMode: 0 biasMode: 0 dirMode: 0
hz: 67 batch_n: 260 seqLength: 23 inputLen: 13 numLayers: 3 useDropout: 1
Backward Data GRU:
Hidden state dhx tensor failed verification.
Max diff: 0.000362066
Mismatch at 0: -0.0204745 != -0.0204745
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_gru_dropout --float --batch-size 17 --seq-len 23 --vector-len 13 --hidden-size 67 --num-layers 3 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 0 --dir-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5
FAILED: 1.34913e-05
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5 -m gru -k 23 -H 67 -W 13 -l 3 -F 0 -r 0 -b 0 -p 0
inputMode: 0 biasMode: 0 dirMode: 0
hz: 67 batch_n: 260 seqLength: 23 inputLen: 13 numLayers: 3 useDropout: 1
Backward Data GRU:
Max diff: 0.00129145
Mismatch at 0: -0.000566879 != -0.000566879
Start 22: test_handle_test
27/55 Test #22: test_handle_test ................. Passed 1.09 sec
Start 23: test_immed_conv2d
28/55 Test #23: test_immed_conv2d ................ Passed 0.07 sec
Start 24: test_immed_conv3d
29/55 Test #24: test_immed_conv3d ................ Passed 7.70 sec
Start 25: test_include_inliner
30/55 Test #25: test_include_inliner ............. Passed 0.10 sec
Start 26: test_kernel_build_params
31/55 Test #26: test_kernel_build_params ......... Passed 0.07 sec
Start 28: test_lstm
32/55 Test #28: test_lstm ........................ Passed 8.55 sec
Start 29: test_lstm_dropout
33/55 Test #29: test_lstm_dropout ................***Failed Error regular expression found in output. Regex=[FAILED] 9.47 sec
Empty batch sequence. Filling uniformly with batch size: 17
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_lstm_dropout --float --batch-size 17 --seq-len 25 --vector-len 17 --hidden-size 67 --num-layers 3 --no-hx 0 --no-dhy 0 --no-cx 0 --no-dcy 0 --no-hy 0 --no-dhx 0 --no-cy 0 --no-dcx 0 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 1 --dir-mode 0 --algo-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4
FAILED: 0.0148069
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4 -m lstm -k 25 -H 67 -W 17 -l 3 -F 0 -r 0 -b 1 -p 0
inputMode: 0 biasMode: 1 dirMode: 0
hz: 67 batch_n: 268 seqLength: 25 inputLen: 17 numLayers: 3 useDropout: 1
Forward Train LSTM:
Output tensor output failed verification.
Max diff: 0.0296305
Mismatch at 0: -0.00388325 != -0.00388324
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_lstm_dropout --float --batch-size 17 --seq-len 25 --vector-len 17 --hidden-size 67 --num-layers 3 --no-hx 0 --no-dhy 0 --no-cx 0 --no-dcy 0 --no-hy 0 --no-dhx 0 --no-cy 0 --no-dcx 0 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 1 --dir-mode 0 --algo-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4
FAILED: 0.00364173
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4 -m lstm -k 25 -H 67 -W 17 -l 3 -F 0 -r 0 -b 1 -p 0
inputMode: 0 biasMode: 1 dirMode: 0
hz: 67 batch_n: 268 seqLength: 25 inputLen: 17 numLayers: 3 useDropout: 1
Forward Train LSTM:
Hidden state tensor failed verification.
Max diff: 0.0058289
Mismatch at 1: 0.0453552 != 0.0453552
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_lstm_dropout --float --batch-size 17 --seq-len 25 --vector-len 17 --hidden-size 67 --num-layers 3 --no-hx 0 --no-dhy 0 --no-cx 0 --no-dcy 0 --no-hy 0 --no-dhx 0 --no-cy 0 --no-dcx 0 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 1 --dir-mode 0 --algo-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4
FAILED: 0.00369584
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4 -m lstm -k 25 -H 67 -W 17 -l 3 -F 0 -r 0 -b 1 -p 0
inputMode: 0 biasMode: 1 dirMode: 0
hz: 67 batch_n: 268 seqLength: 25 inputLen: 17 numLayers: 3 useDropout: 1
Forward Train LSTM:
Cell state tensor failed verification.
Max diff: 0.010714
Mismatch at 0: 0.0155845 != 0.0155845
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_lstm_dropout --float --batch-size 17 --seq-len 25 --vector-len 17 --hidden-size 67 --num-layers 3 --no-hx 0 --no-dhy 0 --no-cx 0 --no-dcy 0 --no-hy 0 --no-dhx 0 --no-cy 0 --no-dcx 0 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 1 --dir-mode 0 --algo-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4
FAILED: 0.000112727
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4 -m lstm -k 25 -H 67 -W 17 -l 3 -F 0 -r 0 -b 1 -p 0
inputMode: 0 biasMode: 1 dirMode: 0
hz: 67 batch_n: 268 seqLength: 25 inputLen: 17 numLayers: 3 useDropout: 1
Backward Data LSTM:
Output dx failed verification.
Max diff: 0.000107641
Mismatch at 0: 0.00030012 != 0.000300088
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_lstm_dropout --float --batch-size 17 --seq-len 25 --vector-len 17 --hidden-size 67 --num-layers 3 --no-hx 0 --no-dhy 0 --no-cx 0 --no-dcy 0 --no-hy 0 --no-dhx 0 --no-cy 0 --no-dcx 0 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 1 --dir-mode 0 --algo-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4
FAILED: 7.58906e-05
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4 -m lstm -k 25 -H 67 -W 17 -l 3 -F 0 -r 0 -b 1 -p 0
inputMode: 0 biasMode: 1 dirMode: 0
hz: 67 batch_n: 268 seqLength: 25 inputLen: 17 numLayers: 3 useDropout: 1
Backward Data LSTM:
Hidden state dhx tensor failed verification.
Max diff: 4.07901e-05
Mismatch at 0: 0.000162567 != 0.000162645
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_lstm_dropout --float --batch-size 17 --seq-len 25 --vector-len 17 --hidden-size 67 --num-layers 3 --no-hx 0 --no-dhy 0 --no-cx 0 --no-dcy 0 --no-hy 0 --no-dhx 0 --no-cy 0 --no-dcx 0 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 1 --dir-mode 0 --algo-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4
FAILED: 0.000162781
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4 -m lstm -k 25 -H 67 -W 17 -l 3 -F 0 -r 0 -b 1 -p 0
inputMode: 0 biasMode: 1 dirMode: 0
hz: 67 batch_n: 268 seqLength: 25 inputLen: 17 numLayers: 3 useDropout: 1
Backward Data LSTM:
Hidden cell dcx tensor failed verification.
Max diff: 0.000141347
Mismatch at 0: -0.000447281 != -0.000447214
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_lstm_dropout --float --batch-size 17 --seq-len 25 --vector-len 17 --hidden-size 67 --num-layers 3 --no-hx 0 --no-dhy 0 --no-cx 0 --no-dcy 0 --no-hy 0 --no-dhx 0 --no-cy 0 --no-dcx 0 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 1 --dir-mode 0 --algo-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4
FAILED: 0.000338308
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4 -m lstm -k 25 -H 67 -W 17 -l 3 -F 0 -r 0 -b 1 -p 0
inputMode: 0 biasMode: 1 dirMode: 0
hz: 67 batch_n: 268 seqLength: 25 inputLen: 17 numLayers: 3 useDropout: 1
Backward Data LSTM:
Workspace space tensor failed verification.
Max diff: 0.00392286
Mismatch at 0: 5.63776e-06 != 5.63691e-06
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_lstm_dropout --float --batch-size 17 --seq-len 25 --vector-len 17 --hidden-size 67 --num-layers 3 --no-hx 0 --no-dhy 0 --no-cx 0 --no-dcy 0 --no-hy 0 --no-dhx 0 --no-cy 0 --no-dcx 0 --flat-batch-fill 0 --use-dropout 1 --in-mode 0 --bias-mode 1 --dir-mode 0 --algo-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4
FAILED: 6.58243e-05
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4 -m lstm -k 25 -H 67 -W 17 -l 3 -F 0 -r 0 -b 1 -p 0
inputMode: 0 biasMode: 1 dirMode: 0
hz: 67 batch_n: 268 seqLength: 25 inputLen: 17 numLayers: 3 useDropout: 1
Backward Weights LSTM:
Max diff: 0.0102975
Mismatch at 299: -0.000965326 != -0.000965326
Start 30: test_main
34/55 Test #30: test_main ........................***Failed Error regular expression found in output. Regex=[FAILED] 24.77 sec
MIOpen(HIP): Warning [Prefetch] File is unreadable: /opt/rocm/miopen/share/miopen/db/gfx803_32.HIP.fdb.txt
FAILED: (miopenFindConvolutionForwardAlgorithm( handle, inputTensor, in_dev, convFilter, wei_dev, convDesc, outputTensor, out_dev, 1, &ret_algo_count, &perf, fwd_workspace_dev, sz_fwd_workspace, 0)) == 0: /home/oleid/.cache/rua/build/miopen-hip/src/MIOpen-rocm-3.7.0/test/main.cpp: 282
MIOpen Error: /home/oleid/.cache/rua/build/miopen-hip/src/MIOpen-rocm-3.7.0/src/hipoc/hipoc_program.cpp:94: Failed creating module hipErrorSharedObjectInitFailed
CMake Error at test_test_main.cmake:7 (message):
Test failed
Start 31: test_mdgraph
35/55 Test #31: test_mdgraph ..................... Passed 0.41 sec
Start 32: test_na_inference
36/55 Test #32: test_na_inference ................ Passed 0.81 sec
Start 33: test_na_train
37/55 Test #33: test_na_train .................... Passed 1.30 sec
Start 34: test_perfdb
38/55 Test #34: test_perfdb ...................... Passed 0.68 sec
Start 37: test_rnn_vanilla
39/55 Test #37: test_rnn_vanilla ................. Passed 9.20 sec
Start 38: test_rnn_vanilla_dropout
40/55 Test #38: test_rnn_vanilla_dropout .........***Failed Error regular expression found in output. Regex=[FAILED] 10.63 sec
Empty batch sequence. Filling uniformly with batch size: 17
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_rnn_vanilla_dropout --float --batch-size 17 --seq-len 51 --vector-len 31 --hidden-size 127 --num-layers 3 --flat-batch-fill 0 --use-dropout 1 --in-mode 1 --bias-mode 1 --dir-mode 0 --rnn-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
FAILED: 0.0134763
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 -m relu -k 51 -H 127 -W 127 -l 3 -F 0 -r 0 -b 1 -p 1 -U 1
Forward Train RNN vanilla:
Output tensor output failed verification.
Max diff: 0.213538
Mismatch at 17: 0.0560553 != 0.0560552
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_rnn_vanilla_dropout --float --batch-size 17 --seq-len 51 --vector-len 31 --hidden-size 127 --num-layers 3 --flat-batch-fill 0 --use-dropout 1 --in-mode 1 --bias-mode 1 --dir-mode 0 --rnn-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
FAILED: 0.00610061
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 -m relu -k 51 -H 127 -W 127 -l 3 -F 0 -r 0 -b 1 -p 1 -U 1
Forward Train RNN vanilla:
Hidden state tensor failed verification.
Max diff: 0.0544878
Mismatch at 15: 0.0989682 != 0.0989682
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_rnn_vanilla_dropout --float --batch-size 17 --seq-len 51 --vector-len 31 --hidden-size 127 --num-layers 3 --flat-batch-fill 0 --use-dropout 1 --in-mode 1 --bias-mode 1 --dir-mode 0 --rnn-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
FAILED: 0.00585688
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 -m relu -k 51 -H 127 -W 127 -l 3 -F 0 -r 0 -b 1 -p 1 -U 1
Forward Train RNN vanilla:
Weight tensor failed verification.
Max diff: 0.441525
Mismatch at 2: 0.031029 != 0.031029
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_rnn_vanilla_dropout --float --batch-size 17 --seq-len 51 --vector-len 31 --hidden-size 127 --num-layers 3 --flat-batch-fill 0 --use-dropout 1 --in-mode 1 --bias-mode 1 --dir-mode 0 --rnn-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
FAILED: 0.00518946
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 -m relu -k 51 -H 127 -W 127 -l 3 -F 0 -r 0 -b 1 -p 1 -U 1
Backward Data RNN vanilla:
Output dx failed verification.
Max diff: 0.117308
Mismatch at 2: 0.00535342 != 0.00535343
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_rnn_vanilla_dropout --float --batch-size 17 --seq-len 51 --vector-len 31 --hidden-size 127 --num-layers 3 --flat-batch-fill 0 --use-dropout 1 --in-mode 1 --bias-mode 1 --dir-mode 0 --rnn-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
FAILED: 0.00415193
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 -m relu -k 51 -H 127 -W 127 -l 3 -F 0 -r 0 -b 1 -p 1 -U 1
Backward Data RNN vanilla:
Hidden state dhx tensor failed verification.
Max diff: 0.00748828
Mismatch at 0: -0.00334076 != -0.00334076
/home/oleid/.cache/rua/build/miopen-hip/src/build/bin/test_rnn_vanilla_dropout --float --batch-size 17 --seq-len 51 --vector-len 31 --hidden-size 127 --num-layers 3 --flat-batch-fill 0 --use-dropout 1 --in-mode 1 --bias-mode 1 --dir-mode 0 --rnn-mode 0 --batch-seq 17 17 16 16 16 16 15 15 14 13 12 11 10 10 9 8 8 8 7 6 6 5 5 4 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
FAILED: 0.00436673
Iteration: 0
./bin/MIOpenDriver rnn -n 17,17,16,16,16,16,15,15,14,13,12,11,10,10,9,8,8,8,7,6,6,5,5,4,4,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 -m relu -k 51 -H 127 -W 127 -l 3 -F 0 -r 0 -b 1 -p 1 -U 1
Backward Data RNN vanilla:
Reserved space tensor failed verification.
Max diff: 0.117308
Mismatch at 2: 0.00535342 != 0.00535343
Start 39: test_sequences
41/55 Test #39: test_sequences ................... Passed 0.01 sec
Start 41: test_solver
42/55 Test #41: test_solver ...................... Passed 0.40 sec
Start 42: test_sqlite_perfdb
43/55 Test #42: test_sqlite_perfdb ............... Passed 3.65 sec
Start 43: test_tensor_cast
44/55 Test #43: test_tensor_cast ................. Passed 0.90 sec
Start 44: test_tensor_copy
45/55 Test #44: test_tensor_copy ................. Passed 0.92 sec
Start 45: test_tensor_ops
46/55 Test #45: test_tensor_ops .................. Passed 0.85 sec
Start 46: test_tensor_scale
47/55 Test #46: test_tensor_scale ................ Passed 0.86 sec
Start 47: test_tensor_set
48/55 Test #47: test_tensor_set .................. Passed 0.88 sec
Start 48: test_tensor_test
49/55 Test #48: test_tensor_test ................. Passed 0.06 sec
Start 49: test_tensor_trans
50/55 Test #49: test_tensor_trans ................ Passed 1.23 sec
Start 50: test_tensor_transform
51/55 Test #50: test_tensor_transform ............ Passed 0.99 sec
Start 51: test_tensor_vec
52/55 Test #51: test_tensor_vec .................. Passed 0.39 sec
Start 52: test_test_errors
53/55 Test #52: test_test_errors ................. Passed 0.07 sec
Start 53: test_type_name
54/55 Test #53: test_type_name ................... Passed 0.01 sec
Start 54: test_w_supertensor
55/55 Test #54: test_w_supertensor ............... Passed 1.22 sec
89% tests passed, 6 tests failed out of 55
Total Test time (real) = 228.81 sec
The following tests FAILED:
18 - test_find_db (Failed)
20 - test_gru (Failed)
21 - test_gru_dropout (Failed)
29 - test_lstm_dropout (Failed)
30 - test_main (Failed)
38 - test_rnn_vanilla_dropout (Failed)
Errors while running CTest
make[3]: *** [test/CMakeFiles/check.dir/build.make:76: test/CMakeFiles/check] Error 8
make[2]: *** [CMakeFiles/Makefile2:8075: test/CMakeFiles/check.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:8082: test/CMakeFiles/check.dir/rule] Error 2
make: *** [Makefile:3622: check] Error 2
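For anyone who wants to compare such runs between builds, here is a minimal sketch that pulls the failed tests out of a saved CTest log. It assumes result lines shaped like the ones above ("N/M Test #K: name ....***Failed" or "... Passed"); the function name failed_tests is made up for illustration and is not part of CTest or MIOpen.

```python
import re

# Matches CTest result lines such as
# "33/55 Test #29: test_lstm_dropout ......***Failed Error regular ... 9.47 sec"
RESULT_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#(?P<num>\d+):\s+(?P<name>\w+)\s*\.*\s*"
    r"(?P<status>\*\*\*Failed|Passed)"
)

def failed_tests(log_text):
    """Return (test number, test name) for every line CTest marked as failed."""
    failures = []
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match and match.group("status") == "***Failed":
            failures.append((int(match.group("num")), match.group("name")))
    return failures

# Two result lines copied from the log above.
sample = (
    "32/55 Test #28: test_lstm ........................ Passed 8.55 sec\n"
    "33/55 Test #29: test_lstm_dropout ................***Failed "
    "Error regular expression found in output. Regex=[FAILED] 9.47 sec\n"
)
print(failed_tests(sample))
```

Only the per-test result lines are matched; the "The following tests FAILED:" summary at the end of the log is ignored, so a truncated log still yields the failures seen so far.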
Found it: it seems MIOpen was built without support for clang-offload-bundler; cf. https://github.com/rocm-arch/rocm-arch/pull/417
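A quick way to check whether the tool is visible at all on the build machine might look like the sketch below. It only looks for an executable named clang-offload-bundler on PATH plus any extra directories you pass in. Which extra directories make sense depends on the local ROCm install, so they are left to the caller as an assumption. The helper name find_offload_bundler is hypothetical.

```python
import os
import shutil

def find_offload_bundler(extra_dirs=(), tool="clang-offload-bundler"):
    """Return the full path of the tool, or None if it is not found.

    PATH is searched first, then extra_dirs. The extra directories are an
    assumption about the local installation, not a fixed ROCm layout.
    """
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    if not dirs:
        return None
    return shutil.which(tool, path=os.pathsep.join(dirs))

path = find_offload_bundler()
if path is None:
    print("clang-offload-bundler not found on PATH")
else:
    print("found:", path)
```

Finding the binary only shows that it is installed. Whether the MIOpen build actually picked it up is a separate question that this check does not answer.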
System information
Describe the current behavior
The minimal example dies with:
Please note that I added a bunch of extra debug output (like a backtrace) before I realized, by digging through the code, that there is an AMD_LOG_LEVEL environment variable.
As you can see, the call in DynCO::loadCodeObjectData in hip_code_object.cpp fails.
For some reason, it seems to work if I do 2D convolutions. But as I'm trying to train a 1D model, this doesn't really help :)
The big unknown for me, and the reason for this issue: what exactly does it mean that the runtime metadata section is not present in the ELF program binary? Why could this be the case, and how do I get it there?
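In case it helps others, here is a minimal sketch of how the AMD_LOG_LEVEL output can be collected without patching in extra debug prints. It runs a command with the variable set and extracts the kernel names from "Cannot find the function:" lines like the ones in the log at the top of this thread. The helper names are made up, and merging stderr into stdout is an assumption so that the messages are captured regardless of which stream the runtime writes to.

```python
import os
import re
import subprocess
import sys

# Pattern taken from the HIP log lines quoted in this thread.
MISSING_KERNEL = re.compile(r"Cannot find the function:\s*(\S+)")

def run_with_hip_logging(cmd, level=1):
    """Run cmd with AMD_LOG_LEVEL set and return combined stdout/stderr."""
    env = dict(os.environ, AMD_LOG_LEVEL=str(level))
    result = subprocess.run(
        cmd, env=env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.stdout

def missing_kernels(log_text):
    """Collect the kernel names reported as not found."""
    return MISSING_KERNEL.findall(log_text)

# Stand-in command that only echoes the variable; replace it with the
# actual training script invocation.
output = run_with_hip_logging(
    [sys.executable, "-c", "import os; print(os.environ['AMD_LOG_LEVEL'])"]
)
print(output.strip())

# Kernel name shortened for illustration; the real names are much longer.
line = ":1:hip_code_object.cpp :187 : 11961159632 us: Cannot find the function: Cijk_Ailk_Bljk_SB_MT48x72x32_SE"
print(missing_kernels(line))
```

Note that a terminal may wrap the very long kernel names across lines, as in the pasted log, in which case only the part before the line break is captured.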
Describe the expected behavior
Training should work.
Standalone code to reproduce the issue