Closed: lovemefan closed this issue 1 month ago.
Do the im2col tests in test-backend-ops pass?
All passed with -DGGML_METAL=OFF; one failed with -DGGML_METAL=ON:
96% tests passed, 1 tests failed out of 24
Total Test time (real) = 86.43 sec
The following tests FAILED:
16 - test-conv-transpose-1d (Subprocess aborted)
The test-conv-transpose-1d binary fails on Mac because we don't have a Metal implementation yet.
@JohannesGaessler was asking for the results from test-backend-ops, and they currently pass:
make -j && ./bin/test-backend-ops -o IM2COL
Backend 1/2 (CPU)
Skipping CPU backend
Backend 2/2 (Metal)
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Pro
ggml_metal_init: picking default device: Apple M1 Pro
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M1 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB
Backend name: Metal
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[10,10,3,1],ne_kernel=[3,3,3,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): OK
IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[10,10,3,1],ne_kernel=[3,3,3,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): OK
IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[3000,128,1,1],ne_kernel=[3,128,1280,1],s0=1,s1=0,p0=1,p1=0,d0=1,d1=0,is_2D=0): OK
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[3000,128,1,1],ne_kernel=[3,128,1280,1],s0=1,s1=0,p0=1,p1=0,d0=1,d1=0,is_2D=0): OK
1344/1344 tests passed
Backend Metal: OK
ggml_metal_free: deallocating
2/2 backends passed
OK
@lovemefan Try to add a test to test-backend-ops.cpp that fails and work from there.
The test still passes, so I can't reproduce my bug by adding an example to test-backend-ops. I spent some time troubleshooting but couldn't find anything.
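A sketch of what the added case might look like in test-backend-ops.cpp, assuming the current test_im2col constructor order (type_input, type_kernel, dst_type, ne_input, ne_kernel, s0, s1, p0, p1, d0, d1, is_2D), which may differ in other versions; it corresponds to the extra 1D case that appears as the 1345th test in the run below:

```cpp
// hypothetical addition next to the other test_im2col entries in test-backend-ops.cpp;
// parameters taken from the extra 1D case (p0 = 5) in the log below
test_cases.emplace_back(new test_im2col(GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_F32,
        {20, 1, 2, 1}, {11, 1, 2, 1}, 1, 0, 5, 0, 1, 0, false));
```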
Testing 2 backends
Backend 1/2 (CPU)
Backend name: CPU
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[10,10,3,1],ne_kernel=[3,3,3,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): OK
IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[10,10,3,1],ne_kernel=[3,3,3,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): OK
IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[3000,128,1,1],ne_kernel=[3,128,1280,1],s0=1,s1=0,p0=1,p1=0,d0=1,d1=0,is_2D=0): OK
ggml_backend_register: registered backend CPU
ggml_backend_register: registered backend Metal
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[3000,128,1,1],ne_kernel=[3,128,1280,1],s0=1,s1=0,p0=1,p1=0,d0=1,d1=0,is_2D=0): OK
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[20,1,2,1],ne_kernel=[11,1,2,1],s0=1,s1=0,p0=5,p1=0,d0=1,d1=0,is_2D=0): OK
1345/1345 tests passed
Backend CPU: OK
Backend 2/2 (Metal)
Skipping
2/2 backends passed
OK
In my code, I noticed the output tensor looks like random data:
node_56 = (f32) IM2COL(encoder.encoders0.0.self_attn.fsmn_block.weight (reshaped){11, 1, 512, 1}, attention_V (transposed) (cont) (reshaped) (cont){187, 1, 512, 1}}) = {11, 187, 512, 1}
[
[
[ 0.1077, 0.0867, 0.0815, ..., 0.0939, 0.0617, 0.0004],
[ 0.0509, -0.0198, 0.0252, ..., 0.0589, 0.0110, -0.0662],
[ -0.0691, -0.1348, -0.1385, ..., -0.1628, -0.1450, -0.1191],
...,
[ 0.0251, -0.0317, -0.0558, ..., -0.1676, -0.1620, -0.0997],
[ -0.1455, -0.0764, -0.1159, ..., -0.0249, 0.0146, 0.0687],
[ 0.1323, 0.1453, 0.1234, ..., 0.2963, 0.1976, 0.1235],
],
[
[ 0.3266, 0.3900, 0.3218, ..., 0.0291, 0.0054, -0.0033],
[ 0.0183, 0.0379, 0.1066, ..., 0.2731, 0.2905, 0.3599],
[ 0.4786, 0.4105, 0.4773, ..., 0.2667, 0.3345, 0.2860],
...,
[ 0.2771, 0.2001, 0.2363, ..., 0.1060, 0.1247, 0.0417],
[ 0.0584, 0.1006, 0.1617, ..., -0.0197, -0.0852, -0.0765],
[ -0.0800, -0.0915, -0.1155, ..., -0.2073, -0.2165, -0.1289],
],
[
[ -0.1064, -0.1542, -0.1333, ..., -0.0538, 0.0158, 0.0972],
[ 0.1904, 0.2528, 0.2121, ..., 0.4930, 0.4395, 0.2486],
[ -0.0395, 0.0909, 0.2529, ..., 0.0461, 0.0395, -0.0260],
...,
[ 0.0827, 0.0932, 0.1548, ..., 0.1713, 0.0767, 0.1357],
[ 0.1237, 0.1689, 0.2138, ..., 0.1194, 0.1599, 0.2138],
[ 0.4326, 0.3982, 0.2848, ..., 0.2005, 0.3298, 0.2315],
],
...,
[
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
],
[
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
],
[
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
],
]
sum = 10.121820
When the destination type is set to GGML_TYPE_F16, NaNs appear; the only change is the dst type of the call (see the sketch after the dump below):
ggml_debug: node_56 = (f16) IM2COL(encoder.encoders0.0.self_attn.fsmn_block.weight (reshaped){11, 1, 512, 1}, attention_V (transposed) (cont) (reshaped) (cont){187, 1, 512, 1}}) = {11, 187, 512, 1}
[
[
[ -0.0001, 1.4648, -0.0001, ..., -1530.0000, 1.5137, 57856.0000],
[ 1.5762, 39104.0000, 1.5830, ..., 1.3711, -40.0000, 0.7363],
[ -0.0138, 1.3281, 0.0001, ..., 0.0350, 1.4951, -1.8301],
...,
[ 0.0001, 1.5469, -6.4375, ..., -33.4688, 1.5557, 0.5938],
[ 0.5054, -0.0015, 1.5127, ..., 1.4238, -0.0002, 1.4258],
[ -5.2070, 1.5615, 8.8516, ..., 0.4473, -1.3477, 0.6865],
],
[
[ -1.4102, -2.2832, -1.4033, ..., -1.2461, 45632.0000, -1.5615],
[ -0.0000, -1.4502, 0.0019, ..., 16544.0000, -1.5928, 32320.0000],
[ -1.5186, 0.0350, -1.5273, ..., 1.3389, -3242.0000, 1.2646],
...,
[ -1.4404, 14.8672, -1.2959, ..., 1.1074, -0.0106, 1.3867],
[ nan, 1.5068, -49.8125, ..., 0.0016, -1.4277, -8.9766],
[ 1.4160, -0.0000, 1.6074, ..., 1.5723, -62.8750, 1.4961],
],
[
[ 0.4175, 1.6631, -0.2144, ..., -121.3750, 1.4053, -0.4783],
[ 1.5615, 0.0000, 1.5938, ..., 0.9614, 0.0028, -0.9175],
[ nan, 1.1455, 0.0018, ..., 7648.0000, -1.0283, -0.6484],
...,
[ -0.1702, -1.4219, 56.3750, ..., 0.0184, -1.4717, -9.1172],
[ -1.5010, 22800.0000, -1.5039, ..., -1.5908, -11.7031, -1.6396],
[ -0.1598, -1.4199, -0.0002, ..., 4010.0000, 1.6562, 1384.0000],
],
...,
[
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
],
[
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
],
[
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
],
]
sum = nan
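For context, the two dumps come from the same graph with only the destination type of the im2col call changed. A minimal sketch of that toggle, with placeholder tensor names rather than the exact repo code:

```cpp
// dst_type is the last argument of ggml_im2col and selects the type of the result tensor
cur = ggml_im2col(ctx, kernel, input, 1, 0, 5, 0, 1, 0, false, GGML_TYPE_F32); // first dump: finite values
cur = ggml_im2col(ctx, kernel, input, 1, 0, 5, 0, 1, 0, false, GGML_TYPE_F16); // second dump: NaNs and huge values
```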
What else can I do to provide more information?
This bug is the same as #991, but it occurs with Metal.
Fixed in llama/9943.
While debugging the code of my custom model, I noticed that the im2col operator produces different results on Metal compared to the CPU.
My device is an M1 Pro.
The relevant code is at https://github.com/lovemefan/SenseVoice.cpp/blob/1dfa60459a0104a68066ddf7c275f4c0a33972a6/sense-voice/csrc/sense-voice-encoder.cc#L267-L269
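A rough sketch of what the call at those lines looks like, reconstructed from the shapes in the debug output earlier in the thread; tensor names such as fsmn_weight and attn_v are placeholders, not the actual variables in the repo:

```cpp
// 1D im2col of the FSMN kernel over the attention values, matching the shapes
// {11, 1, 512, 1} (kernel) and {187, 1, 512, 1} (input) printed by the debug callback
struct ggml_tensor * cur = ggml_im2col(ctx,
        fsmn_weight,        // kernel, [11, 1, 512, 1]
        attn_v,             // input,  [187, 1, 512, 1]
        1, 0,               // s0, s1 (s1 unused in the 1D case)
        5, 0,               // p0, p1
        1, 0,               // d0, d1
        false,              // is_2D = false
        GGML_TYPE_F32);     // dst_type; GGML_TYPE_F16 here produces the NaNs shown above
```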
Here is my callback log of the tensor; the im2col node's name is node55:
cpu.log metal.log
I would appreciate your help with this. Thank you!