dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

Error on LibXSMM CPU kernels #3459

Closed: VoVAllen closed this issue 2 years ago

VoVAllen commented 3 years ago

🐛 Bug

A user reported an error raised at https://github.com/dmlc/dgl/blob/983a4fdd1981a6eaa4a3343ec4116739e9f97dfa/src/array/cpu/spmm_blocking_libxsmm.h#L267:

 "Failed to generate libxsmm kernel for the SpMM operation!"

We should not raise the error but instead fall back to the naive kernel.

The user's CPU model is Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz. This is an old CPU, launched in 2013, which might no longer be supported by LibXSMM: https://ark.intel.com/content/www/us/en/ark/products/75281/intel-xeon-processor-e52695-v2-30m-cache-2-40-ghz.html

Possible Solution

Catch the error at https://github.com/dmlc/dgl/blob/983a4fdd1981a6eaa4a3343ec4116739e9f97dfa/src/array/cpu/spmm.h#L144 and run the naive kernel if an error is detected.
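
To illustrate the intended control flow (the real fix belongs in the C++ file above; this is only a Python sketch with hypothetical stand-ins for the two kernels):

import numpy as np
import scipy.sparse as sp

def libxsmm_spmm(csr, x):
    # Hypothetical stand-in for the LibXSMM-backed kernel: simulate
    # kernel generation failing on an unsupported CPU.
    raise RuntimeError("Failed to generate libxsmm kernel for the SpMM operation!")

def naive_spmm(csr, x):
    # Naive fallback: a plain sparse-dense matrix multiply.
    return csr @ x

def spmm_with_fallback(csr, x):
    # Proposed behavior: try the optimized kernel first and, on failure,
    # fall back to the naive kernel instead of surfacing the error.
    try:
        return libxsmm_spmm(csr, x)
    except RuntimeError:
        return naive_spmm(csr, x)

csr = sp.random(4, 4, density=0.5, format="csr")
print(spmm_with_fallback(csr, np.ones((4, 2))))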

BarclayII commented 3 years ago

Reproduction:

Run the following in the examples/pytorch/correct_and_smooth directory:

python main.py --dataset ogbn-products --model linear --dropout 0.5 --epochs 1000 --lr 0.1 --gpu -1
python main.py --dataset ogbn-products --model linear --pretrain --correction-alpha 1. --smoothing-alpha 0.9 --gpu -1

I wasn't able to reproduce it on my p2.8x (Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz), but that is a newer CPU.

@sanchit-misra Could you please confirm if @VoVAllen's hypothesis is indeed the case?

sanchit-misra commented 2 years ago

@BarclayII will take a look.

sanchit-misra commented 2 years ago

I was not able to reproduce this, but I don't have access to such an old system. :-) According to the libxsmm developers (who are my colleagues), while libxsmm did not support any architecture without at least AVX2, it also did not explicitly check whether the underlying architecture was supported, so it would not throw an error on an unsupported CPU. I am therefore not sure where this error came from.

Having said that, libxsmm now explicitly checks whether the architecture is supported, and if it is not, it returns a nullptr kernel. I now check for this and fall back to the naive kernel.
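
In pseudo-Python, the new behavior looks roughly like this (generate_kernel is a hypothetical stand-in for the libxsmm dispatch; the actual code is C++):

import numpy as np
import scipy.sparse as sp

def generate_kernel(arch_supported):
    # libxsmm now returns a null kernel (None here) for unsupported
    # architectures instead of raising an error.
    return (lambda csr, x: csr @ x) if arch_supported else None

def spmm(csr, x, arch_supported=False):
    kernel = generate_kernel(arch_supported)
    if kernel is None:
        return csr @ x  # fall back to the naive kernel
    return kernel(csr, x)

csr = sp.random(3, 3, density=0.5, format="csr")
print(spmm(csr, np.ones((3, 2))))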

sixtyfive commented 2 years ago

I'm hitting this on an older Xeon as well:

$ cat /proc/cpuinfo
...
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2470 v2 @ 2.40GHz
stepping        : 4
microcode       : 0x42d
cpu MHz         : 2399.897
cache size      : 25600 KB
...

There's a GeForce RTX 2060 present in the system. I'm running a Python 3.9 virtualenv with stable DGL installed via pip install dgl dglgo -f https://data.dgl.ai/wheels/repo.html, all on Ubuntu 20.04.4 LTS Server. The backend is PyTorch. Happy to provide more info if you tell me what you need.

(Edit: I just tried on my gaming PC, which has a Core i5-3570. Same error message. Unfortunately the above Xeon is the newest CPU I have access to anywhere, so it would be really cool if libxsmm were a little more accommodating to people who don't have the newest hardware. The Linux distro there is Manjaro with all the latest updates applied, so quite a different beast from the work server's Ubuntu. The GPU is only a GTX 960, but I believe this is about the CPU, not the GPU.)

Soothysay commented 2 years ago

Hi, I've got the same issue as well:

  File "HGCNN_2_caller.py", line 387, in <module>
    output = model(dynamic_graphs, timestamps)
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/nfs/vinci.1/home/choudhuri/temporal-gcn/Graph_GCN_V2.py", line 170, in forward
    h_dict = self.new_layer_1_base(current_graph, train_embeds)
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/nfs/vinci.1/home/choudhuri/temporal-gcn/Graph_GCN_V2.py", line 60, in forward
    G.multi_update_all(funcs, "stack")
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/heterograph.py", line 5023, in multi_update_all
    all_out[dtid].append(core.message_passing(g, mfunc, rfunc, afunc))
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/core.py", line 357, in message_passing
    ndata = invoke_gspmm(g, mfunc, rfunc)
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/core.py", line 332, in invoke_gspmm
    z = op(graph, x)
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/ops/spmm.py", line 189, in func
    return gspmm(g, 'copy_lhs', reduce_op, x, None)
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/ops/spmm.py", line 75, in gspmm
    ret = gspmm_internal(g._graph, op,
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/backend/pytorch/sparse.py", line 757, in gspmm
    return GSpMM.apply(gidx, op, reduce_op, lhs_data, rhs_data)
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 118, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/backend/pytorch/sparse.py", line 126, in forward
    out, (argX, argY) = _gspmm(gidx, op, reduce_op, X, Y)
  File "/home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/sparse.py", line 228, in _gspmm
    _CAPI_DGLKernelSpMM(gidx, op, reduce_op,
  File "dgl/_ffi/_cython/./function.pxi", line 293, in dgl._ffi._cy3.core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 239, in dgl._ffi._cy3.core.FuncCall
dgl._ffi.base.DGLError: [13:06:10] /opt/dgl/src/array/cpu/./spmm_blocking_libxsmm.h:267: Failed to generate libxsmm kernel for the SpMM operation!
Stack trace:
  [bt] (0) /home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7fcf9d6c72ef]
  [bt] (1) /home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/libdgl.so(void dgl::aten::cpu::SpMMRedopCsrOpt<long, float, dgl::aten::cpu::op::CopyLhs<float>, dgl::aten::cpu::op::Add<float> >(dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray)+0x3d4) [0x7fcf9d90c304]
  [bt] (2) /home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/libdgl.so(void dgl::aten::cpu::SpMMSumCsrLibxsmm<long, float, dgl::aten::cpu::op::CopyLhs<float> >(dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray)+0x73) [0x7fcf9d90c3b3]
  [bt] (3) /home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/libdgl.so(void dgl::aten::cpu::SpMMSumCsr<long, float, dgl::aten::cpu::op::CopyLhs<float> >(dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray)+0x12f) [0x7fcf9d9279bf]
  [bt] (4) /home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/libdgl.so(void dgl::aten::SpMMCsr<1, long, 32>(std::string const&, std::string const&, dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> >)+0xcd3) [0x7fcf9d93dd13]
  [bt] (5) /home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/libdgl.so(dgl::aten::SpMM(std::string const&, std::string const&, std::shared_ptr<dgl::BaseHeteroGraph>, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> >)+0x13d5) [0x7fcf9d96ff65]
  [bt] (6) /home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/libdgl.so(+0x4703e8) [0x7fcf9d9843e8]
  [bt] (7) /home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/libdgl.so(+0x470981) [0x7fcf9d984981]
  [bt] (8) /home/choudhuri/anaconda3/lib/python3.8/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7fcf9d9d62d8]

I am using a system with Intel(R) Xeon(R) CPU E5645 @ 2.40GHz - 1.77/2.40GHz.

I seem to be getting the issue while using multi_update_all.
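
For reference, a minimal sketch of the kind of multi_update_all call that lowers to this SpMM path (a toy heterograph, not my actual model):

import torch
import dgl
import dgl.function as fn

g = dgl.heterograph({
    ('user', 'follows', 'user'): ([0, 1], [1, 2]),
    ('user', 'likes', 'item'): ([0, 1], [0, 1]),
})
g.nodes['user'].data['h'] = torch.randn(3, 4)

# copy_u + sum per edge type lowers to the CPU SpMM kernel in the trace above.
funcs = {
    'follows': (fn.copy_u('h', 'm'), fn.sum('m', 'h_new')),
    'likes': (fn.copy_u('h', 'm'), fn.sum('m', 'h_new')),
}
g.multi_update_all(funcs, 'stack')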

benjianzou commented 2 years ago

Hello, I've got the same issue as well:

Traceback (most recent call last):
  File "/home/coder/project/project/GraphProject/zgraph-lite/test_gcn.py", line 25, in test_gcn_ogb
    emb, predicted, model_bs = model.train(g.ndata['feat'].to(device),
  File "/home/coder/project/project/GraphProject/zgraph-lite/zgraph/alg/embedding/gcn/gcn.py", line 68, in train
    loss = model.get_loss(blocks, feat_in, label_in)
  File "/home/coder/project/project/GraphProject/zgraph-lite/zgraph/alg/embedding/gcn/gcn.py", line 119, in get_loss
    logits = self.forward(blocks, feat_in)
  File "/home/coder/project/project/GraphProject/zgraph-lite/zgraph/alg/embedding/gcn/gcn.py", line 115, in forward
    h = layer(block, h)
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/nn/pytorch/conv/graphconv.py", line 423, in forward
    graph.update_all(aggregate_fn, fn.sum(msg='m', out='h'))
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/heterograph.py", line 4895, in update_all
    ndata = core.message_passing(g, message_func, reduce_func, apply_node_func)
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/core.py", line 357, in message_passing
    ndata = invoke_gspmm(g, mfunc, rfunc)
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/core.py", line 332, in invoke_gspmm
    z = op(graph, x)
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/ops/spmm.py", line 189, in func
    return gspmm(g, 'copy_lhs', reduce_op, x, None)
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/ops/spmm.py", line 75, in gspmm
    ret = gspmm_internal(g._graph, op,
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/backend/pytorch/sparse.py", line 724, in gspmm
    return GSpMM.apply(gidx, op, reduce_op, lhs_data, rhs_data)
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 118, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/backend/pytorch/sparse.py", line 106, in forward
    out, (argX, argY) = _gspmm(gidx, op, reduce_op, X, Y)
  File "/home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/sparse.py", line 228, in _gspmm
    _CAPI_DGLKernelSpMM(gidx, op, reduce_op,
  File "dgl/_ffi/_cython/./function.pxi", line 293, in dgl._ffi._cy3.core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 239, in dgl._ffi._cy3.core.FuncCall
dgl._ffi.base.DGLError: [16:15:23] /opt/dgl/src/array/cpu/./spmm_blocking_libxsmm.h:267: Failed to generate libxsmm kernel for the SpMM operation!
Stack trace:
  [bt] (0) /home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f7e50ea5f5f]
  [bt] (1) /home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/libdgl.so(void dgl::aten::cpu::SpMMRedopCsrOpt<long, float, dgl::aten::cpu::op::CopyLhs<float>, dgl::aten::cpu::op::Add<float> >(dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray)+0x3d4) [0x7f7e510f32c4]
  [bt] (2) /home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/libdgl.so(void dgl::aten::cpu::SpMMSumCsrLibxsmm<long, float, dgl::aten::cpu::op::CopyLhs<float> >(dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray)+0x73) [0x7f7e510f3373]
  [bt] (3) /home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/libdgl.so(void dgl::aten::cpu::SpMMSumCsr<long, float, dgl::aten::cpu::op::CopyLhs<float> >(dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray)+0x12f) [0x7f7e5110e97f]
  [bt] (4) /home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/libdgl.so(void dgl::aten::SpMMCsr<1, long, 32>(std::string const&, std::string const&, dgl::BcastOff const&, dgl::aten::CSRMatrix const&, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> >)+0xcd3) [0x7f7e51124cd3]
  [bt] (5) /home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/libdgl.so(dgl::aten::SpMM(std::string const&, std::string const&, std::shared_ptr<dgl::BaseHeteroGraph>, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> >)+0x244e) [0x7f7e51158f5e]
  [bt] (6) /home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/libdgl.so(+0x6648a0) [0x7f7e511788a0]
  [bt] (7) /home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/libdgl.so(+0x664eb1) [0x7f7e51178eb1]
  [bt] (8) /home/coder/bin/anaconda3/envs/test_dgl/lib/python3.9/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7f7e511d0b18]

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.

sixtyfive commented 2 years ago

Here's some activity. Everything about what constitutes the issue has been said, hasn't it? How is closing it automatically by means of a bot going to accomplish anything other than the digital version of sweeping it under the rug?

peizhou001 commented 2 years ago

Hi @sixtyfive, there is already a PR fixing this issue: #4455

The reason for this error is that LibXSMM is not supported on some older CPUs. Since it is not easy to check in advance whether the library supports the current CPU, we provide an API to disable it at runtime:

dgl.use_libxsmm(bool)

A prompt is shown the first time the library fails, and you can use this API to disable it for subsequent runs.
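
For example (a minimal sketch, assuming a DGL build that already includes #4455):

import torch
import dgl
import dgl.function as fn

# Disable the libxsmm-backed SpMM kernels globally; DGL then uses its
# naive CPU kernels. Call this before any message passing runs.
dgl.use_libxsmm(False)

g = dgl.graph(([0, 1, 2], [1, 2, 0]))
g.ndata['h'] = torch.randn(3, 4)
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))  # runs on the naive CPU path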

peizhou001 commented 2 years ago

Feel free to reopen if you have any further questions.

rse-lbl commented 7 months ago

@peizhou001

Sorry for reopening this old thread, but I am not a programmer and can't figure out how to use this API call. For compatibility reasons I've installed DGL 1.0.2.

Where, and how, do I use dgl.use_libxsmm(flag)?

Thanks.