dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

MXNetError: Check failed: delay_alloc: #2937

Open blackboxo opened 3 years ago

blackboxo commented 3 years ago

🐛 Bug

[Screenshot: Snipaste_2021-05-24_15-06-53]

Environment

BarclayII commented 3 years ago

It seems that you are running the SageMaker fraud detection example? @sojiadeshina

blackboxo commented 3 years ago

> It seems that you are running the SageMaker fraud detection example? @sojiadeshina

yes

blackboxo commented 3 years ago

any progress? 😂

sojiadeshina commented 3 years ago

Hi, it seems you're running MXNet 1.7.0, which the original code wasn't tested with. Any chance of using MXNet 1.6.0?
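A quick way to confirm which MXNet release is installed before rerunning the example is sketched below. The helper names (`parse_major_minor`, `is_tested_series`, `installed_mxnet_version`) are made up for illustration, and the 1.6 series is taken as "tested" only because the comment above says so.

```python
from importlib import metadata

# Assumption: 1.6.x is the release line the example was tested with,
# per the comment above; adjust if that changes.
TESTED_SERIES = (1, 6)


def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '1.7.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_tested_series(version: str, tested=TESTED_SERIES) -> bool:
    """True when the version belongs to the tested major.minor series."""
    return parse_major_minor(version) == tuple(tested)


def installed_mxnet_version():
    """Return the installed MXNet version string, or None if not found."""
    # CUDA builds are published under other distribution names
    # (for example mxnet-cu101), so this lookup may miss them.
    try:
        return metadata.version("mxnet")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_mxnet_version()
    if found is None:
        print("mxnet distribution not found")
    elif is_tested_series(found):
        print(f"MXNet {found}: matches the tested 1.6 series")
    else:
        print(f"MXNet {found}: untested here, consider pinning 1.6.0")
```

If the check reports an untested release, pinning the package (for example `pip install mxnet==1.6.0` for the CPU build) is one way to get back to the tested version.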

blackboxo commented 3 years ago

> Hi, it seems you're running MXNet 1.7.0, which the original code wasn't tested with. Any chance of using MXNet 1.6.0?

Thanks, but now another error occurs: "Floating point exception". It seems to happen at line 47 of train_dgl_mxnet_entry_point.py: "pred = model(node_flow, features[batch_nids.as_in_context(ctx)])"

sojiadeshina commented 3 years ago

> > Hi, it seems you're running MXNet 1.7.0, which the original code wasn't tested with. Any chance of using MXNet 1.6.0?
>
> Thanks, but now another error occurs: "Floating point exception". It seems to happen at line 47 of train_dgl_mxnet_entry_point.py: "pred = model(node_flow, features[batch_nids.as_in_context(ctx)])"

Can you post the full stack trace? I assume you're running this on your own graph.
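When a process dies with "Floating point exception" (SIGFPE), Python normally prints no traceback. One general way to get at least the Python-level frames is the standard library's `faulthandler` module. The sketch below is not specific to this example; `enable_crash_tracebacks` is a hypothetical helper name, and the call would need to be added near the top of the training script.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump Python tracebacks to `stream` on fatal signals.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT,
    SIGBUS and SIGILL. It reports Python frames only, not the
    native C++ frames inside the MXNet library.
    """
    target = sys.stderr if stream is None else stream
    # The stream must stay open for as long as the handler is enabled.
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_tracebacks()
    # ... the rest of the training script would run here ...
```

The same effect is available without code changes by running the script with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER=1` environment variable.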

blackboxo commented 3 years ago

> > > Hi, it seems you're running MXNet 1.7.0, which the original code wasn't tested with. Any chance of using MXNet 1.6.0?
> >
> > Thanks, but now another error occurs: "Floating point exception". It seems to happen at line 47 of train_dgl_mxnet_entry_point.py: "pred = model(node_flow, features[batch_nids.as_in_context(ctx)])"
>
> Can you post the full stack trace? I assume you're running this on your own graph.

Nope, I'm running on the same IEEE-CIS fraud dataset. It did not print a full stack trace; it just aborted with "Floating point exception". I think it is a bug in nd.LeakyReLU(h) when the size of h is 0.
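If the hypothesis above is right and the crash is triggered by a zero-size input, one possible workaround is to skip the activation when the array is empty. This is only a sketch under that unverified assumption; `apply_if_nonempty` is a hypothetical helper and not part of DGL or MXNet.

```python
def apply_if_nonempty(activation, h):
    """Apply `activation` to `h` unless `h` has zero elements.

    `h` only needs a `size` attribute holding its element count,
    which both mx.nd.NDArray and numpy arrays provide.
    """
    if h.size == 0:
        # Nothing to activate: return the empty array unchanged and
        # avoid calling the kernel suspected of crashing.
        return h
    return activation(h)


# Hypothetical use inside the model's forward pass (requires mxnet):
#   import mxnet as mx
#   h = apply_if_nonempty(mx.nd.LeakyReLU, h)
```

This only sidesteps the suspected trigger. Whether `nd.LeakyReLU` really fails on empty input in MXNet 1.6.0 would still need to be confirmed with a minimal reproduction.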