apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Gluon RNN test randomly crashes #18225

Status: Open · opened by szha 4 years ago

szha commented 4 years ago

Description

The GPU test test_gluon_gpu.py::test_rnn_forward_backward crashes intermittently.

Occurrences

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-18146/runs/48/nodes/351/steps/415/log/?start=0

leezu commented 4 years ago

[2020-05-13T06:46:07.563Z] worker 'gw3' crashed while running 'tests/python/gpu/test_gluon_gpu.py::test_rnn_forward_backward[False-True-True-layer0-True]'

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/master/runs/1959/nodes/387/steps/678/log/?start=0

leezu commented 4 years ago

This fails with a segmentation fault.