dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

flaky test_lamb.py::test_lamb_for_fashion_mnist #972

Open szha opened 5 years ago

szha commented 5 years ago

Description

http://ci.mxnet.io/blue/rest/organizations/jenkins/pipelines/GluonNLP-py3-master-cpu-unittest/branches/PR-966/runs/4/nodes/87/steps/212/log/?start=0


[2019-10-11T21:49:14.639Z] _________________________ test_lamb_for_fashion_mnist __________________________
[2019-10-11T21:49:14.639Z] [gw1] linux -- Python 3.6.7 /var/lib/jenkins/workspace/gluon-nlp-cpu-py3/conda/cpu/py3-master/bin/python
[2019-10-11T21:49:14.639Z] 
[2019-10-11T21:49:14.639Z]     def test_lamb_for_fashion_mnist():
[2019-10-11T21:49:14.639Z]         mnist_train = gdata.vision.FashionMNIST(train=True)
[2019-10-11T21:49:14.639Z]         mnist_test = gdata.vision.FashionMNIST(train=False)
[2019-10-11T21:49:14.639Z]     
[2019-10-11T21:49:14.639Z]         batch_size = 512
[2019-10-11T21:49:14.639Z]         transformer = gdata.vision.transforms.ToTensor()
[2019-10-11T21:49:14.639Z]         if sys.platform.startswith('win'):
[2019-10-11T21:49:14.639Z]             num_workers = 0  # 0 disables multi-processing.
[2019-10-11T21:49:14.639Z]         else:
[2019-10-11T21:49:14.639Z]             num_workers = 4
[2019-10-11T21:49:14.639Z]     
[2019-10-11T21:49:14.639Z]         train_iter = gdata.DataLoader(mnist_train.transform_first(transformer),
[2019-10-11T21:49:14.639Z]                                       batch_size, shuffle=True,
[2019-10-11T21:49:14.639Z] >                                     num_workers=num_workers)
[2019-10-11T21:49:14.639Z] 
[2019-10-11T21:49:14.639Z] tests/unittest/test_lamb.py:23: 
[2019-10-11T21:49:14.639Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2019-10-11T21:49:14.639Z] conda/cpu/py3-master/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py:607: in __init__
[2019-10-11T21:49:14.639Z]     original_sigint_handler = signal.signal(signal.SIGINT, signal.SIG_IGN)
[2019-10-11T21:49:14.639Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2019-10-11T21:49:14.639Z] 
[2019-10-11T21:49:14.639Z] signalnum = <Signals.SIGINT: 2>, handler = <Handlers.SIG_IGN: 1>
[2019-10-11T21:49:14.639Z] 
[2019-10-11T21:49:14.639Z]     @_wraps(_signal.signal)
[2019-10-11T21:49:14.639Z]     def signal(signalnum, handler):
[2019-10-11T21:49:14.639Z] >       handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
[2019-10-11T21:49:14.639Z] E       ValueError: signal only works in main thread
[2019-10-11T21:49:14.639Z] 
[2019-10-11T21:49:14.639Z] conda/cpu/py3-master/lib/python3.6/signal.py:47: ValueError
[2019-10-11T21:49:14.639Z] ----------------------------- Captured stdout call -----------------------------
[2019-10-11T21:49:14.639Z] Downloading tests/data/datasets/fashion-mnist/train-images-idx3-ubyte.gz from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-images-idx3-ubyte.gz...
[2019-10-11T21:49:14.639Z] Downloading tests/data/datasets/fashion-mnist/train-labels-idx1-ubyte.gz from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz...
[2019-10-11T21:49:14.639Z] Downloading tests/data/datasets/fashion-mnist/t10k-images-idx3-ubyte.gz from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/t10k-images-idx3-ubyte.gz...
[2019-10-11T21:49:14.639Z] Downloading tests/data/datasets/fashion-mnist/t10k-labels-idx1-ubyte.gz from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/t10k-labels-idx1-ubyte.gz...
leezu commented 5 years ago

@zhreshold it seems https://github.com/apache/incubator-mxnet/pull/16114 prevents using the Gluon DataLoader outside of the main thread? (Not sure why we use it outside of the main thread on our CI, though; see the error "ValueError: signal only works in main thread" above.)
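For reference, the ValueError in the log can be reproduced with the standard library alone, without MXNet. The sketch below is a minimal illustration, not the DataLoader code: `try_ignore_sigint`, `in_main_thread`, and `run_in_worker` are hypothetical helper names. It makes the same `signal.signal(signal.SIGINT, signal.SIG_IGN)` call that appears in the traceback, once from a worker thread, and shows a thread check that a caller could use as a guard.

```python
import signal
import threading


def try_ignore_sigint():
    """Make the call from the traceback; return the error text, or None on success."""
    try:
        original = signal.signal(signal.SIGINT, signal.SIG_IGN)
    except ValueError as err:
        # CPython raises ValueError when this is called off the main thread.
        return str(err)
    if original is not None:
        # Restore the previous handler so the sketch has no side effects.
        signal.signal(signal.SIGINT, original)
    return None


def in_main_thread():
    """Hypothetical guard: True only when running in the main thread."""
    return threading.current_thread() is threading.main_thread()


def run_in_worker(fn):
    """Run fn in a separate thread and return its result."""
    result = {}
    worker = threading.Thread(target=lambda: result.update(value=fn()))
    worker.start()
    worker.join()
    return result["value"]


# Calling from a worker thread triggers the same error as in the CI log.
worker_error = run_in_worker(try_ignore_sigint)
worker_is_main = run_in_worker(in_main_thread)
print(worker_error)
print(worker_is_main)
```

If the test process really does construct the DataLoader from a non-main thread, a check like `in_main_thread()` would tell us so; whether that is what happens on our CI is still an open question.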