apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

disabled 3 tutorials tests #16181

Open Vikas-kum opened 5 years ago

Vikas-kum commented 5 years ago

Nightly tests are failing because of some tutorial tests. We fixed some of them, but 3 tests were disabled in tests/tutorials/test_tutorials.py: test_gluon_performance, test_python_profiler, and test_mkldnn_quantization.

We need to uncomment these tests once the tutorials are fixed.
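As an alternative to commenting the tests out, a skip decorator keeps them visible in test reports while they are disabled. The following is only a sketch: it uses the standard-library `unittest` module, and `run_tutorial` and the tutorial name strings are hypothetical stand-ins, not the actual helpers in tests/tutorials/test_tutorials.py.

```python
import unittest

# Reason shown in test reports; the issue number is the one for this thread.
SKIP_REASON = "Disabled until the tutorial is fixed, see issue #16181"


def run_tutorial(name):
    """Hypothetical stand-in for a helper that executes a tutorial notebook."""
    raise RuntimeError("tutorial %s is currently broken" % name)


class TutorialTests(unittest.TestCase):
    # The test names come from the issue; the tutorial name strings are placeholders.
    @unittest.skip(SKIP_REASON)
    def test_gluon_performance(self):
        self.assertTrue(run_tutorial("gluon_performance"))

    @unittest.skip(SKIP_REASON)
    def test_python_profiler(self):
        self.assertTrue(run_tutorial("python_profiler"))

    @unittest.skip(SKIP_REASON)
    def test_mkldnn_quantization(self):
        self.assertTrue(run_tutorial("mkldnn_quantization"))
```

Skipped tests are counted separately from passes and failures, so removing the decorator later is the only change needed to re-enable each one.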

https://github.com/apache/incubator-mxnet/pull/16179/files

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended label(s): Test

Vikas-kum commented 5 years ago

@mxnet-label-bot [Test]

sojiadeshina commented 5 years ago

The profiler fix should be in this PR: https://github.com/apache/incubator-mxnet/pull/16160

But yeah, we can keep them disabled until the fixes for mkldnn_quantization and gluon_performance are done. In the long run, I think we may actually have to whitelist mkldnn_quantization from the test suite.

Vikas-kum commented 5 years ago

@sad- Thanks. I tried that, but it doesn't look like the profiler tests were passing.

It looks like there is more to fix here. The new error was:

MXNetError: [19:29:32] /work/mxnet/3rdparty/dmlc-core/include/dmlc/thread_group.h:227: Check failed: auto_remove_ == false (1 vs. 0) : 

Stack trace:

  [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7effabb7aed2]

  [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::ThreadGroup::Thread::joinable() const+0xf4) [0x7effae507734]

  [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetContinuousProfileDump(bool, float)+0x108) [0x7effae504d28]

  [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::profiler::Profiler::SetConfig(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, float, bool)+0x95) [0x7effae505b95]

  [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXSetProcessProfilerConfig+0x3d6) [0x7effaecb80a6]

  [bt] (5) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7efffb8bae20]

  [bt] (6) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(ffi_call+0x2eb) [0x7efffb8ba88b]

  [bt] (7) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(_ctypes_callproc+0x49a) [0x7efffb8b501a]

  [bt] (8) /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) [0x7efffb8a8fcb]

Logs for reference - http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/tutorials_nighly_fix/10/pipeline

Can you please provide fixes for the 3 tests? Then we can re-enable them in the nightly run. The MKLDNN test is most likely using the wrong binary (it currently uses the GPU binary, which is built without the MKLDNN libraries).
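One way to confirm the wrong-binary suspicion is to ask the installed build which features it was compiled with. The sketch below assumes MXNet 1.5 or later, where `mxnet.runtime.Features` exposes `is_enabled`, and assumes the feature is named `MKLDNN`. `missing_features` is a hypothetical helper written for this illustration.

```python
def missing_features(required, is_enabled):
    """Return the required build features that the lookup reports as disabled."""
    return [name for name in required if not is_enabled(name)]


def mxnet_feature_lookup():
    """Return the feature lookup of the installed MXNet binary (assumes MXNet >= 1.5)."""
    # Imported lazily so missing_features can be used without MXNet installed.
    from mxnet.runtime import Features
    return Features().is_enabled


if __name__ == "__main__":
    try:
        lookup = mxnet_feature_lookup()
    except ImportError:
        print("mxnet is not installed; nothing to check")
    else:
        # A non-empty list here would mean the binary lacks MKLDNN support.
        print("missing features:", missing_features(["MKLDNN"], lookup))
```

Running a check like this at the start of the MKLDNN tutorial test would turn a confusing downstream failure into an explicit message about the build.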