apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.78k stars 6.79k forks source link

[Clojure] (ndarray/* ndarray num) resulting in zero ndarray #15463

Open deem0n opened 5 years ago

deem0n commented 5 years ago

Description

ndarray/* results in zero ndarray when second arg is float in range (0,1). For example: 0.3 or 0.9

Environment info

----------Python Info----------
('Version      :', '2.7.16')
('Compiler     :', 'GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)')
('Build        :', ('default', 'Mar  4 2019 09:02:22'))
('Arch         :', ('64bit', ''))
------------Pip Info-----------
('Version      :', '19.0.3')
('Directory    :', '/usr/local/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
('Platform     :', 'Darwin-18.5.0-x86_64-i386-64bit')
('system       :', 'Darwin')
('node         :', 'DmitrijeevsMBP3.guest')
('release      :', '18.5.0')
('version      :', 'Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'i386')
machdep.cpu.brand_string: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0826 sec, LOAD: 1.3385 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0584 sec, LOAD: 2.5662 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0633 sec, LOAD: 0.7552 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0978 sec, LOAD: 0.5505 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1293 sec, LOAD: 1.4447 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1585 sec, LOAD: 1.0627 sec.

Package used (Python/R/Scala/Julia): Clojure

Clojure 1.10.1 java version "1.8.0_211" Java(TM) SE Runtime Environment (build 1.8.0_211-b12) Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)

MXNet: org.apache.mxnet.contrib.clojure/clojure-mxnet-osx-cpu "1.4.1"

Minimum reproducible example

Please clone mul-bug branch from github: https://github.com/deem0n/deep-book-clojure/tree/mul-bug

Download MNIST files and run lein run

Check update-mini-batch function in core.clj This is line where weird thing happens!

Steps to reproduce

  1. git clone https://github.com/deem0n/deep-book-clojure.git
  2. cd deep-book-clojure
  3. git checkout mul-bug
  4. utils/get_mnist_data.sh
  5. lein run

Example output

INFO  MXNetJVM: Try loading mxnet-scala from native path.
WARN  MXNetJVM: MXNet Scala native library not found in path. Copying native library from the archive. Consider installing the library somewhere in the path (for Windows: PATH, for Linux: LD_LIBRARY_PATH), or specifying by Java cmd option -Djava.library.path=[lib path].
WARN  MXNetJVM: LD_LIBRARY_PATH=null
WARN  MXNetJVM: java.library.path=/Users/deem0n/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
INFO  org.apache.mxnet.util.NativeLibraryLoader: Replaced .dylib with .jnilib
Just started training...
[11:45:30] src/io/iter_mnist.cc:110: MNISTIter: load 60000 images, shuffle=1, shape=(10000,784)
[11:45:30] src/io/iter_mnist.cc:110: MNISTIter: load 10000 images, shuffle=1, shape=(5000,784)
WARN  org.apache.mxnet.WarnIfNotDisposed: LEAK: [one-time warning] An instance of org.apache.mxnet.NDArray was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing
WARN  org.apache.mxnet.WarnIfNotDisposed: LEAK: [one-time warning] An instance of org.apache.mxnet.io.MXDataIter was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing
Mini batches:  5000
.INTERMEDIATE 0.23711163 * 0.3 = 0.0
INTERMEDIATE 0.6643658 * 0.3 = 0.0
INTERMEDIATE 0.3021536 * 0.3 = -0.0
INTERMEDIATE 0.6497877 * 0.3 = 0.0
.INTERMEDIATE 0.20036256 * 0.3 = 0.0
INTERMEDIATE 0.80026126 * 0.3 = 0.0
INTERMEDIATE 0.20262644 * 0.3 = -0.0
INTERMEDIATE 0.847942 * 0.3 = 0.0
.INTERMEDIATE 0.21089189 * 0.3 = 0.0
INTERMEDIATE 0.7596359 * 0.3 = 0.0
INTERMEDIATE 0.21294513 * 0.3 = -0.0
INTERMEDIATE 0.8713788 * 0.3 = 0.0
.INTERMEDIATE 0.21063924 * 0.3 = 0.0
INTERMEDIATE 0.7280767 * 0.3 = 0.0
INTERMEDIATE 0.21684752 * 0.3 = -0.0
INTERMEDIATE 0.7300375 * 0.3 = 0.0
.INTERMEDIATE 0.18669352 * 0.3 = 0.0
INTERMEDIATE 1.0009404 * 0.3 = 0.0
INTERMEDIATE 0.1816474 * 0.3 = -0.0
INTERMEDIATE 1.0679908 * 0.3 = 0.0
.INTERMEDIATE 0.17094775 * 0.3 = 0.0
INTERMEDIATE 0.83719015 * 0.3 = 0.0
INTERMEDIATE 0.18325534 * 0.3 = -0.0
INTERMEDIATE 0.85101455 * 0.3 = 0.0
.INTERMEDIATE 0.27498227 * 0.3 = 0.0
INTERMEDIATE 0.8912646 * 0.3 = 0.0
INTERMEDIATE 0.30358908 * 0.3 = -0.0
INTERMEDIATE 0.9557106 * 0.3 = 0.0
.INTERMEDIATE 0.2489399 * 0.3 = 0.0
INTERMEDIATE 0.8362016 * 0.3 = 0.0
INTERMEDIATE 0.2548548 * 0.3 = -0.0
INTERMEDIATE 0.83855635 * 0.3 = 0.0
.INTERMEDIATE 0.24952362 * 0.3 = 0.0
INTERMEDIATE 0.7708196 * 0.3 = 0.0
INTERMEDIATE 0.25566715 * 0.3 = -0.0
INTERMEDIATE 0.77132565 * 0.3 = 0.0
.INTERMEDIATE 0.32142815 * 0.3 = 0.0
INTERMEDIATE 0.95997936 * 0.3 = 0.0
INTERMEDIATE 0.32541075 * 0.3 = -0.0
INTERMEDIATE 0.9632486 * 0.3 = 0.0

Epoch 0: 107 / 1000
INFO  org.apache.mxnet.util.NativeLibraryLoader: Deleting /var/folders/jn/q10st3jd7ngb24nckb0cv49r0000gp/T/mxnet3086131443089653112/mxnet-scala
INFO  org.apache.mxnet.util.NativeLibraryLoader: Deleting /var/folders/jn/q10st3jd7ngb24nckb0cv49r0000gp/T/mxnet3086131443089653112/libmxnet.so
INFO  org.apache.mxnet.util.NativeLibraryLoader: Deleting /var/folders/jn/q10st3jd7ngb24nckb0cv49r0000gp/T/mxnet3086131443089653112

Notes

Strangely enough when I run code:

(def a (ndarray/* (ndarray/array [0.01 0.02 0.03 0.045 0.05 0.06][2 3]) 0.3))
(ndarray/->vec (ndarray/max a))

it runs correctly in REPL. But even constant ndarray get zero result, if I put it into update_mini_batch

         calc (fn [w nw] (let [ ;m (ndarray/div nw (/ 1.0 eta_by_minibatch_cnt))
                                m  (ndarray/* (ndarray/array [0.01 0.02 0.03 0.045 0.05 0.06][2 3]) 0.3); this one will result in zero matrix !!!
                                _ (println "INTERMEDIATE"  (ndarray/to-scalar (ndarray/max nw)) "*" eta_by_minibatch_cnt "=" (ndarray/to-scalar (ndarray/max m)))
res (ndarray/- w m)]

Even this code will result in zero ndarray !

What have you tried to solve it?

  1. I am using ndarray/div as workaround
mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Bug

gigasquid commented 5 years ago

Thanks @deem0n for reporting the issue and also for providing a nice repro case 💯

lanking520 commented 5 years ago

@gigasquid Can you check which function call to the Scala portion. It might be a scala problem

gigasquid commented 5 years ago

Sorry it took me so long to get back to this.

First, thanks so much for the detailed steps and github/branch to reproduce 💯

Now for the weird news - I did the exact steps outlined to reproduce and didn't get 0 😕

Here is my output (tested on my mac MacBook Pro (Retina, 15-inch, Late 2013) and also MacBook Pro (15-inch 2018)

Just started training...
[10:52:29] src/io/iter_mnist.cc:110: MNISTIter: load 60000 images, shuffle=1, shape=(10000,784)
[10:52:30] src/io/iter_mnist.cc:110: MNISTIter: load 10000 images, shuffle=1, shape=(5000,784)
WARN  org.apache.mxnet.WarnIfNotDisposed: LEAK: [one-time warning] An instance of org.apache.mxnet.NDArray was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing
WARN  org.apache.mxnet.WarnIfNotDisposed: LEAK: [one-time warning] An instance of org.apache.mxnet.io.MXDataIter was not disposed. Set property mxnet.traceLeakedObjects to true to enable tracing
Mini batches:  5000
.INTERMEDIATE 0.16412103 * 0.3 = 0.049236313
INTERMEDIATE 1.065679 * 0.3 = 0.3197037
INTERMEDIATE 0.1732387 * 0.3 = 0.05197161
INTERMEDIATE 1.069547 * 0.3 = 0.32086414
.INTERMEDIATE 0.25935003 * 0.3 = 0.07780501
INTERMEDIATE 0.75463897 * 0.3 = 0.2263917
INTERMEDIATE 0.2725639 * 0.3 = 0.081769176
INTERMEDIATE 0.84347457 * 0.3 = 0.25304237
.INTERMEDIATE 0.13788064 * 0.3 = 0.041364193
INTERMEDIATE 0.42585567 * 0.3 = 0.1277567
INTERMEDIATE 0.14192675 * 0.3 = 0.042578027
INTERMEDIATE 0.42733985 * 0.3 = 0.12820196
.INTERMEDIATE 0.11281382 * 0.3 = 0.033844147
INTERMEDIATE 0.7060675 * 0.3 = 0.21182026
INTERMEDIATE 0.10053872 * 0.3 = 0.030161617
INTERMEDIATE 0.72622657 * 0.3 = 0.21786799
.INTERMEDIATE 0.13522136 * 0.3 = 0.04056641
INTERMEDIATE 0.5249011 * 0.3 = 0.15747033
INTERMEDIATE 0.13516489 * 0.3 = 0.04054947
INTERMEDIATE 0.52688986 * 0.3 = 0.15806696
.INTERMEDIATE 0.14416938 * 0.3 = 0.043250814
INTERMEDIATE 0.31797385 * 0.3 = 0.09539216
INTERMEDIATE 0.14395727 * 0.3 = 0.043187182
INTERMEDIATE 0.31797716 * 0.3 = 0.09539315
.INTERMEDIATE 0.062895596 * 0.3 = 0.01886868
INTERMEDIATE 0.18372236 * 0.3 = 0.05511671
INTERMEDIATE 0.063012555 * 0.3 = 0.018903768
INTERMEDIATE 0.18413597 * 0.3 = 0.055240795
.INTERMEDIATE 0.052791305 * 0.3 = 0.015837392
INTERMEDIATE 0.54763573 * 0.3 = 0.16429073
INTERMEDIATE 0.0560796 * 0.3 = 0.01682388
INTERMEDIATE 0.5777395 * 0.3 = 0.17332184
.INTERMEDIATE 0.07229043 * 0.3 = 0.02168713
INTERMEDIATE 0.19547275 * 0.3 = 0.058641825
INTERMEDIATE 0.07235214 * 0.3 = 0.021705642
INTERMEDIATE 0.19866998 * 0.3 = 0.059600998
.INTERMEDIATE 0.07617488 * 0.3 = 0.022852464
INTERMEDIATE 0.07070907 * 0.3 = 0.021212723
INTERMEDIATE 0.07607305 * 0.3 = 0.022821916
INTERMEDIATE 0.053693853 * 0.3 = 0.016108157

Epoch 0: 97 / 1000

Very weird! Wondering what Mac model you have? Can you share your specs, OS ?

gigasquid commented 5 years ago

Also - wondering if it still happens if you upgrade to the latest 1.5.1 jar? https://search.maven.org/search?q=clojure-mxnet