apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Unexpected behaviour of `mxnet.np.random.multivariate_normal` #21095

Closed dougiesquire closed 2 years ago

dougiesquire commented 2 years ago

Description

I'm quite possibly misunderstanding something, but mxnet.np.random.multivariate_normal seems to produce unexpected values relative to, for example, numpy.random.multivariate_normal for some covariance matrices.

To Reproduce

The following code demonstrates the unexpected behaviour by comparing distributions created using the numpy, jax and mxnet random.multivariate_normal functions:

import numpy as np
from mxnet import np as mxnp
import jax
import matplotlib.pyplot as plt

mean = np.array([0., 0.])
cov = np.array([[0.2, 0.2],[0.2, 20]])
mean_mxnp = mxnp.array(mean)
cov_mxnp = mxnp.array(cov)

mvn_np = np.random.multivariate_normal(mean, cov, size=10000)
mvn_mxnp = mxnp.random.multivariate_normal(mean_mxnp, cov_mxnp, size=10000).asnumpy()
mvn_jax = np.array(jax.random.multivariate_normal(jax.random.PRNGKey(0), mean, cov, shape=(10000,)))

fig = plt.figure(figsize=(14, 4))
ax = fig.subplots(1, len(mean), sharey=True)
for idx in range(len(mean)):
    ax[idx].hist(mvn_np[:,idx], bins=100, alpha=0.5, label="numpy.random")
    ax[idx].hist(mvn_mxnp[:,idx], bins=100, alpha=0.5, label="mxnet.np.random")
    ax[idx].hist(mvn_jax[:,idx], bins=100, alpha=0.5, label="jax.random")
    ax[idx].set_xlabel(f"var{idx}")
    ax[idx].set_ylabel("Count")
    ax[idx].legend()
    ax[idx].grid()

Which generates:

[Screenshot: overlaid histograms of var0 (left panel) and var1 (right panel) for numpy.random, mxnet.np.random and jax.random]

You can see that the var0 distribution from mxnet.np.random.multivariate_normal (left panel) is incorrect. This seems to occur when cov[0,0] is small, though I haven't tested this very thoroughly.
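The visual comparison can be backed up with a numeric check of the sample mean and sample covariance against the requested values. The following is a minimal sketch using NumPy only; `mvn_sample_errors` is a hypothetical helper name (not part of any library), and the `bad` array is a stand-in drawn with a deliberately wrong `cov[0,0]`, not actual MXNet output.

```python
import numpy as np

def mvn_sample_errors(samples, mean, cov):
    """Return the largest scaled error of the sample mean and sample covariance.

    Errors are divided by the target standard deviations so that variables
    with very different variances (0.2 vs 20 here) are judged on the same scale.
    """
    samples = np.asarray(samples, dtype=np.float64)
    mean = np.asarray(mean, dtype=np.float64)
    cov = np.asarray(cov, dtype=np.float64)
    scale = np.sqrt(np.diag(cov))  # target standard deviations
    mean_err = np.abs(samples.mean(axis=0) - mean) / scale
    cov_err = np.abs(np.cov(samples, rowvar=False) - cov) / np.outer(scale, scale)
    return mean_err.max(), cov_err.max()

mean = np.array([0.0, 0.0])
cov = np.array([[0.2, 0.2], [0.2, 20.0]])
rng = np.random.default_rng(0)

# Draws from the requested distribution.
good = rng.multivariate_normal(mean, cov, size=10000)

# Stand-in for a faulty sampler (assumption for illustration only):
# same mean, but var0 is ten times too large.
wrong_cov = np.array([[2.0, 0.2], [0.2, 20.0]])
bad = rng.multivariate_normal(mean, wrong_cov, size=10000)

good_mean_err, good_cov_err = mvn_sample_errors(good, mean, cov)
bad_mean_err, bad_cov_err = mvn_sample_errors(bad, mean, cov)
print("good:", good_mean_err, good_cov_err)
print("bad: ", bad_mean_err, bad_cov_err)
```

To check the MXNet result from the snippet above, pass `mvn_mxnp` (already converted with `.asnumpy()`) as `samples`. With 10000 draws, scaled errors well above roughly 0.1 would suggest the sampler is not honouring the requested covariance.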

Environment

Note, I've replaced some directory path details below with "..."

Environment Information

```
----------Python Info----------
Version : 3.10.4
Compiler : GCC 10.3.0
Build : ('main', 'Mar 24 2022 17:38:57')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 22.1.2
Directory : /.../pip
----------MXNet Info-----------
Version : 1.9.1
Directory : /.../mxnet
Commit hash file "/.../mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library : ['/.../mxnet/libmxnet.so']
Build features:
✖ CUDA ✖ CUDNN ✖ NCCL ✖ CUDA_RTC ✖ TENSORRT
✔ CPU_SSE ✔ CPU_SSE2 ✔ CPU_SSE3 ✖ CPU_SSE4_1 ✖ CPU_SSE4_2 ✖ CPU_SSE4A ✖ CPU_AVX ✖ CPU_AVX2
✔ OPENMP ✖ SSE ✖ F16C ✖ JEMALLOC
✔ BLAS_OPEN ✖ BLAS_ATLAS ✖ BLAS_MKL ✖ BLAS_APPLE ✔ LAPACK
✔ MKLDNN ✔ OPENCV ✖ CAFFE ✖ PROFILER ✔ DIST_KVSTORE
✖ CXX14 ✖ INT64_TENSOR_SIZE ✔ SIGNAL_HANDLER ✖ DEBUG ✖ TVM_OP
----------System Info----------
Platform : Linux-4.18.0-372.13.1.el8.nci.x86_64-x86_64-with-glibc2.28
system : Linux
node : gadi-login-08.gadi.nci.org.au
release : 4.18.0-372.13.1.el8.nci.x86_64
version : #1 SMP Mon Jul 4 08:46:44 AEST 2022
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 1
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
Stepping: 7
CPU MHz: 2900.000
CPU max MHz: 3900.0000
CPU min MHz: 1200.0000
BogoMIPS: 5800.00
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-3,7,8,12-14,18-20
NUMA node1 CPU(s): 4-6,9-11,15-17,21-23
NUMA node2 CPU(s): 24-27,31,32,36-38,42-44
NUMA node3 CPU(s): 28-30,33-35,39-41,45-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0250 sec, LOAD: 0.6436 sec.
Error open Gluon Tutorial(en): http://gluon.mxnet.io, HTTP Error 404: Not Found, DNS finished in 0.407116174697876 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, , DNS finished in 1.2768752574920654 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1059 sec, LOAD: 0.7920 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0096 sec, LOAD: 0.9369 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.01588129997253418 sec.
----------Environment----------
CC="icc"
CXX="icpc"
OMP_NUM_THREADS="1"
KMP_DUPLICATE_LIB_OK="True"
KMP_INIT_AT_FORK="FALSE"
```

github-actions[bot] commented 2 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

hankaj commented 2 years ago

Hi @dougiesquire, thanks for submitting your issue. I managed to reproduce the problem and found the root cause. The fix is here: #21105.

bgawrych commented 2 years ago

Closing. If you have any doubts, feel free to reopen.