apache / mxnet


mx.nd.argsort() output is different on CPU and GPU #19027

Closed by karan6181 4 years ago

karan6181 commented 4 years ago

Description

The output of mx.nd.argsort() differs between CPU and GPU for certain edge cases, such as rows that contain repeated values.

Script

import mxnet as mx
import numpy as np


def get_array(row, col):
    # Each row is 1.0 for the first three columns and 0.0 afterwards,
    # so the first three entries of every row are tied.
    return [[1.0 if k <= 2 else 0.0 for k in range(col)] for _ in range(row)]


x = get_array(4, 4)
y = get_array(4, 4)
a_cpu = mx.nd.array(x, ctx=mx.cpu())
a_gpu = mx.nd.array(y, ctx=mx.gpu())  # needs a GPU build of MXNet and a visible GPU
print("======================= Input ============================")
print(a_cpu)
print("======================= MXNet Ascending order ============================")
print("CPU: ", a_cpu.argsort(axis=1, is_ascend=True))
print("GPU: ", a_gpu.argsort(axis=1, is_ascend=True))
print("======================= MXNet Descending order ============================")
print("CPU: ", a_cpu.argsort(axis=1, is_ascend=False))
print("GPU: ", a_gpu.argsort(axis=1, is_ascend=False))
print("======================= Numpy ============================")
# Reuse the converted array instead of calling asnumpy() a second time.
a_cpu_np = a_cpu.asnumpy()
print("CPU: \n", np.argsort(a_cpu_np, axis=1))

Output

======================= Input ============================

[[1. 1. 1. 0.]
 [1. 1. 1. 0.]
 [1. 1. 1. 0.]
 [1. 1. 1. 0.]]
<NDArray 4x4 @cpu(0)>
======================= MXNet Ascending order ============================
CPU:
[[3. 0. 1. 2.]
 [3. 0. 1. 2.]
 [3. 0. 1. 2.]
 [3. 0. 1. 2.]]
<NDArray 4x4 @cpu(0)>
GPU:
[[3. 0. 2. 1.]
 [3. 0. 2. 1.]
 [3. 0. 2. 1.]
 [3. 0. 2. 1.]]
<NDArray 4x4 @gpu(0)>
======================= MXNet Descending order ============================
CPU:
[[0. 1. 2. 3.]
 [0. 1. 2. 3.]
 [0. 1. 2. 3.]
 [0. 1. 2. 3.]]
<NDArray 4x4 @cpu(0)>
GPU:
[[0. 2. 1. 3.]
 [0. 2. 1. 3.]
 [0. 2. 1. 3.]
 [0. 2. 1. 3.]]
<NDArray 4x4 @gpu(0)>
======================= Numpy ============================
CPU:
 [[3 0 1 2]
 [3 0 1 2]
 [3 0 1 2]
 [3 0 1 2]]

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the output below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python

----------Python Info----------
Version      : 3.8.3
Compiler     : GCC 7.3.0
Build        : ('default', 'May 19 2020 18:47:26')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 20.0.2
Directory    : /shared/mxnet_env/lib/python3.8/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /shared/mxnet_env/lib/python3.8/site-packages/mxnet
Num GPUs     : 8
Commit Hash   : 6eec9da55c5096079355d1f1a5fa58dcf35d6752
----------System Info----------
Platform     : Linux-4.15.0-1060-aws-x86_64-with-glibc2.10
system       : Linux
node         : ip-192-168-70-218
release      : 4.15.0-1060-aws
version      : #62-Ubuntu SMP Tue Feb 11 21:23:22 UTC 2020
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping:            4
CPU MHz:             1201.597
BogoMIPS:            4999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            33792K
NUMA node0 CPU(s):   0-23,48-71
NUMA node1 CPU(s):   24-47,72-95
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0040 sec, LOAD: 0.6617 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0009 sec, LOAD: 0.3580 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0347 sec, LOAD: 0.1685 sec.
Timing for D2L: http://d2l.ai, DNS: 0.1150 sec, LOAD: 0.0382 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.1203 sec, LOAD: 0.2038 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0773 sec, LOAD: 0.3685 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0024 sec, LOAD: 0.0772 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.02059340476989746 sec.

szha commented 4 years ago

argsort is not guaranteed to be stable.

sandeep-krishnamurthy commented 4 years ago

argsort is not guaranteed to be stable.

@szha Can you please elaborate on what that means?

szha commented 4 years ago

An unstable sort means that, for equal values, the output order does not necessarily preserve the input order.
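
To make this concrete with the arrays from the report: in each row, indices 0, 1 and 2 all hold 1.0, so any ordering of those three indices is a correct argsort result. The following is a minimal NumPy-only sketch of how one might compare two argsort outputs by the values they select rather than by the indices themselves. The helper name `is_valid_argsort` is hypothetical and is not part of MXNet or NumPy; the index arrays are copied from the CPU and GPU output above.

```python
import numpy as np


def is_valid_argsort(values, indices, axis=-1, ascending=True):
    """Hypothetical helper: True if `indices` sorts `values` along `axis`.

    Ties may appear in any order, because only the gathered values are compared.
    """
    indices = np.asarray(indices).astype(np.int64)

    # Each slice along `axis` must be a permutation of 0..n-1.
    n = values.shape[axis]
    shape = [1] * values.ndim
    shape[axis] = n
    all_positions = np.broadcast_to(np.arange(n).reshape(shape), indices.shape)
    if not np.array_equal(np.sort(indices, axis=axis), all_positions):
        return False

    # Gathering by the indices must reproduce the sorted values.
    gathered = np.take_along_axis(values, indices, axis=axis)
    expected = np.sort(values, axis=axis)
    if not ascending:
        expected = np.flip(expected, axis=axis)
    return bool(np.array_equal(gathered, expected))


x = np.array([[1.0, 1.0, 1.0, 0.0]] * 4)

cpu_asc = np.array([[3, 0, 1, 2]] * 4)   # CPU result from the report
gpu_asc = np.array([[3, 0, 2, 1]] * 4)   # GPU result from the report
cpu_desc = np.array([[0, 1, 2, 3]] * 4)
gpu_desc = np.array([[0, 2, 1, 3]] * 4)

print(is_valid_argsort(x, cpu_asc), is_valid_argsort(x, gpu_asc))
print(is_valid_argsort(x, cpu_desc, ascending=False),
      is_valid_argsort(x, gpu_desc, ascending=False))
```

Under this check, both the CPU and the GPU results count as valid; they differ only in how the tied entries are ordered.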

marcoabreu commented 4 years ago

See the "Notes" section here: https://numpy.org/doc/stable/reference/generated/numpy.sort.html
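
For reference, NumPy lets the caller request a stable algorithm through the `kind` parameter, which fixes the order of tied entries. The sketch below applies it to one row from the report. It covers NumPy only; this thread does not say whether MXNet's argsort offers an equivalent option. Negating the input to get a stable descending order is a common workaround and is an assumption here, not something proposed in the thread.

```python
import numpy as np

row = np.array([1.0, 1.0, 1.0, 0.0])

# Stable ascending argsort: tied entries keep their input order.
stable_asc = np.argsort(row, kind="stable")

# Stable descending order, obtained by sorting the negated values.
stable_desc = np.argsort(-row, kind="stable")

print(stable_asc)   # [3 0 1 2]
print(stable_desc)  # [0 1 2 3]
```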

sandeep-krishnamurthy commented 4 years ago

Thank you both.