Maratyszcza / NNPACK

Acceleration package for neural networks on multi-core CPUs
BSD 2-Clause "Simplified" License

NNPACK test failed on ovh virtual device #81

Closed edmBernard closed 7 years ago

edmBernard commented 7 years ago

I am trying to install NNPACK on an OVH server. The convolution test fails:

[0/7] RUN convolution-output-smoketest
convolution-output-smoketest: /home/dev/lib/NNPACK/test/convolution-output/smoke.cc:303: int main(int, char**): Assertion `init_status == nnp_status_success' failed.
FAILED: convolution-output-smoketest 
/home/dev/lib/NNPACK/bin/convolution-output-smoketest 
ninja: build stopped: subcommand failed.

The device should be supported, since AVX2 is available. Here are the processor stats:

/proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel Core Processor (Haswell, no TSX)
stepping    : 1
microcode   : 0x1
cpu MHz     : 3092.838
cache size  : 4096 KB
physical id : 0
siblings    : 1
core id     : 0
cpu cores   : 1
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs        :
bogomips    : 6185.67
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel Core Processor (Haswell, no TSX)
stepping    : 1
microcode   : 0x1
cpu MHz     : 3092.838
cache size  : 4096 KB
physical id : 1
siblings    : 1
core id     : 0
cpu cores   : 1
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs        :
bogomips    : 6185.67
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel Core Processor (Haswell, no TSX)
stepping    : 1
microcode   : 0x1
cpu MHz     : 3092.838
cache size  : 4096 KB
physical id : 2
siblings    : 1
core id     : 0
cpu cores   : 1
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs        :
bogomips    : 6185.67
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel Core Processor (Haswell, no TSX)
stepping    : 1
microcode   : 0x1
cpu MHz     : 3092.838
cache size  : 4096 KB
physical id : 3
siblings    : 1
core id     : 0
cpu cores   : 1
apicid      : 3
initial apicid  : 3
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs        :
bogomips    : 6185.67
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Do you have any idea why it failed?
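For anyone who wants to check the instruction-set claim without reading the dump by eye, here is a minimal Python sketch that pulls the flags line out of /proc/cpuinfo-style text and reports which required flags are missing. The function names and the required set ("avx2" and "fma") are assumptions made for illustration; this is not NNPACK's own detection code.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line in the text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=("avx2", "fma")):
    """List the required flags absent from the text.

    The default required set is an assumption for this sketch,
    not a list taken from NNPACK.
    """
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


# Shortened, hypothetical sample in the same format as the dump above.
sample = "processor   : 0\nflags       : fpu sse sse2 avx fma avx2\n"
missing = missing_flags(sample)  # empty list when every required flag is present
```

On a real Linux machine the text would come from reading /proc/cpuinfo. For the dump posted above both flags are present, so the instruction set by itself does not explain the failed assertion.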

Maratyszcza commented 7 years ago

Could you please build https://github.com/Maratyszcza/cpuinfo (confu setup && python configure.py && ninja), run bin/isa-info and bin/cache-info, and post their output here?

edmBernard commented 7 years ago

isa-info:

Scalar instructions:
    LAHF/SAHF: yes
    LZCNT: yes
    POPCNT: yes
    TBM: no
    BMI: yes
    BMI2: yes
    ADCX/ADOX: no
Memory instructions:
    MOVBE: yes
    PREFETCH: no
    PREFETCHW: no
    PREFETCHWT1: no
    CLZERO: no
SIMD extensions:
    3dnow!: no
    3dnow!+: no
    SSE3: yes
    SSSE3: yes
    SSE4.1: yes
    SSE4.2: yes
    SSE4a: no
    Misaligned SSE: no
    AVX: yes
    FMA3: yes
    FMA4: no
    XOP: no
    F16C: yes
    AVX2: yes
    AVX512F: no
    AVX512PF: no
    AVX512ER: no
    AVX512CD: no
    AVX512DQ: no
    AVX512BW: no
    AVX512VL: no
    AVX512IFMA: no
    AVX512VBMI: no
    AVX512VPOPCNTDQ: no
    AVX512_4VNNIW: no
    AVX512_4FMAPS: no
Multi-threading extensions:
    MONITOR/MWAIT: no
    MONITORX/MWAITX: no
    CMPXCHG16B: yes
    HLE: no
    RTM: no
    XTEST: no
    RDPID: no
Cryptography extensions:
    AES: yes
    PCLMULQDQ: yes
    RDRAND: yes
    RDSEED: no
    SHA: no
    Padlock RNG: no
    Padlock ACE: no
    Padlock ACE 2: no
    Padlock PHE: no
    Padlock PMM: no
Profiling instructions:
    RDTSCP: yes
    LWP: no
    MPX: no
System instructions:
    SYSENTER/SYSEXIT: yes
    RDMSR/WRMSR: yes
    CLFLUSH: yes
    CLFLUSHOPT: no
    CLWB: no
    FXSAVE/FXSTOR: yes
    XSAVE/XSTOR: yes
    FS/GS Base: yes

cache-info:

L1 instruction cache: 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 4 processors
L1 data cache: 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 4 processors
L2 unified cache: 4 MB (exclusive), 16-way set associative (4096 sets), 64 byte lines, shared by 4 processors

Maratyszcza commented 7 years ago

@edmBernard Your virtual server is supported, so it should work with NNPACK. However, the NNPACK code that detects these prerequisites is quite simple, and I don't immediately see the problem in it. The relevant code is at https://github.com/Maratyszcza/NNPACK/blob/86bfc32972e02781e7e0393faf556aff4fbc8cc3/src/init.c#L121, and if you figure out where the flaw in the logic is, I will fix it.

edmBernard commented 7 years ago

Thanks for your help. I'll test it as soon as possible and report back.

edmBernard commented 7 years ago

I investigated my problem a bit more. It seems to come from the lack of an L3 cache. The device is configured as 4 separate, independent processors with one core each.

Maratyszcza commented 7 years ago

Right, I didn't notice that at first. Processors without L3 cache are currently not supported in NNPACK.
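The missing L3 can be read straight off the cache-info output posted earlier in this thread. Below is a minimal Python sketch that collects the cache levels named in that output and checks whether level 3 is among them. The regular expression and the function name are assumptions based only on the three lines shown in this thread; this is not how NNPACK or cpuinfo performs the check internally.

```python
import re

# Matches lines such as "L2 unified cache: 4 MB (exclusive), ..."
CACHE_LINE = re.compile(r"^L(\d+) (?:instruction|data|unified) cache:")


def cache_levels(cache_info_text):
    """Return the set of cache levels (1, 2, 3, ...) named in the text."""
    levels = set()
    for line in cache_info_text.splitlines():
        match = CACHE_LINE.match(line.strip())
        if match:
            levels.add(int(match.group(1)))
    return levels


# The cache-info output reported in this issue.
reported = """\
L1 instruction cache: 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 4 processors
L1 data cache: 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 4 processors
L2 unified cache: 4 MB (exclusive), 16-way set associative (4096 sets), 64 byte lines, shared by 4 processors
"""

levels = cache_levels(reported)
has_l3 = 3 in levels  # False here: only L1 and L2 are reported
```

Running the same check on the cache-info output of another virtual machine is a quick way to tell whether it would hit the same limitation.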

Maratyszcza commented 7 years ago

You can follow #33 for this feature.