Floating point exception in fxdiv.h:261

How to reproduce:

Install darknet-nnpack following the instructions in the README file, then run the detector under gdb:

> gdb --args ./darknet detector test cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./darknet...done.
(gdb) run
Starting program: /home/zgantner/src/darknet-nnpack/darknet detector test cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffcee01700 (LWP 19486)]
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv    255  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 255  0.044 BFLOPs
   16 yolo
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8
   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv    255  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 255  0.088 BFLOPs
   23 yolo
Loading weights from yolov3-tiny.weights...Done!
[New Thread 0x7fffc18ed700 (LWP 19488)]
[New Thread 0x7fffc10ec700 (LWP 19489)]
[New Thread 0x7fffc08eb700 (LWP 19490)]
[New Thread 0x7fffc00ea700 (LWP 19491)]

Thread 1 "darknet" received signal SIGFPE, Arithmetic exception.
fxdiv_init_uint64_t (d=<optimized out>) at /home/zgantner/src/NNPACK-darknet/deps/fxdiv/include/fxdiv.h:261
261             __asm__("DIVQ %[d]"
(gdb) info locals
l_minus_1 = <optimized out>
u_hi = <optimized out>
q = <optimized out>
result = <optimized out>
result = <optimized out>
l_minus_1 = <optimized out>
u_hi = <optimized out>
q = <optimized out>
(gdb)

I do not understand what causes the exception here: gdb reports the divisor d and all locals as <optimized out>.
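
Some background that may help narrow this down. Despite its name, SIGFPE on x86-64 Linux is also delivered for integer division faults. The DIVQ instruction divides the 128-bit value in RDX:RAX by a 64-bit operand and traps if the operand is zero or if the quotient does not fit in 64 bits. Since d is optimized out in the trace above, which of the two conditions applies here is unknown. The sketch below only restates the trap condition in C. The helper names divq_would_fault and checked_div128_by_64 are hypothetical and are not part of fxdiv or NNPACK, and unsigned __int128 is a GCC/Clang extension.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * x86-64 DIV r/m64 divides the 128-bit value RDX:RAX (hi:lo) by a 64-bit
 * operand d. It raises #DE (reported by Linux as SIGFPE) when
 *   - d == 0, or
 *   - the quotient does not fit in 64 bits, which happens exactly when hi >= d.
 */
static bool divq_would_fault(uint64_t hi, uint64_t d) {
    return d == 0 || hi >= d;
}

/*
 * Hypothetical checked 128-by-64 division for debugging: returns false
 * instead of trapping, and writes the quotient only on success.
 */
static bool checked_div128_by_64(uint64_t hi, uint64_t lo, uint64_t d,
                                 uint64_t *quotient) {
    if (divq_would_fault(hi, d)) {
        return false;
    }
    const unsigned __int128 dividend = ((unsigned __int128) hi << 64) | lo;
    *quotient = (uint64_t) (dividend / d);
    return true;
}
```

Temporarily swapping a check like this in for the inline assembly, or printing the two inputs just before the division, would show whether the divisor reaches that point as zero or whether the high half of the dividend is too large.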

If I compile the library with -O0 instead of -O3, there is no FPE.

I see the same issue with full-size YOLOv3.

Compiler and CPU:

> gcc --version
gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> cat /proc/cpuinfo  | grep "model name" | head -1
model name  : Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Here is the backtrace:

(gdb) bt
#0  0x00005555555e84b6 in fxdiv_init_uint64_t (d=<optimized out>) at /home/zgantner/src/NNPACK-darknet/deps/fxdiv/include/fxdiv.h:261
#1  0x00005555555e84b6 in fxdiv_init_size_t (d=<optimized out>) at /home/zgantner/src/NNPACK-darknet/deps/fxdiv/include/fxdiv.h:313
#2  0x00005555555e84b6 in compute_gemm_convolution_inference (input_channels=input_channels@entry=1024, output_channels=output_channels@entry=256, input_size=..., kernel_size=..., output_size=..., output_subsampling=..., input=0x7fffc5329010, kernel=0x7fffc502a010, bias=0x0, output=0x7fffc4efe010, workspace_buffer=0x0, workspace_size=0x0, activation=nnp_activation_identity, threadpool=0x5555566da040, profile=0x0, input_padding=...) at /home/zgantner/src/NNPACK-darknet/src/convolution-inference.c:798
#3  0x00005555555ea8c5 in nnp_convolution_inference (algorithm=<optimized out>, transform_strategy=nnp_convolution_transform_strategy_compute, input_channels=1024, output_channels=256, input_size=..., input_padding=..., kernel_size=Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x8042: 
#4  0x000055555557d583 in forward_convolutional_layer_nnpack ()
#5  0x00005555555c07cc in forward_network ()
#6  0x00005555555c10db in network_predict ()
#7  0x0000555555573e36 in test_detector ()
#8  0x00005555555746ad in run_detector ()
#9  0x0000555555559b9f in main ()

Issue #35, digitalbrain79/darknet-nnpack