benoitsteiner / tensorflow-opencl

OpenCL support for TensorFlow
Apache License 2.0
472 stars, 86 forks

Memory management fault when running the MNIST tutorial in Python 3 #39

krikru closed this issue 7 years ago

krikru commented 7 years ago

When I run convolutional.py from the mnist folder with Python 3, I get the following output:

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-01-29 03:14:01: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-29 03:14:01: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
*** Error in `python3': double free or corruption (fasttop): 0x00007f97ec005670 ***
*** Error in `python3': malloc(): memory corruption (fast): 0x00000000043b5ce0 ***
Aborted

Running the script again immediately afterward produces a different error:

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2017-01-29 03:23:13: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-01-29 03:23:13: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
*** Error in `python3': malloc(): memory corruption (fast): 0x00007fac377fd0d0 ***
*** Error in `python3': Segmentation fault

It seems there is a memory management problem that manifests nondeterministically. An uninitialized pointer, perhaps?
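Because the failure differs from run to run, one way to characterize it is to run the script many times and tally how each run ends. This is a minimal sketch, not something from the thread: `tally_outcomes` is a hypothetical helper, and it assumes a POSIX system, where a negative subprocess return code means the child was killed by a signal.

```python
import collections
import signal
import subprocess
import sys


def tally_outcomes(cmd, runs=10):
    """Run `cmd` `runs` times and count how each run ended.

    Returns a Counter keyed by "exit N" for normal exits, or by the
    signal name (e.g. "SIGABRT", "SIGSEGV") when the child was killed.
    """
    counts = collections.Counter()
    for _ in range(runs):
        # Discard the child's output; only its exit status matters here.
        rc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
        if rc < 0:
            # POSIX: a negative return code means the child died from signal -rc.
            try:
                label = signal.Signals(-rc).name
            except ValueError:
                # Signal number not in the enum (e.g. a real-time signal).
                label = f"signal {-rc}"
        else:
            label = f"exit {rc}"
        counts[label] += 1
    return counts
```

For example, `tally_outcomes([sys.executable, "convolutional.py"], runs=20)` would show how often a run aborts (SIGABRT, which is what glibc raises after messages like "double free or corruption") versus segfaults (SIGSEGV).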

Does anyone know what causes this, or how I could track it down?
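As for tracking it down, one generic starting point (not suggested in the thread) is to rerun the script with Python's faulthandler enabled, which prints the Python-level traceback when the process receives a fatal signal, and with glibc's MALLOC_CHECK_ heap checking turned on. This is a sketch assuming Linux with glibc; `run_with_diagnostics` is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_with_diagnostics(cmd):
    """Run `cmd` with extra crash diagnostics; return (returncode, stderr)."""
    env = dict(os.environ)
    # Make the child Python dump its traceback on SIGSEGV, SIGABRT, SIGFPE,
    # SIGBUS or SIGILL, tying the crash to a Python call site.
    env["PYTHONFAULTHANDLER"] = "1"
    # glibc-specific: check the heap more aggressively, print a diagnostic
    # and abort on inconsistencies. Ignored by other C libraries; newer glibc
    # releases only honor it when libc_malloc_debug.so is preloaded.
    env["MALLOC_CHECK_"] = "3"
    proc = subprocess.run(
        cmd, env=env, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True
    )
    return proc.returncode, proc.stderr
```

For a native (C++) stack trace of the crash itself, the usual next step is to run the same command under a debugger, e.g. `gdb --args python3 convolutional.py`, then `run`, and `bt` once it crashes.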

lukeiwanski commented 7 years ago

That should be fixed in https://github.com/benoitsteiner/tensorflow-opencl/commit/26137d6bf516ec28a708995f8a0d516905443bcb