tux-o-matic opened this issue 3 years ago
@tux-o-matic Thank you for reporting this issue. Could you, please, point us to or attach an example you are running? This way, we can reproduce this issue locally and investigate.
Hi @anna-tikhonova, I use this Python code; it only needs the TF fork and NumPy:
python cifar10_cnn.py
In my case, on a MacBook Air with an Intel chip, the backend seems to choose the CPU by default and then throws the error. However, if I specify:
import tensorflow
from tensorflow.python.compiler.mlcompute import mlcompute

mlcompute.set_mlc_device(device_name='gpu')
tensorflow.config.run_functions_eagerly(False)
Then the model gets trained, and I can see in Activity Monitor that the Python threads are offloading work to the GPU. But on this integrated Intel GPU the performance is worse than on the CPU, and even PlaidML as a TF backend could do better on the same GPU.
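To make the "GPU slower than CPU" observation concrete, a simple wall-clock harness like the one below can be used; `train_one_epoch` is a hypothetical stand-in for a `model.fit(..., epochs=1)` call made after selecting the device with `mlcompute.set_mlc_device`:

```python
import time

def time_call(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage, once per device setting:
#   mlcompute.set_mlc_device(device_name='cpu')
#   _, cpu_s = time_call(train_one_epoch)
#   mlcompute.set_mlc_device(device_name='gpu')
#   _, gpu_s = time_call(train_one_epoch)
#   print(f"CPU {cpu_s:.1f}s vs GPU {gpu_s:.1f}s")
```

Timing whole epochs (rather than single batches) avoids counting one-off graph-compilation cost against the GPU.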
If you need another example, running this code (from #35 ) also defaults to CPU and segfaults:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.summary()
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
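For anyone double-checking the script above, the figure reported by `model.summary()` can be reproduced by hand from the layer shapes (a quick sketch, no TF needed; the spatial sizes follow from the default valid padding):

```python
def conv_params(k, c_in, c_out):
    # k*k*c_in weights per filter, plus one bias per filter
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# 32x32x3 -> conv3x3 -> 30x30x32 -> pool -> 15x15x32 -> conv -> 13x13x64
# -> pool -> 6x6x64 -> conv -> 4x4x64 -> flatten -> 1024 -> 64 -> 10
total = (conv_params(3, 3, 32)           # 896
         + conv_params(3, 32, 64)        # 18,496
         + conv_params(3, 64, 64)        # 36,928
         + dense_params(4 * 4 * 64, 64)  # 65,600
         + dense_params(64, 10))         # 650
print(total)  # -> 122570
```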
Machine specs: macOS 11.0.1 on a MacBook Pro (15-inch, 2019), 2.3 GHz 8-core Intel Core i9, 16 GB 2400 MHz DDR4, Radeon Pro 560X 4 GB.
@tux-o-matic @hughack I apologize for the late reply. I just tried both of the scripts you provided and I'm not able to reproduce the issue. It's possible that it was resolved in a macOS update. Could you please try again on an up-to-date macOS and let me know if you can still reproduce it?
Hi @pooyadavoodi.
On an up-to-date Big Sur with Python 3.8.7 and the latest release of this project, I still hit the same error.
I managed to reproduce the segfault from @hughack's script using v0.1alpha0, and that issue is resolved in the latest release v0.1alpha2.
@tux-o-matic Could you share the Big Sur version you are using? Also, are you using the Python that comes with the OS? If not, how did you install it?
I'm testing from Big Sur 11.0.1. Python 3.8.7 comes from MacPorts; earlier tests were on an older point release of Python 3.8, also from MacPorts.
This appears to be the same issue as #127
I posted over there that this issue seems to be tied to batch size: the segmentation fault occurs with sufficiently large batches, and "sufficiently large" appears to depend on the neural network itself. Every network I have tried so far segfaults once the batch size exceeds a certain threshold, so you can probably trigger or avoid this issue by raising or lowering your batch size.
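Since the threshold is network-dependent, one way to pin it down is a binary search over batch sizes; a sketch, where `fits` is a hypothetical predicate you'd supply (in practice it would run a short `model.fit()` in a subprocess, since a segfault kills the interpreter that launched it):

```python
def largest_working_batch(fits, lo=1, hi=1024):
    """Binary-search the largest batch size for which fits() returns True.
    Assumes monotonicity: if batch b works, every smaller batch works too."""
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid
            lo = mid + 1
        else:
            hi = mid - 1
    return best

# Dummy predicate standing in for a real training probe:
print(largest_working_batch(lambda b: b <= 16))  # -> 16
```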
I am still experiencing this on the February alpha build, using a Conda environment set up as described on this page (some of the pip commands need to be updated to match the new file names). I hope this helps you replicate the issue.
Also, using @tux-o-matic's workaround I was able to stop my network from segfaulting, but it caused a memory leak instead (?!?). It did run faster on the GPU than on the CPU, at least until I ran out of memory.
Thanks @atw1020, indeed reducing the batch size in my benchmark allows epochs to complete on the CPU.
It's interesting behaviour. I don't expect to be able to use large batch sizes on a laptop with an integrated GPU, but since the CPU and GPU share the same memory, it's surprising that TF with CoreML is so limited on the CPU while the GPU can handle larger batch sizes.
For reference, the original benchmark used a batch size of 32, which worked only on the GPU. Taking it down to 16 works on the CPU (20 is too high and crashes again).
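A back-of-envelope estimate of the forward activation memory for the CIFAR-10 model in this thread suggests the crash isn't simple memory exhaustion (this ignores gradients, optimizer state, and backend workspace buffers, so it is only a lower bound):

```python
# Per-sample activation sizes (float32 counts) for the CNN above,
# valid padding: 32x32x3 input down to the 10-way logits.
activations = [
    32 * 32 * 3,   # input        = 3,072
    30 * 30 * 32,  # conv1        = 28,800
    15 * 15 * 32,  # pool1        = 7,200
    13 * 13 * 64,  # conv2        = 10,816
    6 * 6 * 64,    # pool2        = 2,304
    4 * 4 * 64,    # conv3        = 1,024 (flatten is just a view of this)
    64,            # dense1
    10,            # dense2 (logits)
]
floats_per_sample = sum(activations)       # 53,290 floats
bytes_per_sample = floats_per_sample * 4   # float32
for batch in (16, 20, 32):
    mib = batch * bytes_per_sample / 2**20
    print(f"batch {batch}: ~{mib:.1f} MiB of forward activations")
```

Even at batch 32 this is only a few MiB, which makes the crash at batch 20 on the CPU look like a backend bug rather than genuinely running out of memory.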
I'm no longer seeing segfault issues on 0.1-alpha3, but I'm still getting errors that are solved by using a smaller batch size. I'm going to keep investigating and hopefully come up with new code to reproduce the issue I'm seeing.
I've been trying to replicate this issue on 0.1-alpha3 and haven't been able to, so I'm becoming pretty confident that it was fixed in that patch. There seem to be other bugs related to batch size, but this one has been addressed. Please update this thread if you are still experiencing the issue.
First tests using this fork, running model training against the CIFAR-10 dataset as a benchmark. But during the first epoch I encounter:
Explicitly setting it to run on the GPU does work, however, but is much slower (Intel integrated graphics).
Python 3.8.6 from MacPorts, if it makes any difference.