cair / PyTsetlinMachineCUDA

Massively Parallel and Asynchronous Architecture for Logic-based AI
https://arxiv.org/abs/2009.04861
MIT License

MNIST example in Google Colab fails with 'LogicError: cuCtxSynchronize failed: an illegal memory access was encountered' #6

Closed desertnaut closed 4 years ago

desertnaut commented 4 years ago

I tried to run the MNIST example verbatim in a GPU-enabled Google Colab notebook; it fails on tm.fit with the following output:

Accuracy over 100 epochs:

---------------------------------------------------------------------------

LogicError                                Traceback (most recent call last)

<ipython-input-6-b93488d43d54> in <module>()
      2 for i in range(100):
      3         start_training = time()
----> 4         tm.fit(X_train, Y_train, epochs=1, incremental=True)
      5         stop_training = time()
      6 

/usr/local/lib/python3.6/dist-packages/PyTsetlinMachineCUDA/tm.py in fit(self, X, Y, epochs, incremental, batch_size)
    320                         encoded_Y[:,i] = np.where(Y == i, self.T, -self.T)
    321 
--> 322                 self._fit(X, encoded_Y, epochs = epochs, incremental = incremental, batch_size = batch_size)
    323 
    324                 return

/usr/local/lib/python3.6/dist-packages/PyTsetlinMachineCUDA/tm.py in _fit(self, X, encoded_Y, epochs, incremental, batch_size)
    224                         for e in range(0, number_of_examples, batch_size):
    225                                 self.update.prepared_call(self.grid, self.block, g.state, self.ta_state_gpu, self.clause_weights_gpu, self.class_sum_gpu, self.clause_output_gpu, self.clause_patch_gpu, self.encoded_X_training_gpu, self.Y_gpu, np.int32(e))
--> 226                                 cuda.Context.synchronize()
    227 
    228                 self.ta_state = np.array([])

LogicError: cuCtxSynchronize failed: an illegal memory access was encountered

That is, it is similar to this previously reported issue.

System info:

with the last two packages installed via pip.

olegranmo commented 4 years ago

Thanks for reporting, @desertnaut! I have not found the error yet. It seems to occur when there is not enough memory available on the GPU, but I am not sure. How much memory does your Colab environment provide? Then again, with insufficient memory I would expect the memory allocation itself to fail.
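If the failure really is memory-related, one way to make that visible earlier than an illegal access during `cuCtxSynchronize` is to wrap the device allocation and report the requested size on failure. A minimal sketch, assuming a pycuda-style allocator callable that raises `MemoryError` when device memory is exhausted (the helper name `try_alloc` is hypothetical, not part of the library):

```python
def try_alloc(alloc, nbytes):
    """Attempt a device allocation of nbytes via the given allocator callable.

    Returns (handle, None) on success, or (None, error_message) when the
    allocator raises MemoryError, so the caller can fail with a clear
    message instead of a later illegal-memory-access error.
    """
    try:
        return alloc(nbytes), None
    except MemoryError:
        return None, (
            f"GPU allocation of {nbytes / 1e6:.1f} MB failed: "
            "out of device memory"
        )
```

For example, `try_alloc(cuda.mem_alloc, n)` with pycuda would either return the device buffer or a human-readable diagnostic, rather than leaving a kernel to dereference an invalid pointer.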

desertnaut commented 4 years ago

Thanks @olegranmo

The issue seems to have been resolved with the new PyTsetlinMachineCUDA version 0.1.8.

As for the memory, using the snippet from this Stack Overflow thread, I get:

Gen RAM Free: 12.1 GB  | Proc size: 1.6 GB
GPU RAM Free: 13768MB | Used: 1311MB | Util   9% | Total 15079MB
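For reference, figures like the GPU line above can be reproduced without third-party packages by querying `nvidia-smi` directly (the `--query-gpu` and `--format` flags are standard nvidia-smi options); the parsing helper below is a sketch whose output format simply mirrors the line shown above:

```python
import subprocess

def query_gpu_memory():
    """Ask nvidia-smi for free/used/total GPU memory in MB (one CSV line)."""
    return subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.free,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )

def format_gpu_memory(csv_line):
    """Turn a 'free, used, total' CSV line into a readable summary."""
    free, used, total = (int(v) for v in csv_line.strip().split(", "))
    return f"GPU RAM Free: {free}MB | Used: {used}MB | Total {total}MB"
```

On a GPU-enabled Colab runtime, `format_gpu_memory(query_gpu_memory())` prints a summary in the same shape as the output above.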