RuntimeError: CUDA error: an illegal memory access was encountered

I was following the installation tutorial and selected Pygmalion, and MAS opened. However, as soon as I try to speak with Monika, the game freezes and the traceback below appears in the command prompt. I'm not sure whether it has anything to do with my GPU: it's a GeForce RTX 2060 Super, which should have enough VRAM for this kind of thing, since I use it for Stable Diffusion with zero issues.

Exception in thread Thread-7:
Traceback (most recent call last):
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 458, in listenToClient
    bot_message = inference_fn(pyg_model,tokenizer,history, received_msg,generation_settings,char_settings,history_length=context_size,count=pyg_count)
  File "A:\game storage\MonikA.I\run_pygmalion.py", line 36, in inference_fn
    model_output = run_raw_inference(model, tokenizer, prompt,
  File "A:\game storage\MonikA.I\pygmalion\model.py", line 70, in run_raw_inference
    logits = model.generate(stopping_criteria=stopping_criteria_list,
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\transformers\generation\utils.py", line 1437, in generate
    return self.sample(
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\transformers\generation\utils.py", line 2443, in sample
    outputs = self(
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\accelerate\hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py", line 741, in forward
    transformer_outputs = self.transformer(
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\accelerate\hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py", line 621, in forward
    outputs = block(
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\accelerate\hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py", line 327, in forward
    attn_outputs = self.attn(
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\accelerate\hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py", line 279, in forward
    return self.attention(
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\accelerate\hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\transformers\models\gpt_neo\modeling_gpt_neo.py", line 223, in forward
    query = self.q_proj(hidden_states)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\accelerate\hooks.py", line 158, in new_forward
    output = old_forward(*args, **kwargs)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\bitsandbytes\nn\modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\bitsandbytes\autograd\_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\torch\autograd\function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\bitsandbytes\autograd\_functions.py", line 303, in forward
    CA, CAt, SCA, SCAt, coo_tensorA = F.double_quant(A.to(torch.float16), threshold=state.threshold)
  File "A:\game storage\MonikA.I\libs\pythonlib\lib\site-packages\bitsandbytes\functional.py", line 1634, in double_quant
    nnz = nnz_row_ptr[-1].item()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
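
The last lines of the error suggest rerunning with `CUDA_LAUNCH_BLOCKING=1`, so that the traceback points at the kernel that actually failed rather than at a later call. A minimal sketch of how that could be done from Python is below. It assumes the variable is set in the same process before `torch` is imported, and `cuda_report` is a hypothetical helper name (not part of MonikA.I) that only collects GPU details worth attaching to a report like this one.

```python
import os

# Assumption: this runs before torch initializes CUDA, so the variable is
# set before torch is imported anywhere in the process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_report():
    """Hypothetical helper: collect basic GPU details for a bug report."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this interpreter
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["device"] = props.name
        info["compute_capability"] = torch.cuda.get_device_capability(0)
        info["total_vram_gib"] = round(props.total_memory / 1024**3, 2)
    return info


if __name__ == "__main__":
    print(cuda_report())
```

With the variable set, reproducing the freeze should give a traceback that is more likely to name the real failing call, and the printed details (VRAM, compute capability, CUDA build) can be pasted alongside it.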

I'm not really sure where to go from here. Maybe I need to reinstall something, or just start over. Thanks!

Rubiksman78 / MonikA.I #50