CUDA runtime error : device-side assert triggered

The error occurs with seq2squiggle (dev branch) in prediction mode: CUDA triggers a device-side assert and the process aborts. The assertion messages report an out-of-bounds index, and the stack trace points to the end of the prediction loop, where a tensor indexing operation fails and raises the RuntimeError.
The issue began after switching from NumPy-based CPU calculations to a fully PyTorch-based implementation. The error occurs in the export_and_clear_results function, which is called from on_predict_epoch_end at the end of the prediction epoch. It only happens when running with -n set to at least 20,000.
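One possible cause, not confirmed here, is a difference between NumPy and PyTorch in what nonzero() returns. NumPy returns a tuple of index arrays, one per dimension, whereas torch.Tensor.nonzero() returns a single 2-D tensor of coordinates with shape (N, ndim). Indexing a tensor with that coordinate tensor treats every entry, including column coordinates, as an index into dimension 0, which can go out of bounds when the tensor has more than one dimension. The sketch below reproduces this on the CPU, where the failure surfaces as a plain IndexError, and shows boolean-mask indexing as an alternative. The helper name nonzero_values and the example tensor are hypothetical and not taken from the seq2squiggle code.

```python
import torch


def nonzero_values(t: torch.Tensor) -> torch.Tensor:
    """Return the non-zero entries of t as a flat 1-D tensor.

    Hypothetical helper: boolean-mask indexing avoids building
    integer coordinates and works for any number of dimensions.
    """
    return t[t != 0]


# Example tensor with more columns than rows (an assumption for illustration).
x = torch.tensor([[0.0, 1.5, 0.0, 2.5, 3.5]])  # shape (1, 5)

# nonzero() yields (row, col) coordinates with shape (3, 2).
idx = x.nonzero()

# Using the coordinates directly as an index applies the column values
# (1, 3, 4) to dimension 0, which only has size 1.
try:
    x[idx]
    failed = False
except IndexError:
    failed = True

values = nonzero_values(x)
```

For a strictly 1-D tensor the original pattern stays in bounds, so whether this applies depends on the shape of concatenated_tensor at the point of failure. Running the same input on the CPU, or with CUDA_LAUNCH_BLOCKING=1 as the log suggests, should confirm which operation actually fails.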
Full output:
seq2squiggle INFO 14:03:33: seq2squiggle version 0.2.0
seq2squiggle INFO 14:03:33: Arguments:
seq2squiggle INFO 14:03:33:  fasta: example/lamda_genome.fasta
seq2squiggle INFO 14:03:33:  read_input: False
seq2squiggle INFO 14:03:33:  num_reads: 20000
seq2squiggle INFO 14:03:33:  read_length: 10000
seq2squiggle INFO 14:03:33:  coverage: -1
seq2squiggle INFO 14:03:33:  out: example.blow5
seq2squiggle INFO 14:03:33:  profile: prom_r10_dna
seq2squiggle INFO 14:03:33:  noise_sampler: True
seq2squiggle INFO 14:03:33:  duration_sampler: True
seq2squiggle INFO 14:03:33:  ideal_event_length: -1.0
seq2squiggle INFO 14:03:33:  noise_std: 1.0
seq2squiggle INFO 14:03:33:  distr: expon
seq2squiggle INFO 14:03:33:  predict_batch_size: 1024
seq2squiggle INFO 14:03:33:  export_every_n_samples: 250000
seq2squiggle INFO 14:03:33:  seed: 385
seq2squiggle INFO 14:03:33:  model: None
seq2squiggle INFO 14:03:33:  config: None
seq2squiggle INFO 14:03:33:  verbosity: info
seq2squiggle INFO 14:03:33: Config file was not specified. Default config will be used.
seq2squiggle INFO 14:03:33: Setting seeds using random seed 385
seq2squiggle INFO 14:03:33: Weights file path is not provided.
seq2squiggle INFO 14:03:33: Model weights file /path/.cache/seq2squiggle/R1041-human@v0.1.0.ckpt retrieved from local cache
seq2squiggle INFO 14:03:34: Genome mode. Reads will be generated from input fasta: example/lamda_genome.fasta
seq2squiggle INFO 14:03:48: True Prediction dataset size 9580226
Predicting Dat../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [7497,0,0], thread: [32,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [7497,0,0], thread: [33,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
...
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [8065,0,0], thread: [63,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py", line 47, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 897, in _predict_impl
    results = self._run(model, ckpt_path=ckpt_path)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 981, in _run
    results = self._run_stage()
              ^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 1020, in _run_stage
    return self.predict_loop.run()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/loops/utilities.py", line 178, in _decorator
    return loop_run(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/loops/prediction_loop.py", line 130, in run
    return self.on_run_end()
           ^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/loops/prediction_loop.py", line 202, in on_run_end
    results = self._on_predict_epoch_end()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/loops/prediction_loop.py", line 368, in _on_predict_epoch_end
    call._call_lightning_module_hook(trainer, "on_predict_epoch_end")
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py", line 167, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/path/seq2squiggle/src/seq2squiggle/model.py", line 303, in on_predict_epoch_end
    self.export_and_clear_results(keep_last=False)
  File "/path/seq2squiggle/src/seq2squiggle/model.py", line 285, in export_and_clear_results
    res[k] = concatenated_tensor[concatenated_tensor.nonzero()].squeeze()
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/rich_click/rich_command.py", line 367, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/seq2squiggle/src/seq2squiggle/seq2squiggle.py", line 369, in predict
    inference_run(
  File "/path/seq2squiggle/src/seq2squiggle/inference.py", line 329, in inference_run
    trainer.predict(model=load_model, datamodule=fasta_data, return_predictions=False)
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 858, in predict
    return call._call_and_handle_interrupt(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py", line 68, in _call_and_handle_interrupt
    trainer._teardown()
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 1004, in _teardown
    self.strategy.teardown()
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/pytorch_lightning/strategies/strategy.py", line 535, in teardown
    self.lightning_module.cpu()
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 82, in cpu
    return super().cpu()
           ^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/torch/nn/modules/module.py", line 965, in cpu
    return self._apply(lambda t: t.cpu())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/torch/nn/modules/module.py", line 805, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/path/envs/seq2squiggle-dev/lib/python3.12/site-packages/torch/nn/modules/module.py", line 965, in <lambda>
    return self._apply(lambda t: t.cpu())
                                 ^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1