RasmussenLab / vamb

Variational autoencoder for metagenomic binning
MIT License

Unable to run taxometer using `--cuda` #360

Open Prunoideae opened 1 week ago

Prunoideae commented 1 week ago

Python: 3.10.14 · Vamb: commit 9810ef047ef41f986b7ddcfdd3dc06947ee0ab6c (GitHub)

It looks like the tensor created at

https://github.com/RasmussenLab/vamb/blob/bdd14d12855081dbe0ab0c42c3cd7d948f997943/vamb/taxvamb_encode.py#L853

needs to be moved to the CUDA device.

Fixing that resolves the error, but using CUDA only gives a modest training speedup (9 min/epoch -> 7 min/epoch) on a 3090. nvtop showed quite low GPU utilization (columns: GPU utilization, GPU memory, GPU memory %, CPU utilization, host memory):

[screenshot: nvtop output showing low GPU utilization during training]
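For reference, the usual pattern for this class of bug is to create any helper tensors inside `forward` on the same device as the module's parameters, rather than letting them default to CPU. A minimal sketch (the module, shapes, and the `mask` tensor here are illustrative, not vamb's actual code at that line):

```python
import torch
from torch import nn


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.outputlayer = nn.Linear(512, 8)

    def forward(self, x):
        # torch.ones()/torch.tensor() default to CPU; when the model has been
        # moved to CUDA, a freshly created tensor must follow it explicitly.
        device = next(self.parameters()).device
        mask = torch.ones(x.shape[0], 1, device=device)  # hypothetical helper tensor
        return self.outputlayer(x * mask)


model = Model()            # works unchanged after model.cuda() on a GPU machine
out = model(torch.randn(4, 512))
```

The low GPU utilization in the screenshot would be consistent with the run being input-bound (data loading / CPU preprocessing) rather than compute-bound, which would also explain why CUDA only shaves 2 min/epoch.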

Log:

2024-09-11 09:58:09.508 | INFO    | Starting Vamb version 4.1.4.dev134+g9810ef0
2024-09-11 09:58:09.509 | INFO    | Random seed is 21359552096367181
2024-09-11 09:58:09.509 | INFO    | Invoked with CLI args: 'f/home/-----/miniconda3/envs/meta/bin/vamb bin taxvamb --outdir taxvamb --fasta assembly.filtered.fa --bamdir filtered_bams --taxonomy extracted.taxa.tsv -m 1500 --cuda -p 16'
2024-09-11 09:58:09.509 | INFO    | Loading TNF
2024-09-11 09:58:09.509 | INFO    |     Minimum sequence length: 1500
2024-09-11 09:58:09.509 | INFO    |     Loading data from FASTA file assembly.filtered.fa
2024-09-11 10:03:07.536 | INFO    |     Kept 8111121845 bases in 2916808 sequences
2024-09-11 10:03:07.536 | INFO    |     Processed TNF in 298.03 seconds.

2024-09-11 10:03:07.536 | INFO    | Loading depths
2024-09-11 10:03:07.536 | INFO    |     Reference hash: ab5a09db778cb776e0d97da9ecfdb9ca
2024-09-11 10:03:07.536 | INFO    |     Parsing 7 BAM files with 16 threads
2024-09-11 10:03:07.536 | INFO    |     Min identity: 0.0
2024-09-11 10:30:01.329 | INFO    |     Order of columns is:
2024-09-11 10:30:01.335 | INFO    |          0: filtered_bams/dongjiang_rb.sorted.bam
2024-09-11 10:30:01.335 | INFO    |          1: filtered_bams/xijiang_rb.sorted.bam
2024-09-11 10:30:01.336 | INFO    |          2: filtered_bams/yujiang_rb.sorted.bam
2024-09-11 10:30:01.336 | INFO    |          3: filtered_bams/unclassified.sorted.bam
2024-09-11 10:30:01.336 | INFO    |          4: filtered_bams/nan-bei_pan_rb.sorted.bam
2024-09-11 10:30:01.336 | INFO    |          5: filtered_bams/beijiang_rb.sorted.bam
2024-09-11 10:30:01.336 | INFO    |          6: filtered_bams/hongliu_rb.sorted.bam
2024-09-11 10:30:01.337 | INFO    |     Processed abundance in 1613.8 seconds.

2024-09-11 10:30:01.337 | INFO    | Predicting missing values from taxonomy
2024-09-11 10:30:12.296 | INFO    | 20190 nodes in the graph
2024-09-11 10:30:31.696 | INFO    |     Created dataloader
2024-09-11 10:30:31.697 | INFO    | Starting training the taxonomy predictor
2024-09-11 10:30:31.697 | INFO    | Using threshold 0.5
2024-09-11 10:30:32.216 | INFO    |     Network properties:
2024-09-11 10:30:32.216 | INFO    |     CUDA: True
2024-09-11 10:30:32.216 | INFO    |     Hierarchical loss: flat_softmax
2024-09-11 10:30:32.216 | INFO    |     Alpha: 0.15
2024-09-11 10:30:32.217 | INFO    |     Beta: 200.0
2024-09-11 10:30:32.217 | INFO    |     Dropout: 0.2
2024-09-11 10:30:32.217 | INFO    |     N hidden: 512, 512, 512, 512
2024-09-11 10:30:32.217 | INFO    |     Training properties:
2024-09-11 10:30:32.217 | INFO    |     N epochs: 256
2024-09-11 10:30:32.217 | INFO    |     Starting batch size: 1024
2024-09-11 10:30:32.217 | INFO    |     Batchsteps: 25, 75, 150, 225
2024-09-11 10:30:32.217 | INFO    |     Learning rate: 0.001
2024-09-11 10:30:32.217 | INFO    |     N labels: torch.Size([2916808, 7])
2024-09-11 10:30:33.033 | ERROR   | An error has been caught in function 'main', process 'MainProcess' (3829388), thread 'MainThread' (140155813640000):
Traceback (most recent call last):

  File "/home/-----/miniconda3/envs/meta/bin/vamb", line 8, in <module>
    sys.exit(main())
    │   │    └ <function main at 0x7f773b61e710>
    │   └ <built-in function exit>
    └ <module 'sys' (built-in)>

> File "/mnt/nvme1n1/public/-----/projects/meta/vamb/vamb/__main__.py", line 2200, in main
    run(runner, opt.common.general)
    │   │       │   │      └ <vamb.__main__.GeneralOptions object at 0x7f7797337e20>
    │   │       │   └ <vamb.__main__.BinnerCommonOptions object at 0x7f773b6397e0>
    │   │       └ <vamb.__main__.BinTaxVambOptions object at 0x7f773b639ba0>
    │   └ functools.partial(<function run_vaevae at 0x7f773b61dea0>, <vamb.__main__.BinTaxVambOptions object at 0x7f773b639ba0>)
    └ <function run at 0x7f773b61cb80>

  File "/mnt/nvme1n1/public/-----/projects/meta/vamb/vamb/__main__.py", line 649, in run
    runner()
    └ functools.partial(<function run_vaevae at 0x7f773b61dea0>, <vamb.__main__.BinTaxVambOptions object at 0x7f773b639ba0>)

  File "/mnt/nvme1n1/public/-----/projects/meta/vamb/vamb/__main__.py", line 1420, in run_vaevae
    predict_taxonomy(
    └ <function predict_taxonomy at 0x7f773b61dd80>

  File "/mnt/nvme1n1/public/-----/projects/meta/vamb/vamb/__main__.py", line 1332, in predict_taxonomy
    model.trainmodel(
    │     └ <function VAMB2Label.trainmodel at 0x7f77fe160d30>
    └ VAMB2Label(
        (encoderlayers): ModuleList(
          (0): Linear(in_features=111, out_features=512, bias=True)
          (1-3): 3 x Linea...

  File "/mnt/nvme1n1/public/-----/projects/meta/vamb/vamb/taxvamb_encode.py", line 1047, in trainmodel
    dataloader = self.trainepoch(
                 │    └ <function VAMB2Label.trainepoch at 0x7f77fe160ca0>
                 └ VAMB2Label(
                     (encoderlayers): ModuleList(
                       (0): Linear(in_features=111, out_features=512, bias=True)
                       (1-3): 3 x Linea...

  File "/mnt/nvme1n1/public/-----/projects/meta/vamb/vamb/taxvamb_encode.py", line 968, in trainepoch
    labels_out = self(depths_in, tnf_in, abundances_in, weights)
                 │    │          │       │              └ tensor([[0.8345],
                 │    │          │       │                        [0.8702],
                 │    │          │       │                        [0.8553],
                 │    │          │       │                        ...,
                 │    │          │       │                        [0.8631],
                 │    │          │       │                        [1.8225],
                 │    │          │       │                        [1.1693]], dev...
                 │    │          │       └ tensor([[ 0.3633],
                 │    │          │                 [ 2.2363],
                 │    │          │                 [ 0.6754],
                 │    │          │                 ...,
                 │    │          │                 [-0.1923],
                 │    │          │                 [ 0.0105],
                 │    │          │                 [-0.2597]...
                 │    │          └ tensor([[-1.1825,  0.0716, -0.6795,  ...,  0.1214, -0.6343,  0.1674],
                 │    │                    [-0.9313, -0.4224,  0.4400,  ...,  0.4502, -0.1...
                 │    └ tensor([[0.0000, 0.4290, 0.1922,  ..., 0.0000, 0.0487, 0.2475],
                 │              [0.3522, 0.1935, 0.0540,  ..., 0.1046, 0.0135, 0.0644...
                 └ VAMB2Label(
                     (encoderlayers): ModuleList(
                       (0): Linear(in_features=111, out_features=512, bias=True)
                       (1-3): 3 x Linea...

  File "/home/-----/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           │    │           │       └ {}
           │    │           └ (tensor([[0.0000, 0.4290, 0.1922,  ..., 0.0000, 0.0487, 0.2475],
           │    │                     [0.3522, 0.1935, 0.0540,  ..., 0.1046, 0.0135, 0.064...
           │    └ <function Module._call_impl at 0x7f784593bbe0>
           └ VAMB2Label(
               (encoderlayers): ModuleList(
                 (0): Linear(in_features=111, out_features=512, bias=True)
                 (1-3): 3 x Linea...
  File "/home/-----/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           │             │       └ {}
           │             └ (tensor([[0.0000, 0.4290, 0.1922,  ..., 0.0000, 0.0487, 0.2475],
           │                       [0.3522, 0.1935, 0.0540,  ..., 0.1046, 0.0135, 0.064...
           └ <bound method VAMB2Label.forward of VAMB2Label(
               (encoderlayers): ModuleList(
                 (0): Linear(in_features=111, out_features=...

  File "/mnt/nvme1n1/public/-----/projects/meta/vamb/vamb/taxvamb_encode.py", line 872, in forward
    labels_out = self._predict(tensor)
                 │    │        └ tensor([[ 0.0000,  0.4290,  0.1922,  ..., -0.6343,  0.1674,  0.3633],
                 │    │                  [ 0.3522,  0.1935,  0.0540,  ..., -0.1319, -0.6...
                 │    └ <function VAMB2Label._predict at 0x7f77fe160940>
                 └ VAMB2Label(
                     (encoderlayers): ModuleList(
                       (0): Linear(in_features=111, out_features=512, bias=True)
                       (1-3): 3 x Linea...

  File "/mnt/nvme1n1/public/-----/projects/meta/vamb/vamb/taxvamb_encode.py", line 866, in _predict
    reconstruction = self.outputlayer(tensor)
                     │                └ tensor([[ 0.5725,  0.0408, -0.4522,  ...,  1.0374, -0.5149,  0.1081],
                     │                          [ 0.2728, -0.5565, -0.4375,  ..., -0.6076, -0.5...
                     └ VAMB2Label(
                         (encoderlayers): ModuleList(
                           (0): Linear(in_features=111, out_features=512, bias=True)
                           (1-3): 3 x Linea...

  File "/home/-----/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           │    │           │       └ {}
           │    │           └ (tensor([[ 0.5725,  0.0408, -0.4522,  ...,  1.0374, -0.5149,  0.1081],
           │    │                     [ 0.2728, -0.5565, -0.4375,  ..., -0.6076, -0....
           │    └ <function Module._call_impl at 0x7f784593bbe0>
           └ Linear(in_features=512, out_features=18375, bias=True)
  File "/home/-----/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           │             │       └ {}
           │             └ (tensor([[ 0.5725,  0.0408, -0.4522,  ...,  1.0374, -0.5149,  0.1081],
           │                       [ 0.2728, -0.5565, -0.4375,  ..., -0.6076, -0....
           └ <bound method Linear.forward of Linear(in_features=512, out_features=18375, bias=True)>
  File "/home/-----/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
           │ │      │      │            └ Linear(in_features=512, out_features=18375, bias=True)
           │ │      │      └ Linear(in_features=512, out_features=18375, bias=True)
           │ │      └ tensor([[ 0.5725,  0.0408, -0.4522,  ...,  1.0374, -0.5149,  0.1081],
           │ │                [ 0.2728, -0.5565, -0.4375,  ..., -0.6076, -0.5...
           │ └ <built-in function linear>
           └ <module 'torch.nn.functional' from '/home/-----/miniconda3/envs/meta/lib/python3.10/site-packages/torch/nn/functional.py'>

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
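This is the standard PyTorch device-mismatch failure: the input reaching `outputlayer` (`Linear(in_features=512, out_features=18375)`) lives on the CPU while the layer's weights are on `cuda:0`. A minimal sketch of the fix pattern, using the same layer shapes as in the traceback (the variable names are illustrative):

```python
import torch
from torch import nn

# Same shapes as the failing layer in the traceback.
layer = nn.Linear(512, 18375)
if torch.cuda.is_available():
    layer = layer.cuda()   # weights on cuda:0, as in the failing run

x = torch.randn(2, 512)    # created on CPU, like the offending tensor

# Feeding a CPU tensor into a CUDA layer raises:
#   RuntimeError: Expected all tensors to be on the same device ...
# Moving the input to wherever the layer's parameters live fixes it,
# and is a no-op on CPU-only machines:
x = x.to(next(layer.parameters()).device)
out = layer(x)
```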
jakobnissen commented 1 week ago

@sgalkina can you take a look?