Help needed for RuntimeError: size of tensor a must match the size of tensor b

MapleHe commented 1 month ago

Thanks for your excellent tool. I previously used vamb 3.0.2 for data analysis. I'm now trying to run the latest version of vamb4, but I hit the RuntimeError shown below and haven't figured out how to solve it. I would be grateful for any suggestions.

Environment

Python: 3.12.6
Vamb: 4.1.4.dev136+g5090ecc

Commands

conda run -p vamb4 vamb bin default --outdir ${WORKING_DIR1} -m 2000 -p ${PROGRAM_T} --cuda --fasta contigs.fa --bamdir ${BAM_DIR}

## OR ##

conda run -p vamb4 vamb bin default --outdir ${WORKING_DIR1} -m 2000 -p ${PROGRAM_T} --cuda --fasta contigs.fa --bamdir ${BAM_DIR} -o "."

Other notes

The contigs.fa file was assembled with metaSPAdes, then manually renamed, filtered, and concatenated. I assume the result is equivalent to the output of Vamb's concatenate.py script with the --keepnames option. The contig IDs are formatted as:

>Sample1-XX-XX.NODE_X_X.1111
AAAAA
>Sample2-XX-XX.NODE_X_X.1111
AAAAA
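
For anyone who wants to sanity-check headers like these, here is a minimal sketch of how a name splits into sample and contig at the first occurrence of a separator. It assumes that this is how the binsplit separator is applied, which I have not verified against the Vamb source; `split_header` and `contigs_per_sample` are hypothetical helper names, not Vamb functions.

```python
from collections import Counter


def split_header(header: str, separator: str) -> tuple[str, str]:
    """Split a FASTA header into (sample, contig) at the first separator.

    Assumption: the sample name is everything before the first separator.
    """
    name = header.lstrip(">").split()[0]  # drop '>' and any trailing description
    sample, sep, contig = name.partition(separator)
    if not sep or not sample or not contig:
        raise ValueError(f"separator {separator!r} does not split {name!r}")
    return sample, contig


def contigs_per_sample(headers, separator):
    """Count how many headers fall under each sample name."""
    return Counter(split_header(h, separator)[0] for h in headers)


# The two placeholder headers from this report.
headers = [">Sample1-XX-XX.NODE_X_X.1111", ">Sample2-XX-XX.NODE_X_X.1111"]
counts = contigs_per_sample(headers, ".")
print(counts)
```

With "." as the separator, each header yields one sample name (`Sample1-XX-XX`, `Sample2-XX-XX`). With "C", these placeholder headers would not split at all, because they contain no capital "C".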

The BAM files in BAM_DIR were generated with bwa-mem2 by mapping each sample's reads separately to the concatenated contigs, then sorted with samtools.

Sample1-XX-XX.bam
Sample2-XX-XX.bam
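
To rule out a mismatch between the FASTA and the BAM files, one rough check is to compare the FASTA header names with the reference names in each BAM header (the `@SQ` lines printed by `samtools view -H`). The sketch below works on plain text so it needs nothing beyond the standard library. The function names, contig names, and lengths in it are made up for illustration and are not taken from my data.

```python
def sq_names(sam_header: str) -> list[str]:
    """Reference names (SN fields) from the @SQ lines of a SAM header."""
    names = []
    for line in sam_header.splitlines():
        if line.startswith("@SQ"):
            # Fields after the record type are TAG:VALUE pairs separated by tabs.
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
            if "SN" in fields:
                names.append(fields["SN"])
    return names


def fasta_ids(fasta_text: str) -> list[str]:
    """Sequence IDs from FASTA text: the first word after '>' on header lines."""
    return [line[1:].split()[0] for line in fasta_text.splitlines() if line.startswith(">")]


def compare_names(fasta_text: str, sam_header: str) -> dict:
    """Report names missing on either side and whether the order agrees."""
    f, b = fasta_ids(fasta_text), sq_names(sam_header)
    return {
        "only_in_fasta": sorted(set(f) - set(b)),
        "only_in_bam": sorted(set(b) - set(f)),
        "same_order": f == b,
    }


# Hypothetical inputs; in practice the header text would come from `samtools view -H`.
fasta = ">Sample1-XX-XX.NODE_1\nAAAAA\n>Sample2-XX-XX.NODE_1\nAAAAA\n"
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:Sample1-XX-XX.NODE_1\tLN:5\n"
    "@SQ\tSN:Sample2-XX-XX.NODE_1\tLN:5\n"
)
report = compare_names(fasta, header)
print(report)
```

An empty `only_in_fasta` and `only_in_bam` with `same_order` true would suggest the inputs agree on contig names, so the size mismatch is more likely to come from elsewhere.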

Logs

The full log file can be found in log.txt.

Here is the tail of the Vamb log:

2024-10-03 16:21:36.260 | INFO    | Clustering
2024-10-03 16:21:36.260 | INFO    |     Windowsize: 300
2024-10-03 16:21:36.260 | INFO    |     Min successful thresholds detected: 15
2024-10-03 16:21:36.260 | INFO    |     Max clusters: None
2024-10-03 16:21:36.261 | INFO    |     Use CUDA for clustering: True
2024-10-03 16:21:36.261 | INFO    |     Binsplitter: "."
2024-10-03 16:21:55.278 | ERROR   | An error has been caught in function 'main', process 'MainProcess' (2581899), thread 'MainThread' (140562393429824):
Traceback (most recent call last):

  File "/projects/Software/miniforge3/envs/vamb4/bin/vamb", line 8, in <module>
    sys.exit(main())
    │   │    └ <function main at 0x7fd622ce2200>
    │   └ <built-in function exit>
    └ <module 'sys' (built-in)>

> File "/maps/projects/Software/vamb/vamb/__main__.py", line 2183, in main
    run(runner, opt.common.general)
    │   │       │   │      └ <vamb.__main__.GeneralOptions object at 0x7fd622ecf7e0>
    │   │       │   └ <vamb.__main__.BinnerCommonOptions object at 0x7fd622cf65d0>
    │   │       └ <vamb.__main__.BinDefaultOptions object at 0x7fd622fe6600>
    │   └ functools.partial(<function run_bin_default at 0x7fd622ce1620>, <vamb.__main__.BinDefaultOptions object at 0x7fd622fe6600>)
    └ <function run at 0x7fd622ce0680>

  File "/maps/projects/Software/vamb/vamb/__main__.py", line 647, in run
    runner()
    └ functools.partial(<function run_bin_default at 0x7fd622ce1620>, <vamb.__main__.BinDefaultOptions object at 0x7fd622fe6600>)

  File "/maps/projects/Software/vamb/vamb/__main__.py", line 1204, in run_bin_default
    cluster_and_write_files(
    └ <function cluster_and_write_files at 0x7fd622ce1080>

  File "/maps/projects/Software/vamb/vamb/__main__.py", line 1090, in cluster_and_write_files
    for i, cluster in enumerate(clusters):
        │  │                    └ <itertools.islice object at 0x7fd609a5ed90>
        │  └ <vamb.cluster.Cluster object at 0x7fd608fcfd80>
        └ 14959

  File "/maps/projects/Software/vamb/vamb/cluster.py", line 297, in __next__
    cluster, _, points = self.find_cluster()
                         │    └ <function ClusterGenerator.find_cluster at 0x7fd6364640e0>
                         └ ClusterGenerator(85 points, 14960 clusters)

  File "/maps/projects/Software/vamb/vamb/cluster.py", line 541, in find_cluster
    threshold = self.find_threshold(distances)
                │    │              └ tensor([0.5873, 0.2544, 0.5492, 0.2756, 0.7862, 0.4555, 0.5639, 0.3698, 0.3409,
                │    │                        0.4678, 0.7385, 0.4397, 0.2854, 0.467...
                │    └ <function ClusterGenerator.find_threshold at 0x7fd636464040>
                └ ClusterGenerator(85 points, 14960 clusters)

  File "/maps/projects/Software/vamb/vamb/cluster.py", line 455, in find_threshold
    below_xmax = (distances <= _XMAX) & self.kept_mask
                  │            │        │    └ <member 'kept_mask' of 'ClusterGenerator' objects>
                  │            │        └ ClusterGenerator(85 points, 14960 clusters)
                  │            └ 0.3
                  └ tensor([0.5873, 0.2544, 0.5492, 0.2756, 0.7862, 0.4555, 0.5639, 0.3698, 0.3409,
                            0.4678, 0.7385, 0.4397, 0.2854, 0.467...

RuntimeError: The size of tensor a (87) must match the size of tensor b (85) at non-singleton dimension 0

MapleHe commented 1 month ago

I reran the pipeline from the mapping step and got the same error.

1. concatenate contigs
```bash
concatenate.py -m 2000 contigs.2k.fa sample1.contigs.fasta sample2.contigs.fasta
bwa-mem2 index -p contigs.2k contigs.2k.fa
```

2. mapping to contigs
```bash
bwa-mem2 mem contigs.2k sample1_1.fq sample1_2.fq | \
samtools view -bS -F 3584 - | \
samtools sort -O bam -o bams/sample1.contigs.bam
```
same for sample 2

3. run vamb
```bash
conda run -p vamb4 vamb bin default --outdir vamb_output -m 2000 -p 32 --cuda --fasta contigs.2k.fa.gz --bamdir bams/
```

MapleHe commented 1 month ago

Update:

These two versions of Vamb work fine on the same data; the separator can be either the default "C" or a custom ".":

Built from source, branch v4.1.3
pip-installed version v3.0.9
