Open lfearnley opened 5 months ago
The error is caused by a lack of coverage at a given locus. For example, a locus may have only 2 supporting reads, each with a different repeat size. If min_support is set to 2, no allele can be formed with the minimum support.
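To make the failure condition concrete, here is a minimal sketch of the situation described above. It is not Straglr's code: the function name `alleles_with_support` is hypothetical, and it groups reads by exact repeat size, which is a simplification of whatever clustering Straglr actually performs.

```python
from collections import Counter


def alleles_with_support(repeat_sizes, min_support):
    """Return the repeat sizes seen in at least `min_support` reads.

    Hypothetical helper for illustration only; reads are grouped by
    exact repeat size rather than by a real clustering step.
    """
    counts = Counter(repeat_sizes)
    return sorted(size for size, n in counts.items() if n >= min_support)


# Two supporting reads, each with a different repeat size.
reads = [12, 15]

# With a minimum support of 2, no size is backed by enough reads.
print(alleles_with_support(reads, min_support=2))

# Lowering the threshold to 1 lets each read stand as its own allele.
print(alleles_with_support(reads, min_support=1))
```

With a threshold of 2 the result is empty, which corresponds to the locus that had no failure reason assigned.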
The new version that produces VCF output tries to associate a FILTER with each failed locus. I did not anticipate this scenario, so no failure reason was generated for it, and the script therefore crashed.
I have made a fix that produces a CLUSTERING_FAILED filter for this scenario and will release it shortly.
In the meantime, if you want to get past this, you could set --min_cluster_size 1 and the program should be able to finish.
Thanks very much for reporting this bug.
Hi, I have been having the same issue. I tried setting --min_cluster_size 1, but it did not fix the error for me. Do you know of another problem that could be the cause? I ran:
straglr.py map-sminimap2-HG002_hg38_chr21.bam .../chr21_test_data/chr21.fa output_straglr --loci HG002_repeats_straglr.bed --min_cluster_size 1
Traceback (most recent call last):
  File "/usr/local/bin/straglr.py", line 101, in <module>
    main()
  File "/usr/local/bin/straglr.py", line 93, in main
    variants = tre_finder.genotype(args.loci)
  File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 1426, in genotype
    return self.collect_alleles(loci)
  File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 1402, in collect_alleles
    tre_variants = self.get_alleles(loci)
  File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 1252, in get_alleles
    self.update_refs(variants, genome_fasta)
  File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 1271, in update_refs
    refs = self.extract_refs_trf(trf_input)
  File "/usr/local/lib/python3.10/site-packages/src/tre.py", line 607, in extract_refs_trf
    data_motif = cols[3]
IndexError: list index out of range
This is a different problem. Looks like there is something wrong when the script parsed the results from the TRF run.
Can you try running with --tmpdir <path> --debug, where <path> can be set to your output directory? This way the temporary files will be kept. I want to see if there is anything wrong with the latest ***.dat (TRF output) created.
You can first check that the TRF output is there. If you only have a few loci, maybe you can post the content of the .dat file? Or you can attach the file for me to examine.
Best if you can start a new issue for this.
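As a rough way to pre-screen a kept .dat file before posting it, the sketch below flags lines with too few fields to be indexed at position 3, which is where the traceback fails (cols[3]). It is an assumption that the parser splits on whitespace; the helper name `short_lines` and the `min_fields` parameter are hypothetical, and this does not reproduce Straglr's own parsing. Header or parameter lines in the file may be flagged as well, so treat the output only as a list of places to look.

```python
def short_lines(lines, min_fields=4):
    """Yield (line_number, text) for non-empty lines with fewer than
    `min_fields` whitespace-separated fields.

    Hypothetical helper: assumes whitespace-delimited columns, which
    may not match how the real parser splits each line.
    """
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped and len(stripped.split()) < min_fields:
            yield number, stripped


# In-memory example; for a real file, pass an open file handle instead:
#     with open(dat_path) as handle: ...
example = ["1 2 3 4 5", "", "a b"]
for number, text in short_lines(example):
    print(f"line {number}: {text}")
```

Any data line reported here has no fourth column, so indexing it at position 3 would raise the same IndexError.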
I am running STRaglr 1.5.0 (the current release) and get the following error on multiple CRAMs:
Any suggestions as to what might cause this?