Closed zaka-edd closed 4 months ago
Thanks for trying Straglr and for providing the files, it really helps debugging.
I think the culprit is that a few of the loci have very long motifs, which causes problems for TRF: Straglr incorporates the motif sequence into the sequence header, and TRF has an upper length limit for sequence headers.
You can try removing the loci with motifs longer than 100 bp; that should probably solve the problem.
Also, Straglr is not meant for genotyping homopolymers, so you should also remove the homopolymer loci (including them will cause problems).
You can use awk to do this:
awk 'length($4)<=100 && length($4)>=2' chr21_regions.bed > new_chr21_regions.bed
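If it helps to see which loci the filter throws away, here is a minimal sketch that applies the same length test but writes the rejected loci to a second file instead of discarding them. It assumes the motif is in column 4, as in the command above; the function name and the output file names are made up for illustration.

```shell
# Hypothetical helper: split a BED file by motif length (column 4).
# $1 = input BED, $2 = file for kept loci, $3 = file for dropped loci
split_by_motif_length() {
    # Create both outputs up front so they exist even if one ends up empty.
    : > "$2"
    : > "$3"
    awk -v kept="$2" -v dropped="$3" '
        # Same test as the one-liner: keep motifs of 2 to 100 bp.
        length($4) >= 2 && length($4) <= 100 { print > kept; next }
        # Everything else (1 bp motifs or motifs over 100 bp) is set aside.
        { print > dropped }
    ' "$1"
}

# Example call (output names are placeholders):
# split_by_motif_length chr21_regions.bed new_chr21_regions.bed dropped_loci.bed
```

The kept file should match what the one-liner produces; the dropped file is only there so you can review what was excluded.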
Also, you should use --min_cluster_size 2 and --min_support 2; requiring at least 2 reads to support a genotype call makes more sense:
--min_support 2 --min_cluster_size 2 --max_str_len 100 --min_str_len 2
Note that setting --max_str_len won't solve the TRF problem; you have to manually eliminate the long-motif loci first, for example with the awk command suggested above.
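Since the filtering has to happen before Straglr sees the BED file, a quick sanity check on the filtered file can catch a missed step. The sketch below is only an illustration (the function name is hypothetical and it assumes the motif is in column 4): it reports the longest motif and how many loci fall outside the 2 to 100 bp range, and returns a non-zero status if any do.

```shell
# Hypothetical pre-flight check for a loci BED file (motif in column 4).
# Prints a one-line summary and fails if any motif is shorter than 2 bp
# or longer than 100 bp.
check_motif_lengths() {
    awk '
        length($4) < 2 || length($4) > 100 { bad++ }
        length($4) > max { max = length($4) }
        END {
            printf "loci=%d longest_motif=%d out_of_range=%d\n", NR, max, bad
            exit (bad > 0)
        }
    ' "$1"
}

# Example call:
# check_motif_lengths new_chr21_regions.bed || echo "filter the BED file first"
```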
Thank you for your help! I will make sure to exclude these regions.
I am trying to run Straglr 1.5.0 (current release) from a Docker container on ONT data, with a catalog of TR regions for chr21 as the input BED file. I kept getting an error, so I tried adjusting some parameters, but the error stayed the same. The command I ran looked like this:
However I get this error:
I tried running a debug like
Any suggestions on what could cause the error? I included all the debug files and the input region BED file: chr21_regions.bed.gz debug.log.gz tmp.tar.gz