Open Tianyibian opened 11 months ago
Could you please show the line from the hg19 bed file for ZNF713?
chr7 55955294 55955333 ID=ZNF713;MOTIFS=CGG;STRUC=(CGG)n
That looks fine. Are there reads that completely span the locus? Does the locus appear in the output vcf?
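One way to make the "completely span the locus" question concrete is to compare each read's reference interval against the repeat interval plus some flanking sequence. The sketch below is only an illustration: the function name, the read intervals, and the 50 bp flank are assumptions made for this example, not TRGT's actual criterion, and a reference span alone does not show deletions inside the alignment.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end), as in BED


def spanning_reads(reads: List[Interval], locus_start: int, locus_end: int,
                   flank: int = 50) -> List[Interval]:
    """Return the reads whose reference span covers the locus plus `flank` bp
    on both sides. The flank size is an arbitrary illustrative choice."""
    return [
        (start, end)
        for start, end in reads
        if start <= locus_start - flank and end >= locus_end + flank
    ]


# Hypothetical alignment intervals around the ZNF713 locus from the BED line.
reads = [
    (55950000, 55960000),  # covers the locus and both flanks
    (55955300, 55965000),  # starts inside the repeat
    (55955280, 55955400),  # covers the repeat but only 14 bp upstream
]
locus = (55955294, 55955333)

with_flank = spanning_reads(reads, *locus, flank=50)
no_flank = spanning_reads(reads, *locus, flank=0)
print(f"{len(with_flank)} read(s) span the locus with 50 bp flanks")
print(f"{len(no_flank)} read(s) span the locus with no flank")
```

In practice the intervals would come from the alignments themselves (for example, each read's reference start and end in the BAM file).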
Hi, no, they actually don't. The reads typically have a few bp of deletions at the front end. As for the output VCF, the line exists, but the informative fields containing the counts and the genotypes are missing. Here is the output VCF line from one of the samples I tested:
chr7 55955295 . CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGT . 0 . TRID=ZNF713;END=55955333;MOTIFS=CGG;STRUC=(CGG)n GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:.
I have also tried extending the region a few bp upstream, testing with this definition:
chr7 55955240 55955333 ID=ZNF713;MOTIFS=CGG;STRUC=(CGG)n
but the problem still persists.
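Records like the one above, where every sample value is ".", can be picked out of an output VCF with a few lines of Python. This is a minimal sketch that assumes whitespace-separated columns and a single sample column; the function name and the second record are hypothetical and exist only for illustration.

```python
from typing import Iterable, List


def ungenotyped_trids(vcf_lines: Iterable[str]) -> List[str]:
    """Return the TRID of every record whose sample values are all '.'."""
    missing = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        if len(fields) < 10:
            raise ValueError(f"expected at least 10 columns, got {len(fields)}")
        # INFO is the 8th column: semicolon-separated key=value pairs.
        info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
        sample_values = fields[9].split(":")
        if all(value == "." for value in sample_values):
            missing.append(info.get("TRID", "."))
    return missing


records = [
    # Record quoted in this thread: all sample values are missing.
    "chr7 55955295 . CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGT . 0 . "
    "TRID=ZNF713;END=55955333;MOTIFS=CGG;STRUC=(CGG)n "
    "GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:.",
    # Hypothetical genotyped record, made up for contrast.
    "chr1 100 . CAGCAG CAGCAGCAG 0 . "
    "TRID=EXAMPLE;END=105;MOTIFS=CAG;STRUC=(CAG)n GT:AL 0/1:6,9",
]

print(ungenotyped_trids(records))
```

Running this over a whole output VCF gives a quick list of loci that were emitted but not genotyped.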
Could you please try running the latest (pre-release) version of TRGT available here with the -vv command line option to enable more verbose output? Also, are you working with HiFi whole-genome sequencing data? And, if this is permissible, would you be open to sharing a slice of your BAM file containing this repeat? If yes, here is my email.
Hi, thank you both for your help. I will test the pre-release version and post the results later. To answer your question, I am using HiFi WGS data. I will also share a slice of one of the BAM files I tested shortly. Thank you, Tianyi
Hi, this problem is solved when using the pre-release version of TRGT with the -v option. With the current version of TRGT, however, the strange failure to capture ZNF713 still exists, even with the -v option. Thank you so much for your help!
Hi, I downloaded the 'pathogenic_repeats.hg38.bed' file from (https://github.com/PacificBiosciences/trgt/blob/main/repeats/pathogenic_repeats.hg38.bed). As all my files were previously aligned to hg19, I used UCSC LiftOver to convert the coordinates in the BED file to hg19. I then followed the example, and the program successfully captured almost all the genes except ZNF713. For some reason, the tandem repeats in this gene are not being genotyped or counted. I confirmed with IGV that my aligned reads do cover the ZNF713 region (which I lifted over from the downloaded BED file), and these reads display a pattern matching (CGG)n repeats. However, the program still fails to detect them. This is confusing, since all other genes are processed correctly. I'm unsure what might be causing this issue.
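After a LiftOver, a quick sanity check on a repeat definition is to parse the BED line and compare the interval length with the motif length. The sketch below does this for the hg19 ZNF713 line quoted in this thread. The helper name is hypothetical, and the check is only meaningful for simple single-motif loci; a clean multiple of the motif length does not prove that the lifted coordinates are correct.

```python
from typing import Dict, Union


def parse_repeat_bed_line(line: str) -> Dict[str, Union[str, int, list]]:
    """Parse a repeat definition of the form
    'chrom start end ID=...;MOTIFS=...;STRUC=...' into a dictionary."""
    chrom, start, end, info = line.split()[:4]
    fields = dict(item.split("=", 1) for item in info.split(";"))
    return {
        "chrom": chrom,
        "start": int(start),  # BED start is 0-based
        "end": int(end),      # BED end is exclusive
        "id": fields["ID"],
        "motifs": fields["MOTIFS"].split(","),
        "struc": fields["STRUC"],
    }


entry = parse_repeat_bed_line(
    "chr7 55955294 55955333 ID=ZNF713;MOTIFS=CGG;STRUC=(CGG)n"
)
length = entry["end"] - entry["start"]
units, remainder = divmod(length, len(entry["motifs"][0]))
print(f"{entry['id']}: {length} bp = {units} x {entry['motifs'][0]} "
      f"with {remainder} bp left over")
```

Note also that the BED start (55955294, 0-based) corresponds to the 1-based VCF position 55955295 in the record quoted in this thread, so the two agree.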