PacificBiosciences / trgt

Tandem repeat genotyping and visualization from PacBio HiFi data

Not capturing ZNF713 after changing coordinates from hg38 to hg19 #19

Open Tianyibian opened 9 months ago

Tianyibian commented 9 months ago

Hi, I downloaded the 'pathogenic_repeats.hg38.bed' file from https://github.com/PacificBiosciences/trgt/blob/main/repeats/pathogenic_repeats.hg38.bed. Because all my files were previously aligned to hg19, I used UCSC LiftOver to convert the coordinates in the BED file to hg19. I then followed the example, and the program successfully captured almost all the genes except ZNF713. For some reason, the tandem repeat in this gene is not being genotyped or counted. I confirmed in IGV that my aligned reads do cover the ZNF713 region (at the coordinates I lifted over from the downloaded BED file), and these reads show a pattern matching CGG(n) repeats. However, the program still fails to detect them. This is confusing, since all other genes are processed correctly, and I'm unsure what might be causing it.

hdashnow commented 9 months ago

Could you please show the line from the hg19 bed file for ZNF713?

Tianyibian commented 9 months ago

chr7 55955294 55955333 ID=ZNF713;MOTIFS=CGG;STRUC=(CGG)n
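
For anyone wanting to sanity-check a lifted-over entry like the one above, here is a minimal sketch in Python. It assumes the catalog format seen in this thread (chrom, start, end, then a semicolon-separated `KEY=VALUE` field) and that multiple motifs would be comma-separated. The function name `parse_repeat_bed_line` is made up for illustration and is not part of TRGT.

```python
def parse_repeat_bed_line(line):
    """Parse one repeat-catalog line into a dict (illustrative helper, not TRGT code)."""
    chrom, start, end, info = line.split()[:4]
    # The fourth column holds KEY=VALUE pairs separated by ';'
    fields = dict(item.split("=", 1) for item in info.split(";"))
    return {
        "chrom": chrom,
        "start": int(start),  # BED start is 0-based
        "end": int(end),      # BED end is exclusive
        "id": fields["ID"],
        "motifs": fields["MOTIFS"].split(","),  # assumption: comma-separated motifs
        "struc": fields["STRUC"],
    }


locus = parse_repeat_bed_line(
    "chr7 55955294 55955333 ID=ZNF713;MOTIFS=CGG;STRUC=(CGG)n"
)
span = locus["end"] - locus["start"]
# Interval length, and the remainder after dividing by the motif length
print(locus["id"], span, span % len(locus["motifs"][0]))  # ZNF713 39 0
```

A 39 bp interval is a whole number of CGG units, which is consistent with the entry being well-formed after liftover. It does not by itself prove that the hg19 reference sequence at those coordinates is the repeat.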

hdashnow commented 9 months ago

That looks fine. Are there reads that completely span the locus? Does the locus appear in the output vcf?
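
To make "completely span the locus" concrete, here is a small Python sketch of the interval test. It assumes read alignments are given as 0-based, half-open reference intervals, the same convention as BED. The `flank` parameter and the read coordinates are hypothetical values for illustration only; this is not how TRGT itself decides which reads to use.

```python
def spans_locus(read_start, read_end, locus_start, locus_end, flank=0):
    """Return True if the read's reference interval covers the locus plus
    `flank` bases on each side (all intervals 0-based, half-open)."""
    return read_start <= locus_start - flank and read_end >= locus_end + flank


# Lifted-over ZNF713 interval from this thread
locus_start, locus_end = 55955294, 55955333

# Hypothetical read alignments, for illustration only
reads = [
    (55950000, 55960000),  # covers the whole locus with generous flanks
    (55955300, 55960000),  # starts inside the repeat
    (55950000, 55955333),  # ends exactly at the locus end, no right flank
]

for read_start, read_end in reads:
    print(read_start, read_end,
          spans_locus(read_start, read_end, locus_start, locus_end, flank=50))
```

With real data the intervals would come from the BAM records. Note that this only looks at where an alignment starts and ends, so a read with deletions inside the locus can still count as spanning by this test.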

Tianyibian commented 9 months ago

Hi, no, actually they don't. The reads typically have a few bp of deletions at the front end. As for the output VCF, the line exists, but the informative fields containing the counts and the genotypes are missing. Here is the record from one of the samples I tested:

chr7 55955295 . CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGT . 0 . TRID=ZNF713;END=55955333;MOTIFS=CGG;STRUC=(CGG)n GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:.
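
A quick way to find records like this across a whole output file is to pair the FORMAT keys with the sample values and look for a missing `GT`. The sketch below is plain Python and assumes a single-sample VCF with standard column order. The helper names `sample_fields` and `is_genotyped` are made up for illustration.

```python
def sample_fields(vcf_line):
    """Map FORMAT keys to the first sample's values for one VCF record."""
    cols = vcf_line.split()
    keys = cols[8].split(":")    # FORMAT column, e.g. GT:AL:ALLR:...
    values = cols[9].split(":")  # first (only) sample column
    return dict(zip(keys, values))


def is_genotyped(vcf_line):
    """A record counts as genotyped if GT is not the VCF missing value '.'."""
    return sample_fields(vcf_line)["GT"] != "."


record = (
    "chr7 55955295 . CGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGT . 0 . "
    "TRID=ZNF713;END=55955333;MOTIFS=CGG;STRUC=(CGG)n "
    "GT:AL:ALLR:SD:MC:MS:AP:AM .:.:.:.:.:.:.:."
)
print(is_genotyped(record))  # False
```

Every sample field in the ZNF713 record is `.`, so the locus was written to the VCF but no genotype was produced for it.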

Tianyibian commented 9 months ago

I have also tried extending the region a few bp upstream, testing with this entry: chr7 55955240 55955333 ID=ZNF713;MOTIFS=CGG;STRUC=(CGG)n, but the problem still persists.

egor-dolzhenko commented 9 months ago

Could you please try running the latest (pre-release) version of TRGT available here with the -vv command line option to enable a more verbose output? Also, are you working with HiFi whole-genome sequencing data? And, if this is permissible, would you be open to sharing a slice of your BAM file containing this repeat? If yes, here is my email.

Tianyibian commented 9 months ago

Hi, thank you both for your help. I will test the pre-release version and post the results later. To answer your question, I am using HiFi WGS data. I will also share some of the BAM files I tested shortly. Thank you, Tianyi

Tianyibian commented 9 months ago

Hi, this problem is solved by using the pre-release version of TRGT with the -v option. With the current version of TRGT, however, the error of not capturing ZNF713 still exists, even with the -v option. Thank you so much for your help!