PacificBiosciences / trgt

Tandem repeat genotyping and visualization from PacBio HiFi data

How are motifs defined in the structures of pathogenic STRs? #45

Open mletexier-cnrgh opened 1 month ago

mletexier-cnrgh commented 1 month ago

Dear team, is there an error in the motifs included in the pathogenic STR BED file? For example, for the DAB1 gene, you have indicated the following motifs: [image]

However, in several publications, the pathogenic motif is ATTTC. With the structure defined in the BED file, this motif is never examined. Or am I misunderstanding how TRGT works? [image]

Please enlighten me on this subject, as many other genes also lack the pathogenic motif in their structure.

Thank you very much in advance for your interest in my message. Best, Mélanie :)

pbsena commented 1 month ago

Hi Mélanie,

In the BED file, the motifs are usually defined by the motif units as they appear on the hg38 reference strand. Is the DAB1 locus by any chance transcribed on the negative strand relative to hg38? Because in that case the ATTTC motif would correspond to the GAAAT unit shown above, with AAAAT possibly flanking the expansion.
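To make the strand argument concrete, here is a minimal Python sketch. It is not part of TRGT, and the helper names are made up for illustration. It reverse-complements a motif and also treats rotations of a repeat unit as equivalent, since a tandem repeat can be written starting from any position within the unit.

```python
# Illustrative helpers only; none of these names come from TRGT.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif: str) -> str:
    """Return the motif as read on the opposite strand."""
    return motif.upper().translate(_COMPLEMENT)[::-1]


def rotations(motif: str) -> set:
    """All cyclic shifts of a repeat unit, e.g. GAAAT -> AAATG, AATGA, ..."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def same_repeat_unit(a: str, b: str) -> bool:
    """True if a and b describe the same tandem repeat on either strand."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        return False
    return b in rotations(a) or b in rotations(reverse_complement(a))


print(reverse_complement("ATTTC"))         # GAAAT
print(same_repeat_unit("ATTTC", "GAAAT"))  # True
print(same_repeat_unit("ATTTC", "AAAAT"))  # False
```

So ATTTC in gene orientation and GAAAT in reference orientation are the same unit, while AAAAT is a different motif.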

Best, Guilherme

egor-dolzhenko commented 1 month ago

Hi Mélanie. Just to add to Guilherme's reply, STRchive (from @hdashnow and the team) is a great resource for definitions of known pathogenic repeats. For example, here is the entry for DAB1: https://strchive.org/database/DAB1.html that specifies motifs in both reference and gene orientations.

Best wishes, Egor

mletexier-cnrgh commented 1 month ago

Ah, but of course, that seems obvious now. Thank you both for your replies and for sharing the database; it's a lot clearer now.

Mélanie

mletexier-cnrgh commented 1 month ago

By any chance, do you have a tool that can tell whether an STR is pathogenic or not according to given thresholds?

Mélanie

dnil commented 1 month ago

You might try STRanger: https://github.com/Clinical-Genomics/stranger

That is what we and some others do. It benefits from (and adds extra info based on) its own set of extra fields in the repeat definition files, which you can find in that repository if you want them.

We love Egor and his tools, but I don't know whether, morally speaking, we should encourage TRGT, or keep supporting it in STRanger in the long run, as it has this weird, partly closed license excluding use with other chemistries. It adds a bit of complexity to pipelines etc. and leaves a bit of a bad taste. ExpansionHunter was cleaner that way. 😔 I hope it changes soon!

mletexier-cnrgh commented 1 month ago

Thank you @dnil, this tool works very well. I'm trying to break down the results to avoid any misunderstandings.

I have the impression that the repeats BED file is the key to obtaining good results. Depending on the version of the pathogenic_repeats.hg38.TRGT.bed file used, the motif definitions differ, and I think this can lead to false negatives. For example, BEAN1 in two versions of pathogenic_repeats.hg38.TRGT.bed:

chr16 66490398 66490453 ID=BEAN1;MOTIFS=TAAAA;STRUC=(TAAAA)n
chr16 66490398 66490467 ID=BEAN1;MOTIFS=TGGAA,TAAAA;STRUC=

I was happy to have found an STR expansion in my index case, but it is an expansion of the TAAAA motif, which is not the pathogenic one according to the literature; the pathogenic structure is (TGGAA)*TAAAA.
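As a rough way to compare two versions of a definition, the sketch below parses the fourth BED column into its `key=value` fields and lists the motifs each version tracks. It assumes the motif list is stored under TRGT's `MOTIFS` key and that columns are whitespace-separated. `parse_trgt_bed_line` is a hypothetical helper, not a TRGT function, and the second `STRUC=` value is left empty exactly as quoted in this thread.

```python
def parse_trgt_bed_line(line):
    """Split one repeat-definition line into coordinates and info fields."""
    chrom, start, end, info = line.split()[:4]
    # The info column is a ';'-separated list of key=value pairs.
    fields = dict(item.split("=", 1) for item in info.split(";") if item)
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "id": fields.get("ID"),
        # Assumption: motifs are a comma-separated list under the MOTIFS key.
        "motifs": [m for m in fields.get("MOTIFS", "").split(",") if m],
        "struc": fields.get("STRUC", ""),
    }


old = parse_trgt_bed_line(
    "chr16 66490398 66490453 ID=BEAN1;MOTIFS=TAAAA;STRUC=(TAAAA)n"
)
new = parse_trgt_bed_line(
    "chr16 66490398 66490467 ID=BEAN1;MOTIFS=TGGAA,TAAAA;STRUC="
)

# Motifs tracked only by the second definition.
print(sorted(set(new["motifs"]) - set(old["motifs"])))  # ['TGGAA']
# The second definition also spans a longer reference interval.
print(new["end"] - old["end"])  # 14
```

A diff like this makes it easy to spot loci where one version of the file cannot report a motif at all.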
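One way to check which motif is actually expanded is to count the longest uninterrupted run of each motif in the allele sequence. The sketch below does this with a regular expression. The allele is synthetic and made up for illustration, and `longest_run` is a hypothetical helper, not something TRGT or STRanger provides.

```python
import re


def longest_run(sequence, motif):
    """Largest number of back-to-back copies of motif found in sequence."""
    pattern = f"(?:{re.escape(motif.upper())})+"
    runs = re.findall(pattern, sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Synthetic allele for illustration only (not real data).
allele = "TAAAA" * 3 + "TGGAA" * 4 + "TAAAA" * 2

print(longest_run(allele, "TGGAA"))  # 4
print(longest_run(allele, "TAAAA"))  # 3
print(longest_run(allele, "ATTTC"))  # 0
```

An allele that is long only because of TAAAA copies would show a large TAAAA run and no TGGAA run, which is the distinction that matters here.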

Mélanie

dnil commented 1 month ago

Thank you for the feedback - I'll move this comment to an issue on the STRanger repo instead! But, quite right, STRanger only deals with the order of the motifs, not their content. This is a particular issue for the non-reference expansions; compare RFC1 in particular. Most downstream users import the results and images into some graphical environment for evaluation anyway. Checking motif content is something we plan to add, any year now! 😄