PacificBiosciences / paraphase

HiFi-based caller for highly similar paralogous genes
BSD 3-Clause Clear License

Future plans for paraphase? #24

Open ehre opened 2 weeks ago

ehre commented 2 weeks ago

Hi!

Thanks for a great tool! As a non-bioinformatician (I am a clinical geneticist), I would like to ask about your considerations and future plans for paraphase. Your latest release includes many GRCh38 regions: several are highly clinically relevant, whereas others have less supporting evidence. At the same time, many classical pseudogenes are not currently included in paraphase. Two examples are PRSS1 and VWF, but there are several others (see e.g. here: https://www.ncbi.nlm.nih.gov/books/NBK535152/ ).

Is it just a matter of time and priority before other clinically relevant pseudogenes are included in future updates, or are there other (perhaps technical) reasons why they are not included in paraphase?

Thanks and best wishes, Hans

xiao-chen-xc commented 2 weeks ago

Hi Hans, those genes are not included in Paraphase because their sequence similarity to their pseudogenes is not high enough, or does not extend over a long enough stretch, to cause a problem for HiFi reads (previous assessments of those genes were based on short reads). Those genes are already genotyped correctly with standard HiFi workflows. For example, VWF and its pseudogene are at most 96-97% similar, and HiFi reads are well above 10 kb with an accuracy >99%, so we would expect HiFi reads to align correctly to their genes of origin. PRSS1 is only 90% similar to its pseudogene.
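To put rough numbers on that reasoning, here is a minimal back-of-the-envelope sketch in Python. It is not part of Paraphase; the function name `expected_counts` is made up for illustration, and it makes two simplifying assumptions: the quoted similarity holds uniformly across a full 10 kb read (in practice the homologous stretch may be shorter), and read accuracy is exactly 99% (the quoted figure is a lower bound).

```python
def expected_counts(read_len_bp, paralog_identity, read_accuracy):
    """Return (expected gene/pseudogene differences, expected read errors)
    over one read. Hypothetical helper, for illustration only."""
    # Positions where gene and pseudogene differ within the read span
    paralog_diffs = read_len_bp * (1 - paralog_identity)
    # Positions where the read itself is expected to be wrong
    read_errors = read_len_bp * (1 - read_accuracy)
    return paralog_diffs, read_errors


if __name__ == "__main__":
    # Identity values are the ones quoted above (upper end for VWF)
    for gene, identity in [("VWF", 0.97), ("PRSS1", 0.90)]:
        diffs, errors = expected_counts(10_000, identity, 0.99)
        print(f"{gene}: ~{diffs:.0f} differing positions vs ~{errors:.0f} read errors")
```

Under these assumptions a 10 kb read spans about 300 differing positions for VWF and about 1,000 for PRSS1, against at most about 100 read errors, so an aligner should have ample signal to place the read on its gene of origin.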

If you do come across any genes that are problematic in HiFi data, feel free to reach out to me and I can add them to Paraphase.

ehre commented 2 weeks ago

Many thanks!