ShiuLab / PseudogenePipeline

Creative Commons Zero v1.0 Universal

Parameter and need for linking pseudoexons into contigs and repeat masking step #12

Open suba19997 opened 1 year ago

suba19997 commented 1 year ago

Hi, I am working on mining pseudogenes from an assembled fish genome, and I am grateful to you for developing this pipeline. I have tblastn results of protein sequences searched against the genome. While running the pipeline, I learned that pseudoexons are linked into contigs based on the 99th percentile of intron length and on gap size. Could you please help me understand this step? How should I decide these parameters for my fish genome, and what is the specific command line for running this step?

Another issue is repeat masking. The pipeline runs repeat masking with the built-in repeat libraries of certain species, but that database has no repeat library for the fish I am working with (Clarias magur). My genome is already soft-masked for repeats. Is it possible to skip the repeat masking step in the pipeline, and what is the importance of this step?
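As a rough illustration of the intron-length parameter mentioned above, here is a minimal Python sketch that estimates the 99th percentile of intron length from exon records in a GFF3 annotation. It is not part of the pipeline and does not reflect how the pipeline computes this value internally. The function names (`intron_lengths`, `percentile`), the reliance on `exon` features carrying a `Parent` attribute, the nearest-rank percentile method, and the toy records in `SAMPLE_GFF` are all assumptions made for the example.

```python
import math
from collections import defaultdict

# Toy GFF3 records, made up purely for illustration (not real annotation data).
SAMPLE_GFF = [
    "chr1\tsrc\texon\t1\t100\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\tsrc\texon\t201\t300\t.\t+\t.\tID=e2;Parent=t1",
    "chr1\tsrc\texon\t1001\t1100\t.\t+\t.\tID=e3;Parent=t1",
]


def intron_lengths(gff_lines):
    """Return intron lengths inferred from gaps between consecutive exons.

    Exons are grouped by (sequence ID, Parent transcript); this assumes the
    annotation lists exon features with a Parent attribute.
    """
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parents = attrs.get("Parent")
        if parents is None:
            continue
        for parent in parents.split(","):
            exons[(cols[0], parent)].append((start, end))

    lengths = []
    for spans in exons.values():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            gap = next_start - prev_end - 1  # bases strictly between the exons
            if gap > 0:
                lengths.append(gap)
    return lengths


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    if not values:
        raise ValueError("no values given")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    lengths = intron_lengths(SAMPLE_GFF)
    print("introns:", len(lengths))
    print("99th percentile intron length:", percentile(lengths, 99))
```

To try this on a real annotation, the lines of a GFF3 file for the species (or a close relative with a gene annotation) could be passed to `intron_lengths` in place of `SAMPLE_GFF`. Whether the resulting value can be supplied directly as the pipeline's intron-length setting is something the maintainers would need to confirm.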