Hi,
I am working on mining pseudogenes from an assembled fish genome, and I am grateful to you for developing this pipeline. I have tblastn results of protein sequences against the genome. On running the pipeline, I learned that pseudoexons are linked into contigs based on the 99th percentile of intron length and a gap size. Could you help me understand this step? How should I decide these parameters for my fish genome? Please also share the specific command line for running this step.
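To make the question concrete, here is a minimal sketch of how I understand the intron-length threshold could be estimated. It assumes a GFF3 gene annotation is available for the genome (or for a closely related species) and that introns can be taken as the gaps between consecutive `exon` features sharing a `Parent`. The function names and the nearest-rank percentile are my own choices for illustration and are not taken from the pipeline, so please correct me if the pipeline derives this value differently.

```python
import math
import sys
from collections import defaultdict


def intron_lengths(gff_lines):
    """Return intron lengths inferred from GFF3 'exon' features.

    Exons are grouped by (sequence ID, Parent); an intron is the gap
    between two consecutive exons of the same parent transcript.
    """
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[(cols[0], parent)].append((int(cols[3]), int(cols[4])))

    lengths = []
    for coords in exons.values():
        coords.sort()
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            gap = next_start - prev_end - 1
            if gap > 0:  # skip overlapping or adjacent exons
                lengths.append(gap)
    return lengths


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    if not values:
        raise ValueError("no intron lengths found")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python intron_percentile.py annotation.gff3
    with open(sys.argv[1]) as handle:
        print(percentile(intron_lengths(handle), 99))
```

If this is roughly right, would the printed value be the number to supply as the intron-length parameter, and how should the gap size be chosen alongside it?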
Another issue is that Shiu's pipeline runs repeat masking with built-in repeat libraries for certain species, and the fish I am working with (Clarias magur) has no repeat library in that database. My genome is already soft-masked for repeats. Is it possible to skip the repeat-masking step in the pipeline, and what is the importance of this step?