gavinmdouglas closed 2 years ago
Hey there Gavin, that sounds like a fair strategy. If you have highly fragmented assemblies, you might see quite a lot of gene fragments cut off at the edges of contigs, and these may be flagged as putative pseudogenes according to Pseudofinder's algorithm. I seem to remember having a conversation with @mitchso (the other developer on this) about potentially excluding genes at the ends of contigs from pseudogene prediction, but I don't know if this was ever implemented. In any case, if you can do so computationally, I would recommend removing genes that appear truncated due to contig breaks.
I know Prodigal has a flag for this (-c: "Closed ends. Do not allow genes to run off edges."), but I don't know about Prokka or other gene-calling and annotation pipelines.
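If it helps, the post-hoc filtering could be sketched in Python roughly like this. It is only an illustration and not part of Pseudofinder: the `Gene` record, the function names, and the `margin` parameter are all made up for the example, and it assumes you already have contig lengths and 1-based inclusive gene coordinates (e.g. parsed from a GFF file).

```python
from typing import Dict, Iterable, List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    gene_id: str
    contig: str
    start: int
    end: int


def near_contig_edge(gene: Gene, contig_lengths: Dict[str, int], margin: int = 0) -> bool:
    """Return True if the gene lies within `margin` bp of either contig end."""
    length = contig_lengths[gene.contig]
    # Normalise coordinates in case start > end for reverse-strand features.
    left, right = min(gene.start, gene.end), max(gene.start, gene.end)
    return left <= 1 + margin or right >= length - margin


def drop_edge_genes(genes: Iterable[Gene], contig_lengths: Dict[str, int], margin: int = 0) -> List[Gene]:
    """Keep only genes that are not flagged as sitting at a contig edge."""
    return [g for g in genes if not near_contig_edge(g, contig_lengths, margin)]


# Toy data: one 1000 bp contig with three genes.
contig_lengths = {"contig_1": 1000}
genes = [
    Gene("g1", "contig_1", 1, 300),    # starts at the first base of the contig
    Gene("g2", "contig_1", 400, 700),  # well inside the contig
    Gene("g3", "contig_1", 800, 998),  # ends 2 bp from the contig end
]

kept = drop_edge_genes(genes, contig_lengths, margin=5)
print([g.gene_id for g in kept])  # ['g2']
```

The `margin` value is a judgment call; a margin of 0 only catches genes that touch the very first or last base, while a few bp of slack also catches calls that stop just short of the contig end.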
Let me know if you have additional questions.
Thanks, Arkadiy
Ok good to know - thanks for the fast reply.
Hi there,
I recently applied Pseudofinder to a large set of genomes, and I applied a filter to remove certain pseudogenes close to the edges of contigs. I think this makes sense for pseudogenes classified as fragments and for ORFs that are <100% of the expected match length. Do you have any opinion on this?
Thanks!
Gavin