filip-husnik / pseudofinder

Detection of pseudogene candidates in bacterial and archaeal genomes.
GNU General Public License v3.0

Pseudogenes at contig edges #30

Closed by gavinmdouglas 2 years ago

gavinmdouglas commented 2 years ago

Hi there,

I recently ran Pseudofinder on a large set of genomes and applied a filter to remove certain pseudogenes close to contig edges. I think this makes sense for pseudogenes classified as fragments and for ORFs that are shorter than 100% of the expected match length. Do you have any opinion on this?

Thanks!

Gavin

Arkadiy-Garber commented 2 years ago

Hey there Gavin, that sounds like a fair strategy. If you have highly fragmented assemblies, you might see quite a lot of gene fragments cut off at the edges of contigs, and these may be flagged as putative pseudogenes according to Pseudofinder's algorithm. I seem to remember having a conversation with @mitchso (the other developer on this) about potentially excluding genes at the ends of contigs from pseudogene prediction, but I don't know if this was ever implemented. In any case, if you can do so computationally, I would recommend removing genes that appear truncated due to contig breaks.
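As a rough illustration of the kind of filter discussed here, the sketch below drops genes whose coordinates fall within a chosen distance of either contig end. It is not part of Pseudofinder: the `Gene` record, the function names, and the `margin` parameter are all hypothetical, and coordinates are assumed to be 1-based and inclusive, as in GFF.

```python
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    gene_id: str
    contig: str
    start: int
    end: int


def near_contig_edge(gene: Gene, contig_lengths: Dict[str, int], margin: int) -> bool:
    """Return True if the gene lies within `margin` bases of either contig end."""
    contig_len = contig_lengths[gene.contig]
    bases_before = gene.start - 1        # bases between contig start and gene
    bases_after = contig_len - gene.end  # bases between gene and contig end
    return bases_before <= margin or bases_after <= margin


def drop_edge_genes(genes: List[Gene], contig_lengths: Dict[str, int], margin: int) -> List[Gene]:
    """Keep only genes that are not close to a contig edge."""
    return [g for g in genes if not near_contig_edge(g, contig_lengths, margin)]
```

With `margin=0` only genes that touch the first or last base of a contig are removed; a larger margin is a judgment call that depends on the assembly and the gene caller.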

I know Prodigal has a flag for this, -c ("Closed ends. Do not allow genes to run off edges."), but I don't know whether Prokka or other gene-calling and annotation pipelines offer an equivalent.
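If Prodigal was run without -c, truncated genes can also be filtered after the fact. The sketch below assumes Prodigal's GFF output, where the attributes column carries a `partial=XY` field and a `1` marks a gene that is unfinished at that edge. The function names are hypothetical, and whether this field survives in the output of other pipelines would need to be checked.

```python
from typing import Iterable, List


def parse_attributes(column9: str) -> dict:
    """Split a GFF attributes column ("key=value;key=value;") into a dict."""
    fields = (f for f in column9.strip().strip(";").split(";") if "=" in f)
    return dict(f.split("=", 1) for f in fields)


def complete_gene_ids(gff_lines: Iterable[str]) -> List[str]:
    """Return IDs of CDS features whose `partial` attribute is "00".

    Assumes Prodigal-style GFF; features without a `partial`
    attribute are treated as complete.
    """
    ids = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("partial", "00") == "00":
            ids.append(attrs.get("ID", ""))
    return ids
```

Applied to the lines of a Prodigal GFF file, this returns only the genes with both boundaries inside the contig.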

Let me know if you have additional questions.

Thanks, Arkadiy

gavinmdouglas commented 2 years ago

Ok good to know - thanks for the fast reply.