Prunoideae / MitoFlex

A mitogenome toolkit inspired by MitoZ, while being more effective, precise and flexible.
GNU General Public License v3.0

Not extracting CDS regions correctly #4

Open charlesfoster opened 3 years ago

charlesfoster commented 3 years ago

Thanks for the interesting tool. I've been trying it out on some reads I have from insects (cicadas). All 13 protein-coding genes are identified, as one would hope for. However, the resulting *.annotated.cds.fa file does not contain the expected sequences: there is non-coding DNA added to the beginning of the CDS or appended after the CDS.

For example, with COX1 there are an additional 46 nucleotides before the start codon that should not be in the sequence. There are also ~3 missing nucleotides at the end. When I look at the final scaffold and manually locate COX1, there is a full, correct sequence there. So, something goes wrong during the process of extracting the protein-coding genes into their own file, and the genomic coordinates for the genes must also be wrong. The latter problem would likely also affect the Circos visualisation.
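A quick sanity check along these lines can flag affected records in the extracted CDS file. This is only a minimal sketch and not part of MitoFlex: the function names `check_cds` and `read_fasta` are made up for illustration, and the codon sets assume NCBI translation table 5 (invertebrate mitochondrial). Insect genes such as COX1 sometimes use atypical start codons, so a reported problem is a hint for manual inspection rather than proof of an error.

```python
# Codon sets assumed from NCBI translation table 5 (invertebrate mitochondrial).
START_CODONS = {"ATT", "ATC", "ATA", "ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG"}


def check_cds(seq):
    """Return a list of problems found in one extracted CDS (empty if none)."""
    seq = seq.upper()
    problems = []
    if seq[:3] not in START_CODONS:
        problems.append("does not begin with a start codon")
    # Split into complete codons, ignoring any trailing 1-2 nucleotides.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    complete = len(seq) % 3 == 0
    # With a complete final codon, only the codons before it are "internal".
    internal = codons[:-1] if complete else codons
    if any(codon in STOP_CODONS for codon in internal):
        problems.append("internal stop codon")
    # Mitochondrial genes may end in a partial stop (T or TA), so a missing
    # stop is only reported when the length is a multiple of three.
    if complete and (not codons or codons[-1] not in STOP_CODONS):
        problems.append("no terminal stop codon")
    return problems


def read_fasta(path):
    """Parse a FASTA file into a {header: sequence} dict."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                records[name] = ""
            elif name is not None:
                records[name] += line
    return records
```

To use it, loop over `read_fasta(path).items()` for the `*.annotated.cds.fa` file and print `check_cds(seq)` for each record. A record with extra nucleotides before the start codon would typically be reported as not beginning with a start codon.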

Is this something that could be fixed? Thanks!

Edit: I should also say that another of the options with MitoFlex doesn't appear to make sense/work as currently specified:

 --keep-temp           remove temporal files and folders after work done.
                        Default False.

If you are specifying --keep-temp, wouldn't that imply that you are telling the program to "keep temporal files and folders after work done", not "remove temporal files and folders after work done"?

Prunoideae commented 3 years ago

MitoFlex's annotation relies entirely on tblastn followed by genewise2. As far as I can tell this works in many situations, but it now seems too coarse for determining the precise borders of genes. This should be fixed in the future by adding steps that determine the start and the end of each sequence.

As a workaround for now, you could try editing configurations.py to turn this on: https://github.com/Prunoideae/MitoFlex/blob/7182598400b0102df808ccbac4c8f052b7dc74f4/configurations.py#L156-L159

...But in some cases this did not work well, so it is disabled by default. I will start working on improving it once I have time. Annotation tools like MITOS can do a better job too.

Also, it's always better to do a manual check after the automatic process. These algorithms and programs can always make mistakes, so manual curation is necessary to keep the data valid.

The --keep-temp flag used to be --clean-temp. After a while I found it more common to delete temp files to reduce disk usage, so I renamed the flag, and I guess I forgot to change the flag description here.
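For illustration, a flag whose name and help text agree could look like the following. This is a generic argparse sketch under the assumption that temporary files are removed unless the flag is given. It is not MitoFlex's actual argument handling, and `build_parser` and the program name are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with a --keep-temp flag whose help matches its effect."""
    parser = argparse.ArgumentParser(prog="example")  # hypothetical program name
    # store_true: temporary files are removed unless --keep-temp is passed.
    parser.add_argument(
        "--keep-temp",
        action="store_true",
        default=False,
        help="keep temporary files and folders after the work is done. "
             "Default False.",
    )
    return parser
```

With this definition, `build_parser().parse_args(["--keep-temp"]).keep_temp` is `True`, and omitting the flag leaves it `False`.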