Open crcardenas opened 3 weeks ago
Hi!
Thank you for your feedback. I am a little surprised by what you found.
Is it possible for you to share the listed contigs, your command line, and your reference so I can check all of this?
Best, Rémi
Hello Rémi,
Interestingly, when I subset it like this, I end up with COX1. However, I still see disagreement between my blastn output and mitofinder: I see the following genes in my blast: COX1, COX2, COX3, ND2, ND3, ND4, ND5, ND6, and rrnS. However, mitofinder recovers these genes in addition to ND1,ND4L, and CYTB; but not rrnS.
It looks like blastx is used at some point, so that might explain part of this but; so I think my initial prediction about the 0.0 value was inaccurate.
However, I cannot be certain given that the mitofinder blast files are inconsistently empty ( those found in CBX1139_subset/CBX1139_subset_tmp/*_blast_out.txt
) and there is inconsistencies between some of those missing genes.
Please see the attached files: CBX1139_subset_files.zip
reference.fasta
-this is the file found in the CBX1139_subset/CBX1139_subset_tmp/contig_id_database.fasta
its what I used to create the blast database.
CBX1139_subset.fasta
contans any contigs below the default 25k threshold.
Hopefully everything I have provided is clear and I didn't miss anything.
Best, Cody
#!/bin/bash
source /local/anaconda3/bin/activate
conda activate singularity
cd /home/cody/CALOSOMA_Genomes/NANOPORE/blobtools/CBX1139/mitofinder
for i in 1; do
# can be setup to iterate over a directory of files; cp assemblies over (singularity does not like symlinks) and adjust -B,-j,-a, etc.
/Calosoma_WORKING/MitoFinder/ mitofinder_v1.4.1.sif \
singularity run \
-B /home/cody/mito/:/home/cody/mito/,/data/raw/Calosoma/NANOPORE/blobtools/CBX1139/mitofinder/:/data/raw/Calosoma/NANOPORE/blobtools/CBX1139/mitofinder/ mitofinder_v1.4.2.sif \
-j CBX1139_subset \
-a /data/raw/Calosoma/NANOPORE/blobtools/CBX1139/mitofinder/CBX1139_subset.fasta \
-r /data/raw/Calosoma/NANOPORE/blobtools/CBX1139/mitofinder/reference.gb \
-m 64 -p 8 -o 5 --rename-contig no \
--min-contig-size 400 --max-contig-size 30000
done |& tee -a ./mitofinder_`date +%Y%m%d-%H%M%S`.log
for i in 1; do
#see previous comment about iteration, less necessary here.
DB="../CBX1139/mitofinder/find_COI";
blastn -db ${DB}/test.fasta -query CBX1139_subset.fasta \
-outfmt "6" -max_target_seqs 10 -evalue 1e-45 -num_threads 3 \
-out blastn_mito_CBX1139_blast.out;
done
I am running mitofinder on assembled contigs to search for mtDNA/Genomes. However, I am missing CO1 from the output. I thought maybe I just didn't capture that mtGene. However, to confirm my suspicions, I ran blastn using the same 5 reference genes and found a contig or two that should contain that gene (three hits actually, but the longest blast alignment fits the expected length of 1.5k bp). For some reason the tmp files directory generated by mitofinder has the CO1 blast output empty (actually all of them are) so I cannot check the blast scores used by mitofinder.
Here is an example of my sanity check where all CO1 evalues are 0.0
My understanding is that blast will round a very small e-value to 0 if its a good match.
However, I noticed that this could be an issue with the way the max value is added in the python script: line 84 mitofinder
I was also missing some other genes before adding mtDNA reference from a sister genus, which I think might explain why some of the genes were missing. Here is a good example, ND4: ~ 1.3k bp reference genes. Which now appear with near zero evalues for the longest alignment length contig.
I would normally try tweaking the code myself, but I am using the singularity container and don't know how to edit anything there.
Any advice would be greatly appreciated!
To clarify, I am using oxford nanopore output, so this might be part of the issue. However, I have kept the max length for contigs as default (25k bp); so I do not think it would be in conflict with the framework of using mitofinder for UCE assemblies. It may still not be appropriate, but I thought you would want to be made aware of this issue.