MGXlab / CAT_pack

CAT/BAT/RAT: tools for taxonomic classification of contigs and metagenome-assembled genomes (MAGs) and for taxonomic profiling of metagenomes
MIT License

Error: found a protein in the predicted proteins fasta file that can not be traced back to one of the contigs in the contigs fasta file #38

bharat1912 closed this issue 4 years ago

bharat1912 commented 4 years ago

I have been running BAT on a set of 31 bins, but it fails after Prodigal finishes, while parsing BAT_run.concatenated.predicted_proteins.faa. Below is one example, but I have seen the same error with other bins, and it appears to be random: if I rerun the same command, the error may be reported for a protein from a different bin.

Below is the log file

CAT v5.0.4.

BAT is running. Protein prediction, alignment, and bin classification are carried out. Rarw!

Supplied command: /home/bharat/opt/anaconda3/envs/CAT/bin/CAT bins -r 10 -f 0.1 -b bins_fna/ -d /media/bharat/volume1/db/CAT_prepare_20200304/2020-03-04_CAT_database/ -t /media/bharat/volume1/db/CAT_prepare_20200304/2020-03-04_taxonomy/ -o /home/bharat/Desktop/MAG_Analysis/CS1BS/CS1BS_BIN_REASSEMBLY/reassembled_bins/bins_fna/BAT_run

Bin folder: bins_fna/
Taxonomy folder: /media/bharat/volume1/db/CAT_prepare_20200304/2020-03-04_taxonomy/
Database folder: /media/bharat/volume1/db/CAT_prepare_20200304/2020-03-04_CAT_database/
Parameter r: 10.0
Parameter f: 0.1
Log file: /home/bharat/Desktop/MAG_Analysis/CS1BS/CS1BS_BIN_REASSEMBLY/reassembled_bins/bins_fna/BAT_run.log


Doing some pre-flight checks first.
[2020-05-03 21:09:06.080226] Prodigal found: Prodigal V2.6.3: February, 2016.
[2020-05-03 21:09:06.087578] DIAMOND found: diamond version 0.9.21.
Ready to fly!


[2020-05-03 21:09:06.088300] Importing bins from bins_fna/.
[2020-05-03 21:09:06.777792] 31 bin(s) found!
[2020-05-03 21:09:06.777914] Writing /home/bharat/Desktop/MAG_Analysis/CS1BS/CS1BS_BIN_REASSEMBLY/reassembled_bins/bins_fna/BAT_run.concatenated.fasta.
[2020-05-03 21:09:08.017832] Running Prodigal for ORF prediction. Files /home/bharat/Desktop/MAG_Analysis/CS1BS/CS1BS_BIN_REASSEMBLY/reassembled_bins/bins_fna/BAT_run.concatenated.predicted_proteins.faa and /home/bharat/Desktop/MAG_Analysis/CS1BS/CS1BS_BIN_REASSEMBLY/reassembled_bins/bins_fna/BAT_run.concatenated.predicted_proteins.gff will be generated. Do not forget to cite Prodigal when using CAT or BAT in your publication!
[2020-05-03 21:26:52.857614] ORF prediction done!
[2020-05-03 21:26:52.858098] Parsing ORF file /home/bharat/Desktop/MAG_Analysis/CS1BS/CS1BS_BIN_REASSEMBLY/reassembled_bins/bins_fna/BAT_run.concatenated.predicted_proteins.faa
[2020-05-03 21:26:53.401296] ERROR: found a protein in the predicted proteins fasta file that can not be traced back to one of the contigs in the contigs fasta file: bin.12.orig_1. Proteins should be named contigname#.
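For anyone wanting to see which protein IDs trigger this message, the check it describes can be roughly reproduced outside of CAT. The sketch below is not CAT's own code. It assumes Prodigal's usual naming, where each protein ID is the contig ID followed by an underscore and a running number, and the function names (`fasta_ids`, `untraceable_proteins`) and the example IDs are made up for illustration.

```python
def fasta_ids(lines):
    """Return the first whitespace-delimited token of every FASTA header line."""
    ids = []
    for line in lines:
        # Skip sequence lines and empty headers such as a bare ">".
        if line.startswith(">") and len(line.strip()) > 1:
            ids.append(line[1:].split()[0])
    return ids


def untraceable_proteins(contig_lines, protein_lines):
    """List protein IDs whose prefix (before the last '_') is not a contig ID.

    Assumption: proteins are named <contig id>_<number>, as Prodigal does.
    """
    contigs = set(fasta_ids(contig_lines))
    missing = []
    for protein in fasta_ids(protein_lines):
        contig, sep, _index = protein.rpartition("_")
        if not sep or contig not in contigs:
            missing.append(protein)
    return missing


# Hypothetical example data, not taken from the run above.
example_contigs = [">contigA length=12", "ACGTACGTACGT"]
example_proteins = [
    ">contigA_1 # 1 # 9 # 1",
    "MKL",
    ">contigB_1 # 1 # 9 # 1",
    "MKV",
]
print(untraceable_proteins(example_contigs, example_proteins))
```

With real files, the two arguments could be open file handles for the concatenated contigs fasta and the predicted proteins fasta, since both are iterated line by line.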

bharat1912 commented 4 years ago

The 31 bins were output by metaWRAP and were named bin.XX.orig.fa. I changed the file extension to .fna because .fa was not recognised.

Since submitting the above, I have simplified the file names to XX.fna. This appears to have solved the problem of "found a protein in the predicted proteins fasta file that can not be traced back to one of the contigs in the contigs fasta file: bin.12.orig_1. Proteins should be named contigname#."
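In case it helps anyone with the same metaWRAP naming, the renaming described above could be scripted along these lines. This is only a sketch: the pattern assumes files named bin.XX.orig.fa (or .fna) with a numeric XX, and `simplified_name` and `rename_bins` are hypothetical helper names, not part of CAT or metaWRAP. It defaults to a dry run so the planned renames can be reviewed first.

```python
import re
from pathlib import Path

# Assumed naming scheme: bin.<number>.orig.fa or bin.<number>.orig.fna
PATTERN = re.compile(r"^bin\.(\d+)\.orig\.(?:fa|fna)$")


def simplified_name(filename):
    """Map 'bin.12.orig.fa' to '12.fna'; return None for any other name."""
    match = PATTERN.match(filename)
    return f"{match.group(1)}.fna" if match else None


def rename_bins(folder, dry_run=True):
    """Rename matching bin files in folder and return (old, new) name pairs.

    With dry_run=True nothing is renamed; the pairs are only reported.
    """
    planned = []
    for path in sorted(Path(folder).iterdir()):
        new_name = simplified_name(path.name)
        if new_name is None:
            continue  # leave unrelated files untouched
        target = path.with_name(new_name)
        if not dry_run:
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            path.rename(target)
        planned.append((path.name, new_name))
    return planned
```

Calling `rename_bins("bins_fna/")` lists what would change, and `rename_bins("bins_fna/", dry_run=False)` performs the renames.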