chklovski / CheckM2

Assessing the quality of metagenome-derived genome bins using machine learning
GNU General Public License v3.0

Error in DIAMOND execution #95

Open KateSakharova opened 5 months ago

KateSakharova commented 5 months ago

Hello, I bumped into a problem with DIAMOND:

INFO: Running CheckM2 version 1.0.1
INFO: Custom database path provided for predict run. Checking database at uniref100.KO.1.dmnd...
INFO: Running quality prediction workflow with 8 threads.
INFO: Calling genes in 1 bins with 8 threads:
    Finished processing 1 of 1 (100.00%) bins.
INFO: Calculating metadata for 1 bins with 8 threads:
    Finished processing 1 of 1 (100.00%) bin metadata.
INFO: Annotating input genomes with DIAMOND using 8 threads
INFO: Processing DIAMOND output
ERROR: No DIAMOND annotation was generated. Exiting

execution command: singularity run quay.io-biocontainers-checkm2-1.0.1--pyh7cba7a3_0.img checkm2 predict --threads 8 --input bins -x fa --output-directory binner13_checkm_output --database_path uniref100.KO.1.dmnd

The bins folder contains one bin, bin.fa (attached as bins.fa.gz).

Should CheckM2 generate empty output in that case? Could you explain what is wrong with the DIAMOND execution?

Thanks! Best, Kate

chklovski commented 5 months ago

Hi,

The problem here lies with translation by Prodigal: the ~940 kb bin yields only 31 predicted proteins, most of which are tiny. As a result, DIAMOND cannot confidently assign KEGG IDs to any of the predicted protein fragments and generates no output. Because that is the only bin in the input, CheckM2 then exits with this error. Is it possible the bin contains non-prokaryotic DNA that doesn't play well with Prodigal?
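If it helps to check this on other bins before running the full workflow, here is a minimal sketch (not part of CheckM2) that summarizes Prodigal's protein FASTA for one bin. The function names `read_fasta_lengths` and `summarize_gene_calls` are hypothetical, and the coding fraction is only a rough estimate (3 nt per residue, ignoring stop codons and overlapping genes). For reference, 31 proteins in ~940 kb is about 0.03 genes per kb, whereas prokaryotic genomes are usually closer to one gene per kb.

```python
from statistics import median


def read_fasta_lengths(lines):
    """Return {record_id: residue_count} for FASTA text given as an iterable of lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            # Prodigal marks a translated stop codon with a trailing '*'; do not count it.
            lengths[current] += len(line.rstrip("*"))
    return lengths


def summarize_gene_calls(protein_lengths, genome_size_bp):
    """Summarize gene calls for one bin (hypothetical helper, rough estimates only)."""
    n = len(protein_lengths)
    total_aa = sum(protein_lengths.values())
    return {
        "n_proteins": n,
        "median_aa": median(protein_lengths.values()) if n else 0,
        "genes_per_kb": n / (genome_size_bp / 1000) if genome_size_bp else 0.0,
        # Approximation: 3 nucleotides per residue, stop codons and overlaps ignored.
        "coding_fraction": (total_aa * 3) / genome_size_bp if genome_size_bp else 0.0,
    }


# Tiny in-memory example; for a real bin, pass an open file handle instead.
demo = [">p1 # 1 # 90", "MKT*", ">p2", "MA", "AA*"]
demo_lengths = read_fasta_lengths(demo)
demo_summary = summarize_gene_calls(demo_lengths, genome_size_bp=1000)
```

For a real bin you would pass an open handle on the Prodigal `.faa` file and the total bin length in bp. A very low `genes_per_kb` together with a small `median_aa` is a hint (not proof) that the bin is not a good fit for Prodigal's prokaryotic gene model.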

KateSakharova commented 5 months ago

Hi @chklovski, thank you for your answer! I have some bins from the same run annotated as eukaryotic MAGs, so your assumption might be correct. I will look into this more deeply.

Kate