quliping closed this issue 5 years ago
Hello. I do not recommend using CheckM for taxonomic classification, as this is not the intended use of the software. You can determine the full taxonomic assignment made by CheckM using the tree_qa command, e.g.:
checkm tree_qa -o2
https://github.com/Ecogenomics/CheckM/wiki/Genome-Quality-Commands#tree_qa
Cheers, Donovan
Thank you very much! I know what you mean, and the CheckM taxonomic result is just one of my references. But there is still a problem: the tree_qa results differ from the Marker lineage column. For bin 108, the tree_qa result is 'k__Bacteria;p__Actinobacteria;c__Actinobacteria', but the marker lineage is 's__algicola'. For bin 109, the results are 'k__Archaea (root)' and 'k__Bacteria'. For bin 113, they are 'k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodospirillales;f__Rhodospirillaceae' and 'c__Betaproteobacteria'. Of course, I know the two rest on different principles: one is based on the genome tree, the other on the marker set. So which result should I trust? Or should I use other software, such as PhyloPhlAn? I'm confused right now.
Thank you very much!
Qu Liping
Hello Qu. These results don't make sense to me. Both the phylogenetic placement and marker set used for completeness/contamination estimates should be in line with each other. Can you send me the relevant CheckM output files?
Of course. The file 'checkM_result_7_8.txt' is the output of the 'lineage_wf' command, and 'tree_qa_result_7_8.txt' is the output of the 'tree_qa' command. I have also uploaded the genome tree from tree_qa, named 'checkm7_8.zip'.
Attachments: checkM_result_7_8.txt, tree_qa_result_7_8.txt, checkm7_8.zip
Thank you again.
Qu Liping
Hello. Those results do not make sense to me. Both files use the same tree placement so it shouldn't be possible to get incongruent results. Can you send me the 3 bins you have flagged (108, 109, 113)?
OK, I will send you the three bins. I have also put the other bins that show the same situation into the zip file.
Attachment: different_bins.zip
Here are my commands:
checkm lineage_wf -x fa -t 8 --pplacer_threads 8 bin_folder output_folder
checkm tree_qa -o 2 -f checkm7_8.txt checkM7_8
I'm sorry to bother you so much. Thank you again.
Hello. I'm not able to reproduce your results. Bins 108 and 113 use the k__Bacteria (UID203) marker set, while bin 109 uses the k__Archaea (UID2) marker set. The tree_qa -o2 table indicates that:
bin 108 = k__Bacteria;p__Actinobacteria;c__Actinobacteria
bin 113 = k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhodospirillales;f__Rhodospirillaceae
bin 109 = k__Archaea
This is all consistent and as I would expect. A marker set from higher in the tree than where a bin is placed is often used, because it was determined that this marker set likely gives a better estimate. For example, there may not be enough f__Rhodospirillaceae reference genomes to establish a reliable set of markers for estimating genome completeness and contamination.
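The relationship described above (the marker set is taken from the bin's own placement lineage, possibly at a higher rank) can be checked mechanically. The following is a minimal Python sketch and not part of CheckM: the function names are hypothetical, and it assumes a Marker lineage value looks like 'k__Bacteria (UID203)' and a tree_qa taxonomy is a semicolon-separated string, as quoted in this thread. Treating 'root' as consistent with every placement is also an assumption.

```python
import re


def parse_marker_lineage(value):
    """Split a value such as 'k__Bacteria (UID203)' into (taxon, uid).

    The uid is None when no '(UID...)' suffix is present.
    """
    match = re.fullmatch(r"\s*([^\s(]+)\s*(?:\((UID\d+)\))?\s*", value)
    if match is None:
        raise ValueError(f"unrecognised marker lineage: {value!r}")
    return match.group(1), match.group(2)


def split_taxonomy(taxonomy):
    """Turn 'k__Bacteria;p__Actinobacteria' into a list of rank labels."""
    return [rank.strip() for rank in taxonomy.split(";") if rank.strip()]


def marker_lineage_is_consistent(marker_lineage, taxonomy):
    """Return True if the marker lineage taxon is one of the ranks
    in the bin's placement taxonomy (or is 'root', by assumption)."""
    taxon, _uid = parse_marker_lineage(marker_lineage)
    if taxon == "root":
        return True
    return taxon in split_taxonomy(taxonomy)


if __name__ == "__main__":
    placement = "k__Bacteria;p__Actinobacteria;c__Actinobacteria"
    # Marker set at a higher rank of the same lineage: consistent.
    print(marker_lineage_is_consistent("k__Bacteria (UID203)", placement))
    # Marker set from an unrelated lineage: suggests mixed-up files.
    print(marker_lineage_is_consistent("s__algicola (UID2846)", placement))
```

A mismatch reported by such a check does not say which file is wrong; it only flags that the two tables are unlikely to come from the same run.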
I'm so sorry... I went back to check the results and found that the mistake was mine. There are actually two samples, both with 118 bins, so I think I confused their results. The real results from the two commands for sample7_8 are the same. So the mistake was mine, not the software's. I'm so sorry; maybe you should close this issue.
Glad to hear the issue is resolved. Best of luck with your research.
Hello, I used the "checkm lineage_wf" command and I'm confused about the "Marker lineage" column in the result. For example, the result for one of my bins is "s__algicola (UID2846)", but many species share the epithet "algicola", such as 'Bacillus algicola' and 'Crocinitomix algicola'. How can I tell which algicola my bin is?