B-UMMI / chewBBACA

BSR-Based Allele Calling Algorithm
GNU General Public License v3.0

Chewbbaca visualization output #181

Open davidmaimoun opened 1 year ago

davidmaimoun commented 1 year ago

Hello! I'm new to the field and I need to use chewBBACA. At the end of the analysis, I get a file, cgMLST.tsv, in a visualization folder. Do the values in this file represent the allelic distance of each species from the schema alleles? When I open it with GrapeTree, I get branches with values. Can you explain what these values are?

Thank you

rfm-targa commented 11 months ago

Hello @davidmaimoun,

Sorry for the delay, and thank you for your interest in chewBBACA. Based on the name of the file you've described, cgMLST.tsv, I assume that you performed allele calling with the AlleleCall module and then determined the core genome from the allele calling results with the ExtractCgMLST module.

The cgMLST.tsv file contains the allelic profiles of your samples: each row is a strain, and each column is a locus/gene present in at least the proportion of strains given by --t, the loci presence threshold you passed to the ExtractCgMLST module (or the default of [0.95, 0.99, 1] if you did not pass any value). The allelic profiles tell you which alleles were found in your strains. You can find more information about the output files created by the AlleleCall and ExtractCgMLST modules here and here. The cgMLST.tsv file has the same structure as the results_alleles.tsv file created by the AlleleCall module, except that it only includes the results for the loci in the core genome.

You can upload the files with the allelic profiles to GrapeTree or PHYLOViZ to visualise a Minimum Spanning Tree (MST) and perform various dataset operations to explore and analyse the results (more information about uploading chewBBACA results to PHYLOViZ here). The values displayed on the MST branches are the distances between strains, i.e. the number of allelic differences across all compared loci. These distances are computed from the allelic profiles: the tool builds a distance matrix with the number of allelic differences for each pair of strains, and the tree is drawn from that matrix.

I hope my explanation helps. Feel free to let me know if there is anything else you would like to know.
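To make the branch values concrete, here is a minimal Python sketch, assuming a profile table shaped like the one described above (sample ID in the first column, one column per locus). It keeps loci called in at least a given proportion of samples, counts pairwise allelic differences, and links samples with Prim's algorithm to get MST edges labelled with those counts. It is not chewBBACA's, GrapeTree's or PHYLOViZ's code: the example data, the handling of the INF- prefix, and the rule that any non-numeric or zero call counts as missing data (and is skipped for that pair) are assumptions. GrapeTree's own algorithms (e.g. MSTreeV2) treat missing data and ties in their own way, so real trees can differ.

```python
import csv
import io

# Hypothetical toy profiles: first column is the sample ID, the other columns
# are loci, and each cell holds an allele ID or a non-numeric class like "LNF".
EXAMPLE_TSV = (
    "FILE\tlocus1\tlocus2\tlocus3\tlocus4\n"
    "strainA\t1\t2\t3\t1\n"
    "strainB\t1\t2\tINF-4\t1\n"
    "strainC\t5\t2\t4\tLNF\n"
)


def parse_profiles(tsv_text):
    """Return (locus names, {sample ID: [allele, ...]}) from a profile table."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    return rows[0][1:], {row[0]: row[1:] for row in rows[1:]}


def normalize(allele):
    """Return the allele number, or None if the call counts as missing data."""
    # Assumption: inferred alleles are written as "INF-<number>".
    allele = allele.removeprefix("INF-")  # requires Python 3.9+
    # Assumption: anything that is not a positive integer ("0", "LNF", ...) is missing.
    return int(allele) if allele.isdigit() and int(allele) > 0 else None


def core_loci(loci, profiles, threshold=0.95):
    """Keep only loci with an allele in at least `threshold` of the samples."""
    n = len(profiles)
    keep = [
        i for i in range(len(loci))
        if sum(normalize(p[i]) is not None for p in profiles.values()) / n >= threshold
    ]
    return [loci[i] for i in keep], {s: [p[i] for i in keep] for s, p in profiles.items()}


def allelic_distance(p1, p2):
    """Count loci where both samples have an allele and the alleles differ."""
    diffs = 0
    for a, b in zip(p1, p2):
        a, b = normalize(a), normalize(b)
        if a is not None and b is not None and a != b:
            diffs += 1
    return diffs


def distance_matrix(profiles):
    """Pairwise allelic distances for all samples, in input order."""
    names = list(profiles)
    matrix = [[allelic_distance(profiles[x], profiles[y]) for y in names] for x in names]
    return names, matrix


def minimum_spanning_tree(names, matrix):
    """Prim's algorithm; each edge is (sample, sample, allelic distance)."""
    in_tree, edges = {0}, []
    while len(in_tree) < len(names):
        best = None
        for i in sorted(in_tree):
            for j in range(len(names)):
                if j not in in_tree and (best is None or matrix[i][j] < best[2]):
                    best = (i, j, matrix[i][j])
        in_tree.add(best[1])
        edges.append((names[best[0]], names[best[1]], best[2]))
    return edges


loci, profiles = parse_profiles(EXAMPLE_TSV)
loci, profiles = core_loci(loci, profiles, threshold=1.0)
names, matrix = distance_matrix(profiles)
edges = minimum_spanning_tree(names, matrix)
print(loci)    # ['locus1', 'locus2', 'locus3']
print(matrix)  # [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(edges)   # [('strainA', 'strainB', 1), ('strainB', 'strainC', 1)]
```

With a threshold of 1.0, locus4 is dropped because strainC has no allele there, and each branch label is simply the number of core loci at which the two linked strains carry different alleles. Prim's algorithm is used only because it is short; any MST algorithm yields the same total tree length on the same matrix.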

Kind regards,

Rafael

alexandreflageul commented 1 month ago

Hi @rfm-targa, I'm writing here because the title of this issue covers my question. I would like to include chewBBACA in my analysis pipeline as a complement to another tool, cgMLSTFinder (from CGE). With cgMLSTFinder, I used to get the complex type of the bacterial strain, but unfortunately I cannot find in the chewBBACA documentation how to retrieve the complex type from a chewBBACA analysis. I ran chewBBACA.py PrepExternalSchema to adapt the EnteroBase scheme, then I ran chewBBACA.py AlleleCall. The output from the last module does not display the complex type.

What am I missing?

Regards,

Alexandre

alexandreflageul commented 1 month ago

@jacarrico @aplf

rfm-targa commented 1 month ago

Hello @alexandreflageul,

Thank you for your interest in chewBBACA. chewBBACA does not assign complex types (CTs) to bacterial strains. The main output of the AlleleCall module is the file containing the allelic profiles, results_alleles.tsv. The allelic profiles in this file can serve as the basis for subsequent analyses. You can import that file and sample metadata into PHYLOViZ to visualize an MST and explore the results through several dataset operations. If you want to cluster your samples to identify meaningful clustering levels and define CTs, I'd recommend ReporTree or HierCC. It might also be worth looking up the more recent concept of LIN codes. Let us know if there's anything else.
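As an illustration of what "defining clustering levels" from allelic distances can mean, here is a simplified Python sketch of single-linkage clustering at fixed allelic-distance thresholds, the general idea behind threshold-based cluster types. The sample names, distance matrix, thresholds and cluster numbering are hypothetical, and this is not how ReporTree or HierCC are implemented; in particular, it does not keep cluster names stable as new samples are added.

```python
# Hypothetical pairwise allelic distances between four samples.
NAMES = ["s1", "s2", "s3", "s4"]
MATRIX = [
    [0, 3, 10, 40],
    [3, 0, 8, 38],
    [10, 8, 0, 35],
    [40, 38, 35, 0],
]


def single_linkage_clusters(names, matrix, threshold):
    """Join samples linked by any chain of distances <= threshold (union-find)."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] <= threshold:
                parent[find(i)] = find(j)

    # Number clusters 1, 2, ... in order of first appearance (hypothetical IDs).
    labels, cluster_ids = {}, {}
    for i, name in enumerate(names):
        root = find(i)
        cluster_ids.setdefault(root, len(cluster_ids) + 1)
        labels[name] = cluster_ids[root]
    return labels


for threshold in (5, 10, 50):
    print(threshold, single_linkage_clusters(NAMES, MATRIX, threshold))
# 5 {'s1': 1, 's2': 1, 's3': 2, 's4': 3}
# 10 {'s1': 1, 's2': 1, 's3': 1, 's4': 2}
# 50 {'s1': 1, 's2': 1, 's3': 1, 's4': 1}
```

Single-linkage clusters at a threshold t are exactly the connected components left after removing MST edges longer than t, which ties this back to the MST view in PHYLOViZ or GrapeTree.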

Kind regards,

Rafael