Open Djeppschmidt opened 11 months ago
Hi Dietrich,
Thanks for your issue!
I am on a business trip this month. I will check it and let you know when I have made progress.
Best, Xiaoming
Hi Xiaoming,
I ran into the same issue as Dietrich: I was hoping the tree would output each cluster as a leaf. Instead, most clusters are present as named internal nodes.
Have you had some time to look into it? I believe the cause could be the presence of 0s in the distance matrix, since some sequences could be considered subsets of others. Maybe replacing those 0s with a very small distance value would produce what Dietrich and I expect.
If you could point me to where the Newick tree is built in your code, I could look into it further.
Best, Arnaud
Apologies for the delayed response.
The Newick tree in RabbitTClust is the output format of the minimum spanning tree generated by clust-mst. Unfortunately, it is not possible to make every genome a leaf node: the edges of the minimum spanning tree connect genomes directly to one another, so any genome that has other genomes attached below it appears as an internal node.
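To illustrate why this happens, here is a minimal sketch of how a spanning tree could be written out as Newick. It is not the RabbitTClust implementation; the function name `mst_to_newick`, the edge list, and the labels A to D are hypothetical. Because every vertex of the spanning tree is a genome, any genome with neighbors below it is emitted as a labeled internal node.

```python
from collections import defaultdict

def mst_to_newick(edges, root):
    """Serialize an undirected tree, given as (u, v, dist) edges, as Newick.

    Hypothetical helper for illustration only. It recurses, so it is
    suitable for small trees, not for hundreds of thousands of genomes.
    """
    adj = defaultdict(list)
    for u, v, d in edges:
        adj[u].append((v, d))
        adj[v].append((u, d))

    def render(node, parent):
        # Every neighbor except the one we came from is a child.
        children = [(n, d) for n, d in adj[node] if n != parent]
        if not children:
            return node  # no children: the genome is a tip
        inner = ",".join(f"{render(n, node)}:{d:.6f}" for n, d in children)
        return f"({inner}){node}"  # has children: labeled internal node

    return render(root, None) + ";"

# Made-up distances: B is connected to A, C and D.
edges = [("A", "B", 0.001), ("B", "C", 0.002), ("B", "D", 0.0)]
newick = mst_to_newick(edges, "A")
print(newick)  # ((C:0.002000,D:0.000000)B:0.001000)A;
```

In this toy tree only C and D are tips; A and B show up as internal node labels, which matches the pattern reported in this issue.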
Best, Xiaoming
Hello,
I really appreciate the Newick format that you recently introduced!
I think this is a bug in building the tree. As I work with the Newick file, it appears the tree is missing internal nodes; rather, about half the nodes are labeled with names that should actually be tips on the tree. For example, I ran RabbitTClust to cluster all Salmonella in the NCBI pathogen database (~500k isolates) using the following command:
clust-mst -d 0.001 -l -i fasta_input.txt --newick-tree -o sal.mst.clust.0001
This generates a tree with ~270k tips and ~238k internal nodes (it should have ~500k tips).
I ran a tiny version of this with 8 isolates, which produced 3 tips and 5 internal nodes:
(((/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863221_contigs_skesa.fasta:0.000794,(/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863395_contigs_skesa.fasta:0.016157)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR900926_contigs_skesa.fasta:0.000969,(/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863393_contigs_skesa.fasta:0.001294)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863392_contigs_skesa.fasta:0.013981)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863223_contigs_skesa.fasta:0.000000)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863224_contigs_skesa.fasta:0.020389)/isilon/NCBI/SRAassemblies/skesa_contigs/SRR863396_contigs_skesa.fasta;
This makes it impossible to filter the tree by tips because half the isolates are actually node labels, when I believe they should be tip labels.
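One possible workaround, sketched here under assumptions rather than taken from RabbitTClust, is to post-process the Newick string so that every labeled internal node becomes a tip attached to its former position by a zero-length branch. The function names `internal_labels_to_tips` and `tip_labels` are hypothetical, and the regular expressions assume unquoted labels that contain none of the characters `( ) : , ;` and no whitespace.

```python
import re

# A label is any run of characters that are not Newick punctuation.
LABEL = r"[^():,;]+"

def internal_labels_to_tips(newick):
    """Rewrite ')LABEL' as ',LABEL:0.0)' so each labeled internal node
    becomes a zero-length tip under a now unlabeled internal node."""
    return re.sub(rf"\)({LABEL})", r",\1:0.0)", newick)

def tip_labels(newick):
    """Return tip labels: labels that directly follow '(' or ','."""
    return re.findall(rf"[(,]({LABEL})", newick)

# Toy tree with made-up labels; B and A are labeled internal nodes.
tree = "((C:0.002000,D:0.000000)B:0.001000)A;"
fixed = internal_labels_to_tips(tree)

print(tip_labels(tree))   # ['C', 'D']
print(fixed)              # ((C:0.002000,D:0.000000,B:0.0):0.001000,A:0.0);
print(tip_labels(fixed))  # ['C', 'D', 'B', 'A']
```

The rewritten tree keeps all original branch lengths, so distances between isolates are unchanged; only the labeling moves from internal nodes to tips, which should make filtering by tip possible.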
I'm curious if anyone else is experiencing this issue? Or maybe I'm missing something?
Thanks for your help, Dietrich