ImagoXV / NanoASV

NanoASV official repo
GNU General Public License v3.0

Problem with phyloseq and Unknown clusters handling #111

Open ImagoXV opened 3 hours ago

ImagoXV commented 3 hours ago

The problem seems to be linked to the phytree object: the same datasets, run with the --notree option as the only difference, are handled correctly.

But the clusters are found within the tree. I know, this is a discrepancy between the names used in the tree and those used in the ASV table.

ImagoXV commented 3 hours ago

Yes, within the fasta file, the cluster consensus is named like so:

>0891922e-1412-42db-b6e7-0df4f11a92eb_Unknown_cluster_122806_clusterid=1227_size=106
AGTTTGATCATGGCTCAGTTCAATACTCTTTAGCAAGAGCACGATCGCTTCTGTGTATTTCCCCTCCAGGATCAACATCC
CACCCTGATCTTTCGTCACATCGGCAATCGCGCTCTTGTCACCAACCTGCTCAAATAGCTCTCTACTTTGAGCATAGTAA
TATCGCATTTGCTCGAAGTTATATGTGAAGCCAGCTGCCGTGCCCAGGTAAAAGAGCAACTGAGCGCGCAGCCATCTATC
GTCTGCGGGAGAGAGCAATTGCAGGCCTTGCTCGTACAAACTTTTGGCCAGTTCGTGATTGCCCTGGGCAAGAGCTGGCC
ATCCACGGTATAATAATGCAGTCGCCAGGCCTCTGGGATCGTCCAATTGCCGCCAGAGCGCGATGCTCGCTTCAGCCAGT
TCTACCGCTTTGCTCTGCTCGTTTTGCAAACATACCAGGCGGGCCACTTCGCCCAACGCTTTTGCCCTTGCTGCCAGTGC
CGTCCTTCCTGCTCCTTCCACAATTGGAACTTCCAGCACTGCATCCAGCCAGCCTCGCCCTTCAACCAGGTGGCCTTGCC
ATTCCCAGTAGGCTCGAAGGGCCGCCGCCAGACGTAGAGACACCTCAACAGCAAGAAGTTCGGCTCCGGGCACTATCTTC
AATGGCACGGCCAGCAATGAGGCGCTATCTCCAACCCGTTTCATTTCTTCCTCATCATTGGTCGACCTGCCCGTGGGGCT

But within the unknown abundance table, it looks like this:

0891922e-1412-42db-b6e7-0df4f11a92eb_Unknown_cluster_10076

I realized there are some more complex discrepancies, where highly abundant unknown clusters are not reported within the unkown_cluster.tsv file, which is bad of course.
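To narrow this down, here is a minimal diagnostic sketch in Python (not NanoASV code). It assumes the only naming difference is the trailing `_clusterid=N_size=M` suffix seen in the fasta header above. The regex is inferred from that single example, and the function names `strip_suffix`, `fasta_labels` and `compare` are hypothetical. It strips the suffix and lists the names present on one side only, which should also surface abundant clusters missing from the table.

```python
import re

# Assumed suffix format, inferred from the single FASTA header shown above.
SUFFIX = re.compile(r"_clusterid=\d+_size=\d+$")


def strip_suffix(label):
    """Drop a leading '>' and a trailing '_clusterid=N_size=M', if present."""
    return SUFFIX.sub("", label.strip().lstrip(">"))


def fasta_labels(lines):
    """Yield normalized labels from the header lines of a FASTA file."""
    for line in lines:
        if line.startswith(">"):
            yield strip_suffix(line)


def compare(fasta_lines, table_names):
    """Return (names only in the FASTA, names only in the table), sorted."""
    fasta = set(fasta_labels(fasta_lines))
    table = {strip_suffix(name) for name in table_names}
    return sorted(fasta - table), sorted(table - fasta)


# Usage with the two labels quoted in this issue.
fasta_example = [
    ">0891922e-1412-42db-b6e7-0df4f11a92eb_Unknown_cluster_122806_clusterid=1227_size=106",
    "AGTTTGATCATGGCTCAGTTCAATACTCTTTAGCAAGAGCACGATCGCTTCTGTGTATTTCCCCTCCAGG",
]
table_example = ["0891922e-1412-42db-b6e7-0df4f11a92eb_Unknown_cluster_10076"]

only_fasta, only_table = compare(fasta_example, table_example)
print("Only in fasta:", only_fasta)
print("Only in table:", only_table)
```

On real data, the fasta lines would come from the consensus file and the table names from the first column of the abundance table. The same `strip_suffix` could be applied to the tree tip labels, since they use the fasta naming.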

The phytree uses the same naming format as the fasta file:

0891922e-1412-42db-b6e7-0df4f11a92eb_Unknown_cluster_1674_clusterid=1673_size=14

ImagoXV commented 3 hours ago

Maybe @frederic-mahe, you have an idea about this?