bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics
GNU Affero General Public License v3.0

matam_db_preprocessing.py clustering #107

Open mdsufz opened 2 years ago

mdsufz commented 2 years ago

I want to build a custom reference database. As I understand it, SILVA offers two downloads: NR99 (clustered at 99% identity) and Ref (not clustered). Normally I would download Ref and cluster the sequences at 95% identity with VSEARCH. However, the script matam_db_preprocessing.py also clusters the sequence file it is given. So my question is: if I run this script on a database that is already clustered, will it re-cluster those sequences? If so, can I instead give MATAM the unclustered database and have it cluster at an identity threshold that I specify?
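
For reference, the manual pre-clustering step I describe could look roughly like the sketch below. It only builds a VSEARCH command line using the `--cluster_fast`, `--id`, `--centroids` and `--threads` options. The helper name and the file names are hypothetical, and this says nothing about what matam_db_preprocessing.py does internally.

```python
def build_vsearch_cluster_cmd(input_fasta, centroids_fasta, identity=0.95, threads=1):
    """Return the argument list for clustering a FASTA file with VSEARCH.

    identity is a fraction (0.95 means 95% identity), as expected by --id.
    """
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be a fraction in (0, 1]")
    return [
        "vsearch",
        "--cluster_fast", input_fasta,    # unclustered input sequences
        "--id", str(identity),            # pairwise identity threshold
        "--centroids", centroids_fasta,   # one representative per cluster
        "--threads", str(threads),
    ]


# Hypothetical file names, for illustration only.
cmd = build_vsearch_cluster_cmd("silva_ref.fasta", "silva_ref_95.fasta")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where VSEARCH is installed.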