SchlossLab / Schloss_rrnAnalysis_mSphere_2021

Code Club project analyzing the utility of ASVs
MIT License

Create ASVs #34

Closed: pschloss closed this issue 3 years ago

pschloss commented 3 years ago

In practice, ASVs allow for small differences between sequences that are clustered together. So far we've really been working with ESVs (exact sequence variants; see #33). To account for that slop, we'll cluster sequences by distance. Normally we would use mothur's pre.cluster command, but it uses the abundances of the sequences to guide the denoising/clustering, and here the abundances don't mean anything. I'd like to be able to cluster the sequences at any distance threshold.
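A minimal sketch of abundance-free clustering at an arbitrary threshold, assuming the sequences are already aligned to a common length (as in a *.unique.align file). The distance here is a simple uncorrected one (fraction of differing columns, ignoring columns that are gaps in both sequences). It is not mothur's dist.seqs calculation, and single-linkage clustering is one choice among several, not necessarily what this project ends up using. All function names are hypothetical. Because only pairs within a cutoff are kept (the plan in this issue uses 0.05), any threshold at or below that cutoff can be applied afterward without recomputing distances.

```python
from itertools import combinations

GAPS = {"-", "."}  # assumed gap characters in the alignment


def aligned_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of compared columns that differ.

    Columns that are gaps in both sequences are skipped; a gap opposite a
    base counts as a mismatch (an assumption, not mothur's default).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x in GAPS and y in GAPS:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def sparse_distances(seqs: dict[str, str], cutoff: float = 0.05) -> list[tuple[str, str, float]]:
    """Keep only (name_a, name_b, distance) rows at or below `cutoff`."""
    pairs = []
    for a, b in combinations(seqs, 2):
        d = aligned_distance(seqs[a], seqs[b])
        if d <= cutoff:
            pairs.append((a, b, d))
    return pairs


def cluster(names, pairs, threshold: float) -> dict[str, int]:
    """Single-linkage clustering with union-find; abundances are never used."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b, d in pairs:
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    # Number clusters 1, 2, ... in the order sequences first appear
    labels = {}
    return {n: labels.setdefault(find(n), len(labels) + 1) for n in names}
```

For example, with `seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTTTTTTTTT"}`, `s1` and `s2` differ at one of ten columns (distance 0.1), so `cluster(seqs, sparse_distances(seqs, cutoff=0.2), 0.15)` puts them in the same ASV and leaves `s3` on its own, while a threshold of 0.05 keeps all three separate.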

This will require some refactoring of our old code, since we need the *.unique.align and *count_table files to help with the clustering:

  1. Convert count_unique_seqs.sh to get_unique_seqs.sh
    • Pull out code/convert_count_table_to_tibble.R and make it code/get_esvs.R
    • Update make rules
  2. Calculate distances between sequences, keeping only pairs within a cutoff of 0.05 (get_dists.sh); update make rules
  3. Create get_asvs.sh, which will take in the distance and count_table files along with a distance threshold and output a tibble like the one we had for ESVs; update make rules
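Once each unique sequence has an ASV label, the last part of step 3 amounts to summing the count_table columns by ASV. A minimal sketch, assuming a mothur-style count_table (tab-separated header of `Representative_Sequence`, `total`, then one column per sample) and a hypothetical `asv_of` mapping from sequence name to ASV label, such as the output of a distance-threshold clustering step. The tidy `(asv, sample, count)` rows stand in for the tibble; its actual column names are not specified here.

```python
import csv
import io
from collections import defaultdict


def count_table_to_asv_counts(count_table_text: str, asv_of: dict[str, int]) -> list[tuple[int, str, int]]:
    """Sum per-sample counts of unique sequences into their ASVs.

    Assumes the count_table header is: Representative_Sequence, total,
    sample_1, sample_2, ... (tab-separated). Returns sorted tidy
    (asv, sample, count) rows and drops zero counts.
    """
    reader = csv.reader(io.StringIO(count_table_text), delimiter="\t")
    header = next(reader)
    samples = header[2:]  # skip Representative_Sequence and total
    totals = defaultdict(int)
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        asv = asv_of[row[0]]
        for sample, count in zip(samples, row[2:]):
            totals[(asv, sample)] += int(count)
    return sorted((asv, sample, n) for (asv, sample), n in totals.items() if n > 0)
```

With hypothetical samples `genomeA` and `genomeB`, where `s1` and `s2` share ASV 1 and `s3` is ASV 2, the counts of `s1` and `s2` are added within each sample, and any ASV/sample pair with a zero total is omitted.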