In practice, ASVs allow for a bit of difference between the sequences that are clustered together. Thus far we've really been working with ESVs (see #33). We'll cluster by distance to account for that slop. Normally we would use mothur's pre.cluster command, but it uses the abundances of the sequences to help with the denoising/clustering, and here the abundances don't mean anything. We would like to be able to cluster the sequences at any distance threshold.
This will require some refactoring of our old code, since we need the *.unique.align and *count_table files to help with the clustering:
Convert count_unique_seqs.sh to get_unique_seqs.sh
Pull out code/convert_count_table_to_tibble.R and make it code/get_esvs.R
Update make rules
Calculate distances between sequences up to a threshold of 0.05 (get_dists.sh); Update make rules
Create get_asvs.sh that will take in the distance and count_table files along with a distance threshold and output a tibble like we had for ESVs; Update make rules
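The distance and clustering steps above might be sketched along these lines. This is only a rough outline, not the final get_dists.sh or get_asvs.sh: the function names, argument order, and the DRY_RUN switch are hypothetical, and it assumes mothur's dist.seqs and cluster commands are the tools we end up using. Turning the clustering output into a tibble like the one we had for ESVs is not covered here.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, argument order, and DRY_RUN are assumptions.

# Build the mothur command that calculates pairwise distances between the
# unique aligned sequences, keeping only distances up to the cutoff.
dist_cmd() {
    local align="$1"
    local cutoff="${2:-0.05}"
    printf '#dist.seqs(fasta=%s, cutoff=%s)' "$align" "$cutoff"
}

# Build the mothur command that clusters sequences from a column-format
# distance file and a count_table at the requested threshold.
cluster_cmd() {
    local dist="$1"
    local count="$2"
    local threshold="$3"
    printf '#cluster(column=%s, count=%s, cutoff=%s)' "$dist" "$count" "$threshold"
}

# Print the command when DRY_RUN is set; otherwise hand it to mothur.
run_mothur() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$1"
    else
        mothur "$1"
    fi
}

# Example use (file names are placeholders):
#   run_mothur "$(dist_cmd path/to/genome.unique.align 0.05)"
#   run_mothur "$(cluster_cmd path/to/genome.unique.dist path/to/genome.count_table 0.03)"
```

Because the distances are only calculated up to 0.05, any clustering threshold passed to the second step would need to be at or below that value.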