lplough commented 4 years ago

I am interesting in post-DADA2 clustering of ASVs for more general animal biodiversity questions at the species or genus-level using, in this case, the CO1 marker. While the ASVs are very interesting for looking at sub-species (or cryptic species) questions, for comparisons of metabarcoding-based biodiversity to morphological-based inferences of biodversity (microscope based ID of species), it is preferable to work with clustered OTUs (e.g. 97% OTUs) from my pipeline. I have seen this being done recently and have even seen comments in other DADA2 threads about post-clustering with e.g. vsearch. My question is, how would one approach updating (generating) the resultant, more clustered OTU count table (presumably with fewer ASVs or OTUs) based on the new set of OTUs?

Would I use the OTU table?: e.g.

OTUid	Sample1	Sample2	Sample3
Seq1	100	237	1
Seq2	0	32	399
Seq3	90	1	0

Or just re map the filtered merged sequences (from previous DADA2 step) back to these new ASVs (e.g. the centroids fasta file produced from vsearch clustering) and rebuild the OTU table? How to keep track of sample names and abundances of dereplicated seqs from all the samples/groups?

Anyway, I wonder if someone has done this within DADA2 or if this is something vsearch has capabilities for? Any suggestions on the approach would be much appreciated!

LP

benjjneb commented 4 years ago

@mikemc @digitalwright Any thoughts on useful OTU clustering methods in R?

spholmes commented 4 years ago

If there is tree already available, we use glomming in phyloseq.

On Fri, Feb 14, 2020, 18:27 Benjamin Callahan notifications@github.com wrote:

@mikemc https://github.com/mikemc @digitalwright https://github.com/digitalwright Any thoughts on useful OTU clustering methods in R?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/benjjneb/dada2/issues/947?email_source=notifications&email_token=AAJFZPNFRV5CLIGFNVWUG4DRC5HHTA5CNFSM4KVP25U2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEL27LOI#issuecomment-586544569, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJFZPL4IFDW6TI2O4VHYWTRC5HHTANCNFSM4KVP25UQ .

digitalwright commented 4 years ago

You can align your sequences with DECIPHER using AlignSeqs or AlignTranslation and then cluster them into OTUs at a specific similarity level with DistanceMatrix and IdClusters. Once you have the cluster numbers it is relatively simple to collapse an ASV table into an OTU table using tapply or the like.

mikemc commented 4 years ago

So far I have done it manually, similar to @digitalwright 's suggestion, but with Biostrings::stringDist() to compute the distance matrix, hclust(d, method="complete") to perform the clustering, and cuttree() to get the groups, and then collapsing the abundance table manually with dplyr functions. I have considered implementing a new glom function in speedyseq to do this operation on phyloseq objects but which precise distance and clustering method matches the agreed upon definition of "a 97% OTU" (if such a definition exists). In particular, how should the distance matrix account for variation in ASV length and which clustering method (average, complete, etc).

Phyloseq's tip_glom() function is doing something similar except it uses ape::cophenetic.phylo() to compute a distance matrix from the tree and uses cluster::agnes() for the clustering by default (which uses method = "average" by default).

digitalwright commented 4 years ago

Often OTUs are defined as all sequences being within a percent similarity. This corresponds to complete-linkage. The ends of sequences are typically ignored because the different lengths are due to missing data, as opposed to indels. The approach I described previously is what I think you are trying to do, and it is designed to assign cluster numbers very quickly.

mikemc commented 4 years ago

@lplough Some code to do what @digitalwright and I discussed without using phyloseq is as follows. This code assumes that seqtab is the final sequence table produced by DADA2.

library(tibble)
library(dplyr)
# Packages that are required but not loaded:
# library(DECIPHER)
# library(Biostrings)

nproc <- 4 # set to number of cpus/processors to use for the clustering

asv_sequences <- colnames(seqtab)
sample_names <- rownames(seqtab)
dna <- Biostrings::DNAStringSet(asv_sequences)

## Find clusters of ASVs to form the new OTUs
aln <- DECIPHER::AlignSeqs(dna, processors = nproc)
d <- DECIPHER::DistanceMatrix(aln, processors = nproc)
clusters <- DECIPHER::IdClusters(
  d, 
  method = "complete",
  cutoff = 0.03, # use `cutoff = 0.03` for a 97% OTU 
  processors = nproc)

## Use dplyr to merge the columns of the seqtab matrix for ASVs in the same OTU
# prep by adding sequences to the `clusters` data frame
clusters <- clusters %>%
  add_column(sequence = asv_sequences)
merged_seqtab <- seqtab %>%
  # setup: turn seqtab into a tibble with rows = ASVs and columns = samples
  t %>%
  as_tibble(rownames = "sequence") %>%
  # add the cluster information
  left_join(clusters, by = "sequence") %>%
  # merge ASVs in the same cluster, summing abundances within samples
  group_by(cluster) %>%
  summarize_at(vars(-sequence), sum) %>%
  # Set new taxa names to OTU<cluster #> 
  mutate(cluster = paste0("OTU", cluster)) %>%
  # convert back to a matrix in the original orientation
  column_to_rownames("cluster") %>%
  as("matrix") %>%
  t

Now merged_seqtab is the new abundance table with ASVs combined into OTUs, and cluster can be saved to keep the map from the original ASV sequences to OTUs. Here I'm keeping things simple by making OTU names from the cluster numbers, so the OTUs are no longer going to be ordered from most to least abundant like in the original sequence table.

Edit: Fixed cluster_tbl -> clusters

digitalwright commented 4 years ago

This looks correct. I'll just point out that you could do the last step without dplyr using base::rowsum.

mikemc commented 4 years ago

Oh, yes that is much simpler. You can replace the last part (the long pipe chain) with e.g.

merged_seqtab <- seqtab %>% 
  t %>%
  rowsum(clusters$cluster) %>%
  t
# Optional renaming of clusters to OTU<cluster #>
colnames(merged_seqtab) <- paste0("OTU", colnames(merged_seqtab))

gomgom83 commented 4 years ago

Yes, ok and great thanks for this post! But now what about the sequence for taxonomic assignment? Where are there? How to get them as ASVs have been merged according clustering results??

mikemc commented 4 years ago

@gomgom83 If you use phyloseq, then you might want to use the merge_taxa_vec() function I made for this purpose, now included in the speedyseq package. See the example here.

gomgom83 commented 4 years ago

Yes of course I use Phyloseq after preprocessing with dada2. Ok I just see your page. I will try this and give you feedback! It a litle bit strange to make OTU clustering after denoising with dada2. But finaly, dada2 seems to give a lot of more ASVs vs. OTUs (with qiime) an probably because of it sensitivity for identifying 16S multicopies & strain level... that is a little pb when you want to be at a species level (gather strain & 16S copy togheter) even if 97/98/99% ID for clutering is really arbitrary!, doesn't make sense (biologically)!! really difficult to find the good way for analysis. thanks michael Fa

gomgom83 commented 4 years ago

Thanks for all, it works!

braddmg commented 4 years ago

Hi everyone, I need to cluster my ASVs sequences in OTUs (99%), so I followed these indications and everything was great. Once I got the new phyloseq object, I made son filtering and then, export the OTU table in a csv file with the package "microbiome". I opened the file with excel and to confirm the clustering I took the sequences of two different OTU (what I know are very similar) and made a BLAST with NCBI tool. The BLAST result indicates that these sequences have a 99.6% percent identity. So, are those two OTUs supposed to be grouped into a single OTU because their similarity is greater than 99%? I attach my script and a BLAST screenshot.

Script

Agrupar ASVs en OTU

library(tibble) library(dplyr) install.packages("DECIPHER") BiocManager::install("DECIPHER") BiocManager::install("Biostrings") install.packages("remotes") remotes::install_github("mikemc/speedyseq")

asv_sequences <- colnames(seqtab.nochim) sample_names <- rownames(seqtab.nochim) dna <- Biostrings::DNAStringSet(asv_sequences) nproc <- 8 # Increase to use multiple processors aln <- DECIPHER::AlignSeqs(dna, processors = nproc) d <- DECIPHER::DistanceMatrix(aln, processors = nproc) clusters <- DECIPHER::IdClusters( d, method = "complete", cutoff = 0.01, # corresponds to 99% OTUs processors = nproc ) merged_seqtab <- seqtab %>% t %>% rowsum(clusters$cluster) %>% t

Next we merge_taxa_vec() to get the merged phyloseq object,

ps0 <- merge_taxa_vec( ps, group = clusters$cluster, tax_adjust = 2 ) especies<-subset_samples(ps0, Specie!="Neivamyrmex pilosus") especiesNC <- subset_taxa(especies, !is.na(Kingdom) & !Kingdom %in% c("", "Eukaryota")) especiesNC <- subset_taxa(especiesNC, !is.na(Kingdom) & !Kingdom %in% c("", "NA")) especiesNC <- subset_taxa(especiesNC, !is.na(Order) & !Order %in% c("", "Chloroplast")) write_phyloseq(especiesNC, p = "E:/Documentos/ArmyAnts/Especies/OTU/") write_phyloseq(especiesNC, type = "TAXONOMY", p = "E:/Documentos/ArmyAnts/Especies/OTU") BLAST

digitalwright commented 4 years ago

You are using method="complete" in IdClusters(). This is complete-linkage clustering, so all sequences assigned to a cluster will be within X% cutoff. However, sequences can be split across different clusters that still meet the cutoff. This is because distance values are non-transitive. If you want to guarantee that sequences within X% are clustered together then use single-linkage clustering (method="single").

Also, note that distances are based off of your multiple sequence alignment. BLAST reports local percent identity, although in this particular case the query coverage was 100%. In general, the distance matrix is not expected to exactly equal the BLAST percent identity.

braddmg commented 4 years ago

Thank you very much, I think I got it!

lplough commented 4 years ago

This is exactly what I was hoping for! Thanks for sharing your code and advice @mikemc and @digitalwright. However, when I run this on my DADA2 seqtab, I get this error: Error in tbl_vars_dispatch(x) : object 'cluster_tbl' not found

Was cluster_tbl supposed to be created before running the last set of piped scripts (to create the merged_seqtab clustered table)? It doesnt exist as an object yet. What am I missing?

mikemc commented 4 years ago

@lplough thanks for catching this; cluster_tbl should have been named clusters in https://github.com/benjjneb/dada2/issues/947#issuecomment-589178796 - I fixed that now. However, you are better off replacing the computation of merged_seqtab (everything below ## Use dplyr ...) with the simpler code in https://github.com/benjjneb/dada2/issues/947#issuecomment-589237919

lplough commented 4 years ago

Thanks @mikemc ! And just to be clear, the renaming of col names by OTU colnames(merged_seqtab) <- paste0("OTU", colnames(merged_seqtab)) refers to the original ASV seqs before 'clustering', correct? i.e. the sequences would map across the column names, the OTUs (if I wanted to, but obviously would make an ugly table), like this: colnames(merged_seqtab)<-clusters$sequence[1:length(colnames(merged_seqtab))]

mikemc commented 4 years ago

I'm answering this away from an R terminal, so you will want to check what I say here for yourself (e.g. check the docs and output for IdClusters() and the rowsum() and the column names before and after renaming).

And just to be clear, the renaming of col names by OTU colnames(merged_seqtab) <- paste0("OTU", colnames(merged_seqtab)) refers to the original ASV seqs before 'clustering', correct? i.e. the sequences would map across the column names, the OTUs (if I wanted to, but obviously would make an ugly table), like this: colnames(merged_seqtab)<-clusters$sequence[1:length(colnames(merged_seqtab))]

I don't really understand the question - there will be fewer OTUs than ASVs, so it is not possible for the ASV sequences to map across OTUs. Here's some more info that hopefully helps some.

After merging, each column represents an OTU. What you call these OTUs is arbitrary and up to you. By default, the rowsum() function set the names based on the group argument, which was clusters$cluster - this is just an integer indexing of the clusters and has nothing to do with the original ASVs. I don't think it's a good idea to leave the taxa names as an integer so I suggested changing e.g. 26 -> OTU26. The link between cluster id number and ASVs is in the clusters table.

How these new OTUs are ordered depends on the reorder argument to rowsum(). By default, rowsum() reorders the rows (which are the columns, before and after transposing with t()), based on clusters$cluster. So as shown above you cannot assume any relationship between the new column order and the original ASVs. Setting reorder = FALSE keeps the original order but there is not a 1-1 correspondence between ASVs and OTUs so you'll still have to use the clusters to link ASVs with OTUs.

If you would prefer that the OTU names be set to the most abundant ASV in each cluster, then you might consider using the merge_taxa_vec() function I wrote (though this requires first converting your sequence table to a phyloseq otu_table object with otu_table(seqtab, taxa_are_rows = FALSE))

lplough commented 4 years ago

Ok. Sorry, I wasn't being clear. What Id like is to make the column names of the merged OTU table the sequences e.g.

TAAAGAGGAACG TAACGGGTTCG TACCCACTACTCC Sample1 10 0 12 Sample2 200 722 300 Sample3 3 400 100

or at least produce a file with the sequences in the order of the new clustered OTUs.

Since I will be doing my taxonomic assignment outside of DADA2 (after this clustering step), I need to know the sequence that corresponds with each 'new' OTU in the new merged table.

So, with my cluster mappings looking like this (originally 43 ASV sequences or OTUs and now 27 sequences/OTUs after clustering): > clusters cluster 1 18 2 4 3 20 4 25 5 3 6 2 7 1 8 5 9 13 10 1 ... 37 26 38 14 39 17 40 23 41 22 42 24 43 27

my question really is, how to map the 27 clustered sequences to the 27 OTUs in my new table? e.g. OTU 27 would be the sequence from the Original ASV sequence # 43, right? But then for OTU 1 (and others), which merged multiple sequences , #7, #10 (and others not shown), any of the collapsed/clustered sequences could be extracted for that OTU now. Is there a simple way to extract the 27 clustered sequences from the new merged table so I know what sequence goes with what clustered OTU?

Hopefully my question is clearer now?

lplough commented 4 years ago

Not too much code to create such a file. I was thinking of some basic code like this, using the ASV sequences and cluster to ASV mappings from clusters object that would reflect the order of OTUs in merged_seqtab (the output from my dummy example below with 27 clustered ASVs):

> merged_seqtab
    OTU1 OTU2 OTU3 OTU4 OTU5 OTU6 OTU7 OTU8 OTU9 OTU10 OTU11 OTU12 OTU13 OTU14
Ben  643 1115  452  298  810  176   94   17   15    34    29   139   195     8
Tom  643 1115  452  298  810  176   94   17   15    34    29   139   195     8
    OTU15 OTU16 OTU17 OTU18 OTU19 OTU20 OTU21 OTU22 OTU23 OTU24 OTU25 OTU26
Ben    56   128     4   718    11   670   129     3     4     3   468     9
Tom    56   128     4   718    11   670   129     3     4     3   468     9
    OTU27
Ben     2
Tom     2

(yes my two dummy samples, Ben and Tom, have identical sequence counts for each OTU - I just duplicated a sample for testing and creating the OTU table)

So, first make a new dataframe where ASV is column of all the original ASV sequences and C.ASV is the new column of clustered ASVs based on the mapping to those orig. ASVs.

library(dplyr)
newdf<-as.data.frame(cbind.data.frame(rownames(clusters),clusters$cluster,clusters$sequence))
colnames(newdf)<-c("ASV","C.ASV","sequence")

Then arrange by C.ASV and keep only the first of duplicated ASVs (there are 27 clustered ASVs in my dummy example from an original 43 DADA2 ASVs):

newdf.sort<-newdf %>% arrange(C.ASV) %>% distinct(C.ASV,.keep_all = TRUE)

Finally, create a fasta file of the new OTUs for taxonomic assignment, in thier proper order (to match the OTUs in merged_seqtab) :

fas<-paste(">OTU",seq(1:nrow(newdf.sort)),"\n",newdf.sort$sequence,sep="")
write.table(fas,file="out.fas", row.names=FALSE, col.names=FALSE,quote = FALSE)

user@ubuntu:~/DADA2_work$ head out.fas
>OTU1
TTTATCTGATTCTAAATTTCATTCAGGAATTTCTGTTGATTTAGCTATTTTTAGTTTACACATTGCTGGTATTTCTTCTATTTTAGGAAGGATTAATTTTATAACTACTATTATTTGTTCACGTACTACTAAGATAGTTAGCATAGATCGACTACCTTTAATGATTTGATCTTTAGGAGTTACAGCCTTTTTACTAGTTACTACTTTACCTGTTCTAGCTGGGGCAATTACTATACTATTAACTGATCGTAATTTCAATACATCTTTTTTCGATCCTTCAGGAGGAGGTAATCCTGTTCTATACCAACATTTATTT
>OTU2
TTTATCAAGAAACTTAGCACATGCAGGCCCTTCTGTAGATCTAGCTATTTTTTCTCTTCATTTAGCCGGTATTTCATCCATTCTAGGAGCACTAAATTTTATAACAACAGTAATTAATATACGCTCAAAAGGGCTTCGAATAGAATTAATACCTTTATTTATTTGAGCTTTATTTATTACAGCCATTCTTCTTCTCTTATCTTTACCCGTTTTAGCAGGAGGTATTACTATACTTTTAACAGACCGTAATCTAAACACAGCATTCTTCGATCCTGCAGGGGGAGGAGATCCTGTCTTGTTTCAGCATTTATTT
>OTU3
CCTGTCTAGAAATATTGCTCACGCAGGTAGGTCTGTCGATTTTGCCATTTTTTCACTTCACTTAGCAGGTGTTAGTTCTATTTTAGGGGCGGTAAACTTTATTAGAACTTTGGGTAACTTACGTGCATTCGGTATAATTTTAGATCGAATGCCTCTTTTTGCTTGGTCTGTGTTAATTACTGCAGTTCTTCTTCTTTTATCACTTCCTGTACTAGCAGGCGCTATTACTATATTATTAACAGATCGTAATCTAAATTCAAGATTTTATGACGCAAGTGGAGGGGGAGATCCGATTTTGTATCAACACCTGTTC

ARBramucci commented 3 years ago

Hi I am sorry, I am trying to follow along with this code and I keep getting an error when I am doing the following step: aln <- DECIPHER::AlignSeqs(dna, processors = nproc) produces the following error: 0%Error: cannot allocate vector of size 86929.0 Gb

But I have attempted several different CPU thresholds and it always is the same error, I have also checked my inputs and they seem correct, sample names and asv seq names are all correct.

Any help would be appreciated.

libraries loaded: library(tibble) library(dplyr) library(DECIPHER) library(Biostrings) require(Biostrings)

let me know if you need any other information thanks so much Anna

digitalwright commented 3 years ago

@ARBramucci Most likely there are too many ASVs to align. What is length(dna)? Are you only using the unique set (i.e., length(unique(dna))==length(dna))?

ARBramucci commented 3 years ago

Hi Thanks for the quick response! My dataset is fairly large, I have several sequencing plates.

length(dna) [1] 4830614 length(unique(dna))==length(dna) [1] TRUE

but that is before running the chimera check step, I will try running dada2 chimera removal now and see if that decreases the number of sequences to something more manageable for decipher align.

digitalwright commented 3 years ago

It is possible to align over 100k sequences using the defaults, but half a million is impractical. There are several options:

Use a chained guide tree. See the vignette.
Reduce the number of sequences with IdClusters(myXStringSet=dna, cutoff=0.01, method="inexact", processors=NULL) first.
Classify the sequences with IDTAXA and then only align a subset composed of representatives from each group.

gabrielet commented 2 years ago

am I correct in assuming that the IdClusters() function from the DECIPHER package changed a bit and now the pipeline provided above is not working anymore?

As far as I can tell, IdClusters() now requires two main parameters that are myXStringSet and cutoff. This means that passing a distance matrix (and the method parameter) as it was done in the pipeline above, it's not possible anymore and what we need to pass now is actually the dna variable as it was created in the post I linked:

# get ASV sequences
asv_sequences <- colnames(seqtab)
# cast them as DNAString
dna <- Biostrings::DNAStringSet(asv_sequences)

# run the clustering at 99% similarity
clusters <- DECIPHER::IdClusters(dna, cutoff = 0.01, processors=4)

# proceed as shown in this post above: https://github.com/benjjneb/dada2/issues/947#issuecomment-589237919
# sum-up ASVs according to the new clustering order
merged_seqtab <- seqtab %>% 
  t %>%
  rowsum(clusters$cluster) %>%
  t
# Optional renaming of clusters to OTU<cluster #>
colnames(merged_seqtab) <- paste0("OTU", colnames(merged_seqtab))

I also noticed that it is not possible to pass aligned sequences because, according to the error I am getting, the mismatch "-" symbol is not accepted by the function.

Is this correct or am I getting something wrong?

digitalwright commented 2 years ago

The IdClusters function underwent a lot of development since the original post above. The tree building functionality was split-off into the TreeLine function.

The original code:

clusters <- DECIPHER::IdClusters(
  d, 
  method = "complete",
  cutoff = 0.03, # use `cutoff = 0.03` for a 97% OTU 
  processors = nproc)

Is now:

clusters <- DECIPHER::TreeLine(
  myDistMatrix=d, 
  method = "complete",
  cutoff = 0.03, # use `cutoff = 0.03` for a 97% OTU 
  processors = nproc)

In the next release of DECIPHER (later this month), IdClusters no longer exists. Its original functionality is now part of the TreeLine and Clusterize functions.

I hope that helps.

gabrielet commented 2 years ago

@digitalwright, of course that helps. So, the pipeline still stands with the only exception that IdClusters will soon become TreeLine. thank you!

EDIT: I believe it's worth adding that, in order to get the expected output, as shown above, the function should take the parameter type set as type="clusters":

clusters <- DECIPHER::TreeLine(
  myDistMatrix=d,
  method = "complete",
  cutoff = 0.03, # use `cutoff = 0.03` for a 97% OTU
  type = "clusters",
  processors = nproc)

charles7007 commented 1 year ago

Hi @gabrielet , I am trying the above command, but I found below comment got an error.

run the clustering at 97% similarity

clusters <- DECIPHER::TreeLine( myDistMatrix=d, method = "complete", cutoff = 0.03, # use cutoff = 0.03 for a 97% OTU type = "clusters", processors = nproc)

It shown "Error in is(myDistMatrix, "matrix") : object 'd' not found"

gabrielet commented 1 year ago

@charles7007, well it looks like you don't have any object defined as d. maybe your distance matrix was created with a different name? can you provide some more info regarding what you did?

charles7007 commented 1 year ago

Hi @gabrielet, I am trying to generate OTU from the ASV data. I followed the command from DADA2 ITS Pipeline Workflow (1.8) (https://benjjneb.github.io/dada2/ITS_workflow.html) to get the ASV data. From the ASV data, I tried to generate OTU.

gabrielet commented 1 year ago

@charles7007 i am familiar with the pipeline. i believe the problem is downstream. So, you want to pass and object called d as a distance matrix to DECIPHER as you do here:

clusters <- DECIPHER::TreeLine(
  myDistMatrix=d,                                  <------------------------------- HERE
  method = "complete",
  cutoff = 0.03, # use `cutoff = 0.03` for a 97% OTU
  type = "clusters",
  processors = nproc)

but what is d? it looks like it's not a distance matrix. Actually, the error says there is no object called d. Where did you do something like d <- something or d = something??

julietaalvarez commented 1 year ago

I am having an issue with this part merged_seqtab <- seqtab.nochim %>% t %>% rowsum(clusters$cluster) %>% t I get this Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 't': incorrect length for 'group' and I already check all the objects, I don't understand what is wrong

benjjneb commented 1 year ago

@julietaalvarez Can you explain what it is you are trying to accomplish with your code?

julietaalvarez commented 1 year ago

@julietaalvarez Can you explain what it is you are trying to accomplish with your code?

This is part of the code suggested to cluster ASVs to OTUs, I perfectly ran this part of the code: ` aln <- DECIPHER::AlignSeqs(dna, processors = nproc)

d <- DECIPHER::DistanceMatrix(aln, processors = nproc)

clusters <- DECIPHER::TreeLine( myDistMatrix=d, method = "complete", cutoff = 0.03, # use cutoff = 0.03 for a 97% OTU processors = nproc) and I think the code that I am having error is to sum-up ASVs to the new clustering merged_seqtab <- seqtab.nochim %>% t %>% rowsum(clusters$cluster) %>% t `

but I am still having this error: "Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 't': incorrect length for 'group'" error_cluster

benjjneb commented 1 year ago

For troubleshooting purposes I would break apart these chains of piped commands, and do them one-at-a-time, saving to intermediate variables.

This will allow the exact command throwing the error to be identified (e.g. which call of t?), and will allow inspection of the input to that function in a way that should help elucidate what is causing the error.

benjjneb / dada2

Clustering ASVs after DADA2? generating new OTU table #947

Script

Agrupar ASVs en OTU

Next we merge_taxa_vec() to get the merged phyloseq object,

run the clustering at 97% similarity