benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0
469 stars 142 forks source link

could not find function "mergePairsByID" #301

Closed ezandi closed 7 years ago

ezandi commented 7 years ago

HI I am trying to run "mergePairsByID" in dada2, however I get the error message: could not find function "mergePairsByID". Dada2 is updated, library loaded, etc.. When I check the dada2 active functions:

ls("package:dada2") [1] "addSpecies" "assignSpecies" "assignTaxonomy"
[4] "collapseNoMismatch" "dada" "derepFastq"
[7] "errBalancedF" "errBalancedR" "errExtremeF"
[10] "errExtremeR" "errHmpF" "errHmpR"
[13] "fastqFilter" "fastqPairedFilter" "filterAndTrim"
[16] "getDadaOpt" "getErrors" "getSequences"
[19] "getUniques" "inflateErr" "isBimera"
[22] "isBimeraDenovo" "isBimeraDenovoTable" "isPhiX"
[25] "isShiftDenovo" "learnErrors" "loessErrfun"
[28] "makeSequenceTable" "mergePairs" "mergeSequenceTables"
[31] "nwalign" "nwhamming" "plotComplementarySubstitutions" [34] "plotErrors" "plotQualityProfile" "removeBimeraDenovo"
[37] "setDadaOpt" "tperr1" "uniquesToFasta" I cannot see the "mergePairsByID" function! I must be missing something! Does anyone know what could be wrong? Thanks a lot,

Ebi

spholmes commented 7 years ago

Could you update your installation of dada2 and show us what is returned when you type packageVersion("dada2")

Thanks Susan

On Mon, Jul 31, 2017 at 10:14 AM, ezandi notifications@github.com wrote:

HI I am trying to run "mergePairsByID" in dada2, however I get the error message: could not find function "mergePairsByID". Dada2 is updated, library loaded, etc.. When I check the dada2 active functions:

ls("package:dada2") [1] "addSpecies" "assignSpecies" "assignTaxonomy" [4] "collapseNoMismatch" "dada" "derepFastq" [7] "errBalancedF" "errBalancedR" "errExtremeF" [10] "errExtremeR" "errHmpF" "errHmpR" [13] "fastqFilter" "fastqPairedFilter" "filterAndTrim" [16] "getDadaOpt" "getErrors" "getSequences" [19] "getUniques" "inflateErr" "isBimera" [22] "isBimeraDenovo" "isBimeraDenovoTable" "isPhiX" [25] "isShiftDenovo" "learnErrors" "loessErrfun" [28] "makeSequenceTable" "mergePairs" "mergeSequenceTables" [31] "nwalign" "nwhamming" "plotComplementarySubstitutions" [34] "plotErrors" "plotQualityProfile" "removeBimeraDenovo" [37] "setDadaOpt" "tperr1" "uniquesToFasta" I cannot see the "mergePairsByID" function! I must be missing something! Does anyone know what could be wrong? Thanks a lot,

Ebi

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/benjjneb/dada2/issues/301, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvQ0E8txYFjTPmi5XQI7C3fMDWeWwks5sTgtmgaJpZM4OoqlX .

-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

ezandi commented 7 years ago

I did reinstall dada2. Updated to latest version! It is version 1.4.0

packageVersion("dada2") [1] ‘1.4.0’

spholmes commented 7 years ago

OK, I see it is missing, for the time being you should just sort and then use the mergePairs after having made sure your files/dereps are sorted first.

Ben: It may be this function didn't build for some conflicted reason??

On Mon, Jul 31, 2017 at 10:42 AM, ezandi notifications@github.com wrote:

I did reinstall dada2. Updated to latest version! It is version 1.4.0

packageVersion("dada2") [1] ‘1.4.0’

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/benjjneb/dada2/issues/301#issuecomment-319141728, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvdNKDpRJcMblOCBNkbuygbCKS065ks5sThIOgaJpZM4OoqlX .

-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

benjjneb commented 7 years ago

I believe mergePairsByID is not currently exported.

You can still call it with the three-colon prefix, i.e. dada2:::mergePairsByID(...)

joey711 commented 7 years ago

I don't know why @benjjneb has not exported it. I documented it and everything! :-) By the way, the dada2:::mergePairsByID(...) works. It's even being used by the Shiny protototype...

@ezandi this should resolve your issue. Please post back if you have further issues about that version of the function call.

Cheers

ezandi commented 7 years ago

Thanks Ben. dada2:::mergePairsByID solved the export. How ever, I am getting this error message now: Error in dada_to_seq_table(dadaF, derepF, srF, idRegExpr = idRegExpr, : derep argument must be of the derep class. See ?'derep-class' for more details.

I have prepared every file for the merge in dada2!

Thanks,

Ebi

On Jul 31, 2017, at 12:45 PM, Benjamin Callahan notifications@github.com<mailto:notifications@github.com> wrote:

I believe mergePairsByID is not currently exported.

You can still call it with the three-colon prefix, i.e. dada2:::mergePairsByID(...)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_benjjneb_dada2_issues_301-23issuecomment-2D319175423&d=DwMCaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=l_VYsnUCr4SyPI40sec7Rw&m=cOAzMWaC3cFaVoGSViI2dphDmEAXGdChbbfE2Ljuzdk&s=i5ecVfrllMaWHf2d2L6rotncBO4v90ZWxqJ64YkcYZ4&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AQORU1qI6G0j9K5SUugj9NouTl760Jufks5sTi7bgaJpZM4OoqlX&d=DwMCaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=l_VYsnUCr4SyPI40sec7Rw&m=cOAzMWaC3cFaVoGSViI2dphDmEAXGdChbbfE2Ljuzdk&s=dRfl3X0MswEcrp3NciWTwgX_-98dZxXH2V-lKze-H7M&e=.

benjjneb commented 7 years ago

Could you post the exact command you ran to get that error? The error message indicates that the second object you provided was not a derep-class object. What is the output of class(secondObject)?

ezandi commented 7 years ago

Here is the exact command:

mergerFRs<- dada2:::mergePairsByID(dadaFs, derepFs, filtFs, dadaRs, derepRs, filtRs, minOverlap = 20, maxMismatch = 0, returnRejects = FALSE, idRegExpr = c("\s.+$", ""), includeCol = character(0), justConcatenate = FALSE, verbose = FALSE) Error in dada_to_seq_table(dadaF, derepF, srF, idRegExpr = idRegExpr, : derep argument must be of the derep class. See ?'derep-class' for more details.

Ebi On Aug 1, 2017, at 5:38 PM, Benjamin Callahan notifications@github.com<mailto:notifications@github.com> wrote:

Could you post the exact command you ran to get that error? The error message indicates that the second object you provided was not a derep-class object. What is the output of class(secondObject)?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_benjjneb_dada2_issues_301-23issuecomment-2D319535244&d=DwMCaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=l_VYsnUCr4SyPI40sec7Rw&m=72XQ_ykj1aqUCfmPnBo05kQzK0sISMaIaGL-3SXpwH0&s=gznvZZjYOGfE9vfOgXSGcwvNJSxHDfq1uK7l852Zc0A&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AQORU0WGn2zDD9I-5F1exl6HfTnfLe6MQqks5sT8UXgaJpZM4OoqlX&d=DwMCaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=l_VYsnUCr4SyPI40sec7Rw&m=72XQ_ykj1aqUCfmPnBo05kQzK0sISMaIaGL-3SXpwH0&s=HtyUUZj3rquSOgLBY1w5aDwDv5aIY3QPWkU0lgBh8f0&e=.

benjjneb commented 7 years ago

Any thought @joey711 ?

Is mergePairsByID R-vectorized?

joey711 commented 7 years ago

They still haven't verified that both derep arguments are actually derep class...

benjjneb commented 7 years ago

@ezandi Can you tell us the output of class(derepFs) and class(derepRs)?

Also, you may need to embed this command in a loop. Can you try the following code as a replacement, and tell us if it works?

mergerFRs <- vector("list", length(dadaFs))
for(i in seq_along(dadaFs)) {
  mergerFRs[[i]] <- dada2:::mergePairsByID(dadaFs[[i]], derepFs[[i]], filtFs[[i]],
                                   dadaRs[[i]], derepRs[[i]], filtRs[[i]], 
                                   minOverlap = 20, maxMismatch = 0, 
                                   idRegExpr = c("\\s.+$", ""), verbose=TRUE)
}
ezandi commented 7 years ago

Here it is. First the issue is very reproducible, and the derep classes are "list" (see below)

dadaFs <- dada(derepFs, err=errF, multithread=TRUE) Sample 1 - 391741 reads in 42997 unique sequences. Sample 2 - 399877 reads in 43626 unique sequences.

dadaRs <- dada(derepRs, err=errR, multithread=TRUE) Sample 1 - 355952 reads in 49763 unique sequences. Sample 2 - 360566 reads in 52803 unique sequences.

class(derepFs) [1] "list" class(derepRs) [1] "list" mergerFRs<- dada2:::mergePairsByID(dadaFs, derepFs, filtFs, dadaRs, derepRs, filtRs, minOverlap = 20, maxMismatch = 0, returnRejects = FALSE, idRegExpr = c("\s.+$", ""), includeCol = character(0), justConcatenate = FALSE, verbose = FALSE) Error in dada_to_seq_table(dadaF, derepF, srF, idRegExpr = idRegExpr, : derep argument must be of the derep class. See ?'derep-class' for more details.

Here is the code you suggeste dto run:

class(derepFs) [1] "list" class(derepRs) [1] "list" mergerFRs <- vector("list", length(dadaFs)) for(i in seq_along(dadaFs)) {

  • mergerFRs[[i]] <- dada2:::mergePairsByID(dadaFs[[i]], derepFs[[i]], filtFs[[i]],
  • dadaRs[[i]], derepRs[[i]], filtRs[[i]],
  • minOverlap = 20, maxMismatch = 0,
  • idRegExpr = c("\s.+$", ""), verbose=TRUE)
  • } srF interpreted as path to forward reads fastq file. Attempting to read... srR interpreted as path to forward reads fastq file. Attempting to read... 391741 unique forward read IDs. 355952 unique reverse read IDs. 353253 paired reads, corresponding to 7566 unique pairs that must be assessed for overlap merge 2 paired-reads (in 1 unique pairings) successfully merged from 353253 read pairs. srF interpreted as path to forward reads fastq file. Attempting to read... srR interpreted as path to forward reads fastq file. Attempting to read... 399877 unique forward read IDs. 360566 unique reverse read IDs. 358009 paired reads, corresponding to 7569 unique pairs that must be assessed for overlap merge Error in C_pair_consensus(als1, als2, prefer, FALSE) : expecting a single value
benjjneb commented 7 years ago

OK, so we have solved the syntax problem: mergePairsByID only accepts a single sample at a time, so it needs to be called within a loop instead of being passed multi-sample objects like derepFs.

But a second problem now crops up, very few if any reads are being successfully merged...

Could you post the output of the following?

head(dadaFs[[1]]$sequence)
head(dadaRs[[1]]$sequence)
ezandi commented 7 years ago

head(dadaFs[[1]]$sequence) [1] "GTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCTTGAGTGCAGTTGAGGCAGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCC" [2] "GTTGGAAGTGAAATCTATGGGCTCAACCCATAAACTGCTTTCAAAACTGTTTTTCTTGAGTAGTGCAGAGGTAGGCGGAATTCCCGGTGTAGCGGTGGAATGCGTAGATATCGGGAGGAACACCAGTGGCGAAGGCGGCC" [3] "GTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGAAACTGGCAGTCTTGAGTACAGTAGAGGTGGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCT" [4] "GTTGGAAGTGAAATCCATGGGCTCAACCCATGAACTGCTTTCAAAACTGTTTTTCTTGAGTAGTGCAGAGGTAGGCGGAATTCCCGGTGTAGCGGTGGAATGCGTAGATATCGGGAGGAACACCAGTGGCGAAGGCGGCC" [5] "GTCAGAAGTGAAAGCCCGGGGCTCAACCCCGGGACTGCTTTTGAAACTGTGGAGCTGGAGTGCAGGAGAGGCAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCT" [6] "GTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAATTGATACTGGCAGTCTTGAGTACAGTTGAGGTAGGCGGAATTCGTGGTGTAGCGGTGAAATGCTTAGATATCACGAAGAACTCCGATTGCGAAGGCAGCT" head(dadaRs[[1]]$sequence) [1] "TACGGGAGGCAGCAGTGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGAAGGATGACTGCCCTATGGGTTGTAAACTTCTTTTATAAAGGAATAAAGTCGGGTATGGATACCCGTTTGCATGTACTT" [2] "TACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGGAGGAAGAAGGTCTTCGGATTGTAAACTCCTGTTGTTGGGGAAGATAATGACGGTACCCAACAAGGAAGTGACGGCTAA" [3] "TACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCGACGCCGCGTGGAGGAAGAAGGTCTTCGGATTGTAAACTCCTGTTGTTGAGGAAGATAATGACGGTACTCAACAAGGAAGTGACGGCTAA" [4] "TACGGGAGGCAGCAGTGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGAAGGATGACTGCCCTATGGGTTGTAAACTTCTTTTATAAAGGAATAAAGTCGGGTATGTATACCCGTTTGCATGTACTT" [5] "TACGGGGGGCAGCAGTGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGAAGGATGACTGCCCTATGGGTTGTAAACTTCTTTTATAAAGGAATAAAGTCGGGTATGGATACCCGTTTGCATGTACTT" [6] "TACGGGTGGCAGCAGTGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGAAGGATGACTGCCCTATGGGTTGTAAACTTCTTTTATAAAGGAATAAAGTCGGGTATGGATACCCGTTTGCATGTACTT"

joey711 commented 7 years ago

don't you also need the IDs so you know which sequences to merge together?

ezandi commented 7 years ago

This is also my question! I prepared all the files following the exact steps of dada2!

benjjneb commented 7 years ago

When I BLAST the most common F and R reads, they are exact matches to the same organisms (a bunch of B. vulgatus). However, they don't align to each other. This is probably because your reads aren't overlapping anymore after trimming/truncation, and thus merging (by any method) will fail.

@ezandi Could you clarify what primer set you are using here?

ezandi commented 7 years ago

Thanks Ben, I got the same results. I am analyzing for ward and reverse reads separately. This was the first microbiome sequencing the core did here. They sequence the V3 region, but have sent me the primer sequences yet! I had a a lot of issues with the data they sent me. Thanks a lot for your help.

Best,

Ebi On Aug 4, 2017, at 6:41 AM, Benjamin Callahan notifications@github.com<mailto:notifications@github.com> wrote:

When I BLAST the most common F and R reads, they are exact matches to the same organisms (a bunch of B. vulgatus). However, they don't align to each other. This is probably because your reads are aren't overlapping anymore after trimming/truncation, and thus merging (by any method) will fail.

@ezandihttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ezandi&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=l_VYsnUCr4SyPI40sec7Rw&m=3h2CRCMQ_mZXddlqt-t3gdK8SbjDExsrMYCGnGZnjIE&s=oH7Y3_ruRUfPgipOtiHuUyRtN62XO2VeidatOKKa6-Y&e= Could you clarify what primer set you are using here?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_benjjneb_dada2_issues_301-23issuecomment-2D320251818&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=l_VYsnUCr4SyPI40sec7Rw&m=3h2CRCMQ_mZXddlqt-t3gdK8SbjDExsrMYCGnGZnjIE&s=eo4HpFyAAmjWFgmHtNGYlEA5eS8G61NvgVll7Qwst00&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AQORU3ABJTdYaNeg9rhC5e9l0N-5FYWbA0ks5sUx91gaJpZM4OoqlX&d=DwMFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=l_VYsnUCr4SyPI40sec7Rw&m=3h2CRCMQ_mZXddlqt-t3gdK8SbjDExsrMYCGnGZnjIE&s=YnNpHYv7jDRPJ3LnWr5Yv-RiWa4GQ6wnh8swWObmC_M&e=.

Ebrahim Zandi, Ph.D. Associate Professor of Molecular Microbiology and Immunology Faculty Director of Proteomics Core at USC University of Southern California Norris Cancer Comprehensive Cancer Center, Keck School of Medicine NOR 6429, Mail Stop # 9176 1441 Eastlake Ave. Los Angeles, CA 90089-0112 Phone: 323 865 0644 Email: zandi@usc.edumailto:zandi@usc.edu