benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

R crashes when trying to create phylotree using dada2! #294

Closed ezandi closed 7 years ago

ezandi commented 7 years ago

I am trying to use the msa package to build a tree as described in the Bioconductor workflow for microbiome data analysis (Ben J. Callahan, Kris Sankaran, Julia A. Fukuyama, Paul J. McMurdie, Susan P. Holmes). I have processed my 24 16S samples according to the dada2 single-end workflow, and everything works fine until I try to generate the phylogenetic tree. At that point R crashes and everything freezes! I have about 300,000 amplicons (220 seqs) per sample from a 2x300 Illumina sequencing run. I am not sure what the problem is. I am doing this on macOS Sierra.

spholmes commented 7 years ago

The tree can take a very long time. We recommend starting with the NJ tree and checking that everything works with that. If necessary, for a large tree we recommend RAxML, but you need to install it separately. How many distinct ASVs/RSVs are you trying to build a tree for?

Susan

-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

ezandi commented 7 years ago

I have 7508 taxa and 24 samples. Is there any other way to make a phylogenetic tree from my data for now, so that I can do some simple data analysis? I am new to this field.

Best,

Ebi

Ebrahim Zandi, Ph.D. Associate Professor of Molecular Microbiology and Immunology Faculty Director of Proteomics Core at USC University of Southern California Norris Cancer Comprehensive Cancer Center, Keck School of Medicine NOR 6429, Mail Stop # 9176 1441 Eastlake Ave. Los Angeles, CA 90089-0112 Phone: 323 865 0644 Email: zandi@usc.edu
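For a sense of scale: distance-based methods such as NJ start from a distance for every pair of sequences, so the work grows quickly with the number of taxa. The sketch below is only a back-of-the-envelope calculation and may or may not explain the crash reported here. The function name `pairwise_cost` and the 8 bytes per stored distance are assumptions for illustration, not figures taken from any package.

```python
def pairwise_cost(n_seqs, bytes_per_value=8):
    """Return (number of sequence pairs, size of a dense n x n matrix in MB).

    bytes_per_value=8 assumes each distance is stored as a 64-bit float.
    """
    n_pairs = n_seqs * (n_seqs - 1) // 2
    full_matrix_mb = n_seqs * n_seqs * bytes_per_value / 1e6
    return n_pairs, full_matrix_mb


# 7508 is the number of taxa reported in this thread.
n_pairs, matrix_mb = pairwise_cost(7508)
print(f"{n_pairs:,} pairwise distances, ~{matrix_mb:.0f} MB as a dense matrix")
# prints: 28,181,278 pairwise distances, ~451 MB as a dense matrix
```

The distance matrix alone is manageable on most machines; the multiple alignment and the tree search on top of it are usually the more expensive steps, which is consistent with the advice to try a simple NJ tree first.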

spholmes commented 7 years ago

If NJ on its own did not work for you (did you try it?), then you will have to install RAxML to do anything tree-based (https://sco.h-its.org/exelixis/web/software/raxml/hands_on.html). You build the tree outside of R, keep it in Newick format, and read it into phyloseq.

Many of the available methods (ordinations on Bray-Curtis, etc.) do not require a tree, and you don't need one to have a valid phyloseq object.

Best,
Susan
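A tree built outside R is only useful in phyloseq if its tip labels match the taxa names in the rest of the object, so it can be worth checking the Newick file before importing it. The following is a minimal sketch of such a check, assuming simple unquoted labels; the function names and the `ASV1`-style IDs are hypothetical, and a real Newick parser would be more robust than this regular expression.

```python
import re


def newick_tip_labels(newick):
    """Extract unquoted tip labels: names that directly follow '(' or ','."""
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def compare_labels(newick, taxa_names):
    """Return (labels found only in the tree, names found only in the table)."""
    tips = newick_tip_labels(newick)
    taxa = set(taxa_names)
    return sorted(tips - taxa), sorted(taxa - tips)


# Toy tree and toy taxa names, for illustration only.
tree_text = "((ASV1:0.1,ASV2:0.2):0.05,ASV3:0.3);"
only_in_tree, only_in_table = compare_labels(tree_text, ["ASV1", "ASV2", "ASV4"])
print(only_in_tree, only_in_table)
# prints: ['ASV3'] ['ASV4']
```

If both lists come back empty, the tree and the table refer to the same set of taxa.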

ezandi commented 7 years ago

I did not try NJ (I don’t know how to do it!). But I made FASTA sequence files from the seqtab, combined them, and am using msa in micca to make the alignment and the tree. So far it seems to be working, although it is going to take a while. Also, I have made the phyloseq object without the tree, but I want to add the tree file to it to do some analysis.

Thank you so much for your help,

Best,

Ebi
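For anyone following the same route, the FASTA step can be sketched as below. This is an illustration under assumptions, not the exact procedure used in this thread: the sequences are taken to be already exported from the sequence table (in dada2 they are its column names), and the `to_fasta` helper and the `ASV` prefix are hypothetical names.

```python
def to_fasta(sequences, prefix="ASV"):
    """Format sequences as FASTA text with short generated IDs (ASV1, ASV2, ...)."""
    records = []
    for i, seq in enumerate(sequences, start=1):
        records.append(f">{prefix}{i}\n{seq}")
    return "\n".join(records) + "\n"


# Toy sequences, for illustration only.
example = ["ACGTACGT", "ACGTTCGT"]
print(to_fasta(example), end="")
```

Short IDs keep the tree file readable, but the mapping from ID back to sequence needs to be saved so that the tree tips can later be matched to the taxa in the phyloseq object.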
