benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

DADA2-formatted reference database for SILVA 138.2 #1983

Open nr0cinu opened 2 months ago

nr0cinu commented 2 months ago

Hi,

SILVA 138.2 is out.

Will there be an updated DADA2-formatted reference database made for it?

It would be greatly appreciated.

Thank you! Bela

benjjneb commented 2 months ago

Thanks for bringing this up. Yes, we will release a DADA2-formatted version, but we haven't yet. I'll review the changes and see to what extent we need to update our formatting script.

taiwan-user commented 1 month ago

Hi,

makeTaxonomyFasta_SilvaNR ran for me, but makeSpeciesFasta_Silva had errors.

makeTaxonomyFasta_SilvaNR

library(dada2)
packageVersion("dada2")
path <- "/path/to/silva-138.2/"
dada2:::makeTaxonomyFasta_SilvaNR(fin = file.path(path, "SILVA_138.2_SSURef_NR99_tax_silva.fasta.gz"),
                                  ftax = file.path(path, "tax_slv_ssu_138.2.txt"),
                                  fout = file.path(path, "silva_nr99_v138.2_train_set.fa.gz"),
                                  compress = TRUE)
dada2:::makeTaxonomyFasta_SilvaNR(fin = file.path(path, "SILVA_138.2_SSURef_NR99_tax_silva.fasta.gz"), 
                                  ftax = file.path(path, "tax_slv_ssu_138.2.txt"), 
                                  include.species = TRUE, 
                                  fout = file.path(path, "silva_nr99_v138.2_wSpecies_train_set.fa.gz"),
                                  compress = TRUE)

output

[1] ‘1.28.0’
451655 reference sequences were output.

  Archaea  Bacteria Eukaryota 
    20389    431166       100 
451655 reference sequences were output.

  Archaea  Bacteria Eukaryota 
    20389    431166       100 
106844 entries include species names.

makeSpeciesFasta_Silva

dada2:::makeSpeciesFasta_Silva(fin = file.path(path, "SILVA_138.2_SSURef_tax_silva.fasta.gz"),
                               fout = file.path(path, "silva_species_assignment_v138.2.fa.gz"),
                               compress = TRUE)

output

Warning in grepl(paste0("^", gen.binom, "[ _", split.glyph, "]"), gen.tax) :
  TRE pattern compilation error 'Missing ')''
Error in grepl(paste0("^", gen.binom, "[ _", split.glyph, "]"), gen.tax) : 
  invalid regular expression '^(Citrus[ _-]', reason 'Missing ')''
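
The failing pattern in the error above, '^(Citrus[ _-]', suggests that a name fragment starting with an unescaped "(" was pasted directly into the regular expression. The following is a minimal sketch in R of that failure and of one possible workaround, escaping metacharacters before building the pattern. It is not the dada2 code or a patch for it: escape_regex is a hypothetical helper, and the test strings are made up for illustration.

```r
# Values taken from the error message: gen.binom = "(Citrus", split.glyph = "-".
gen.binom   <- "(Citrus"
split.glyph <- "-"
gen.tax     <- "Citrus"   # hypothetical value, only needed so grepl has input

# Same construction as in the error message: "(" opens a group that never closes.
pattern <- paste0("^", gen.binom, "[ _", split.glyph, "]")
res <- tryCatch(
  suppressWarnings(grepl(pattern, gen.tax)),
  error = function(e) conditionMessage(e)   # capture the message instead of stopping
)

# Hypothetical helper: backslash-escape regex metacharacters in a literal string.
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# With the name escaped, the pattern compiles and "(" is matched literally.
safe_pattern <- paste0("^", escape_regex(gen.binom), "[ _", split.glyph, "]")
ok <- grepl(safe_pattern, "(Citrus sinensis")   # made-up test string
```

Whether escaping or dropping such names is the right behavior for the species file is a separate question that this sketch does not answer.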

taiwan-user commented 1 month ago

I haven't been able to recover all of the names from 138.1 in the 138.2 data. However, I did find some lines in the 138.1 DADA2-formatted file that may not have been removed as intended, such as those containing " endosymbiont", " symbiont", or " bacterium", in case that is of any help.
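
A check like the one described above can be sketched in R as follows. The header lines are hypothetical and made up for illustration (they are not copied from the SILVA or DADA2 files), and fixed-string matching is an assumption about how one might search, not how the DADA2 formatting script filters names.

```r
# Hypothetical FASTA header lines, invented for this example.
headers <- c(
  ">ID1 Escherichia coli",
  ">ID2 Buchnera endosymbiont of Aphis",
  ">ID3 Rhizobium symbiont",
  ">ID4 Clostridiales bacterium",
  ">ID5 Bacillus subtilis"
)

# Terms mentioned in the comment, each with its leading space.
terms <- c(" endosymbiont", " symbiont", " bacterium")

# One logical column per term; fixed = TRUE matches the text literally.
hits <- sapply(terms, function(term) grepl(term, headers, fixed = TRUE))

# Headers containing at least one of the terms, and a per-term count.
flagged <- headers[rowSums(hits) > 0]
counts  <- colSums(hits)
```

To run this against a real file, headers would instead be the lines of the FASTA file that start with ">".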

Thank you!

freixas84 commented 2 weeks ago

Have there been any updates on 138.2 for DADA2?

benjjneb commented 2 weeks ago

Not yet. 😢