BigelowLab / edna-dada2

Maine eDNA dada2
0 stars 0 forks source link

assign_taxonomy outputs #27

Closed btupper closed 2 years ago

btupper commented 2 years ago

https://github.com/BigelowLab/edna-dada2/blob/main/3step/asv_18S_postprocess.R#L148

What are we doing here? And why doesn't it go into ASV_taxa.csv?

robinsleith commented 2 years ago

Just ran through the updated scripts and I get output but its not pretty. Can we have it say that the final columns are bootstrap values?

ASV Kingdom...2 Supergroup...3 Division...4 Class...5 Order...6 Family...7 Genus...8 Species...9 Kingdom...10 Supergroup...11 Division...12 Class...13 Order...14 Family...15 Genus...16 Species...17
ASV_001 Eukaryota Opisthokonta Metazoa Arthropoda Crustacea Maxillopoda NA NA 100 100 100 100 100 100 45 45
ASV_002 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Polar-centric-Mediophyceae Cerataulina Cerataulina_pelagica 100 100 100 100 100 100 69 69
ASV_003 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Raphid-pennate Pseudo-nitzschia NA 100 100 100 100 100 100 99 38
ASV_004 Eukaryota NA NA NA NA NA NA NA 100 49 47 47 39 39 37 37
btupper commented 2 years ago

What in the world? What did you do to my nice software?

btupper commented 2 years ago

I'l have to see what your really doing, because this is what I get...

suppressPackageStartupMessages({
library(dada2)
library(dadautils)
library(dplyr)
library(readr)
})

seqs <- getSequences(system.file("extdata", "example_seqs.fa", package="dada2"))
training_fasta <- system.file("extdata", "example_train_set.fa.gz", package="dada2")
taxa <- dadautils::assign_taxonomy(seqs, training_fasta, 
  outputBootstraps = TRUE,
  save_file = TRUE,
  filename  = "taxa.csv")

z = readr::read_csv("taxa.csv", show_col_types = FALSE)
glimpse(z)  
# Rows: 6                                                                                                            
# Columns: 15
# $ seq          <chr> "TTGAAGAAGAACACATCTGAGAGTAACTGTTCAGGTGTTGACGGTATTTAACCAGA…
# $ tax.Kingdom  <chr> "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria…
# $ tax.Phylum   <chr> "Firmicutes", "Cyanobacteria/Chloroplast", "Firmicutes", …
# $ tax.Class    <chr> "Bacilli", "Chloroplast", "Bacilli", NA, "Clostridia", NA
# $ tax.Order    <chr> "Lactobacillales", "Chloroplast", "Bacillales", NA, "Clos…
# $ tax.Family   <chr> "Lactobacillaceae", "Streptophyta", "Bacillaceae_2", NA, …
# $ tax.Genus    <chr> "Lactobacillus", NA, "Virgibacillus", NA, "Clostridium_se…
# $ tax.Species  <lgl> NA, NA, NA, NA, NA, NA
# $ boot.Kingdom <dbl> 100, 100, 100, 100, 100, 20
# $ boot.Phylum  <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Class   <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Order   <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Family  <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Genus   <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Species <lgl> NA, NA, NA, NA, NA, NA

Or am I missing the issue?

robinsleith commented 2 years ago

Ahhh, I see it now. The above works to output taxa.csv but the workflow as currently written takes the output and modifies it. So taxa.csv looks fine but ASV_taxa.csv does not.

  charlier::info("writing ASV_taxa")
  ttaxa <- dplyr::as_tibble(taxa$tax) %>%
    dplyr::mutate(ASV = names(fasta)) %>%
    dplyr::relocate(ASV, .before = 1) %>%
    dplyr::bind_cols(dplyr::as_tibble(taxa$boot)) %>%
    readr::write_csv(file.path(CFG$output_path, "ASV_taxa.csv"))
btupper commented 2 years ago

Oh, wait.

btupper commented 2 years ago

Right. That was my question... do we know what were are doiing here. Isn't taxa.csv the same as ASV_taxa.csv? Just with the ASV prepended? We don't need both I guess. Just the ASV_taxa.csv?

robinsleith commented 2 years ago

Yeah only difference is stripping out the ASV sequences from taxa.csv and replacing them with ASV_1, ASV_2 names. Dont need taxa.csv at the end of the day.

robinsleith commented 2 years ago

@btupper I can take a stab at modifying the ttaxa object but if you have time you would likely be faster!

btupper commented 2 years ago

Oh, sorry! Done! But I never pushed the example for https://github.com/BigelowLab/edna-dada2/blob/main/3step/asv_18S_postprocess.R#L130 Be sure to reinstall dadautils to get merge_taxonomy().

Pushed now. I didn't get to 16S or anything else... left as a homework assignment for the PhDs (I'm so funny!)

robinsleith commented 2 years ago

I did a fresh install of dadautils etc and followed these steps that gave me this error:

Error: Problem with `mutate()` column `ASV`.
ℹ `ASV = names(ASV)`.
✖ object 'ASV' not found
Backtrace:
     █
  1. ├─global::main(CFG)
  2. │ └─dadautils::merge_taxonomy(...)
  3. │   └─`%>%`(...)
  4. ├─dplyr::mutate(., ASV = names(ASV), .before = 1)
  5. ├─dplyr:::mutate.data.frame(., ASV = names(ASV), .before = 1)
  6. │ └─dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
  7. │   ├─base::withCallingHandlers(...)
  8. │   └─mask$eval_all_mutate(quo)
  9. └─base::.handleSimpleError(...)
 10.   └─dplyr:::h(simpleError(msg, call))
btupper commented 2 years ago

Argh! Someone doesn't know his fasta from his ASV. Try again?

robinsleith commented 2 years ago
Its wonderful! Thanks! ASV seq tax.Kingdom tax.Supergroup tax.Division tax.Class tax.Order tax.Family tax.Genus tax.Species boot.Kingdom boot.Supergroup boot.Division boot.Class boot.Order boot.Family boot.Genus boot.Species
ASV_001 CAATAGCGTATATTAAAGTTGTTGTGGTTAAAAAGCTCGTAGTTGGATTTCGGCGGGTATTGGTCGGTTTGAATTGCTTCAATACTGACTTTTTTACCCGTGTGTCGTGCCAGAATTTGTCGGGTGATCTTCGCCGATTGTCCGTCAGAACTGGCAGGTTTACTTTGAAAAAATTAGAGTGCTCAAAGCAAGCTTGATTGCTTGAATATTCGTGCATGGAATAATAGAATAGGAAGTCGTTTCTATTTTGTTGGTTTTCGGAAATCGACTTAATGATTAATAGGGATAGTCGGGGGCATTTGTATTCAAACGACAGAGGTGAAATTCTTGGACCGTTTGAAGACAAACTACTGCGAAAGCATTTGCCAAGAATGTTTTCATTAATCAAGAACGAAAGTTAGAGGTT Eukaryota Opisthokonta Metazoa Arthropoda Crustacea Maxillopoda NA NA 100 100 100 100 100 100 39 39
ASV_002 CAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAATTTCTGGTGGGAGTGTACAGTCCGACACTGAGTGTTGGTTCTTGTATGTCTCCGGCCATCCTTGGAGTGAATCTGTCTGGCATTCAGTTGTTGGGCAGAGTATCTCCATCGTTTACTGTGAGGAAATGAAAGTGTTCAAAGCAGGCTTATGCCGTTGAATCTTTTAGCATGGAATAATAAGATAGGATCTTGATACTATTTTGTTGGTTTGCGTGTTAAGATAATGATGAATAGGGACAGTTGGGGGTATTTGTATTCCATAGTCAGAGGTGAAATTCTTGGATTTATGGAAGACAAACTAATGCGAAAGCATCTACCAAGGATGTTTTCATTAATCAAGAACGAAAGTGTGGGGAT Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Polar-centric-Mediophyceae Cerataulina Cerataulina_pelagica 100 100 100 100 100 100 72 72
ASV_003 CAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTGTGGTGTGTCCAGTTGGCCTTTGCTCTTTGAGTGATTGCGCTGTACTGGTCTGCCATGTTTGGGTGGAATCTGTGTGGCATTAAGTTGTCGTGCAGGGGATGCCCATCGTTTACTGTGAAAAAATTAGAGTGTTCAAAGCAGGCTTATGCCGTTGAATATATTAGCATGGAATAATAATATAGGACCTTGGTACTATTTTGTTGGTTTGCGCACTAAGGTAATGATTAAGAGGGACAGTTGGGGGTATTTGTATTCCATTGTCAGAGGTGAAATTCTTGGATTTTTGGAAGACAAACTACTGCGAAAGCATTTACCAAGGATGTTTTCATTAATCAAGAACGAAAGTTAGGGGAT Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Raphid-pennate Pseudo-nitzschia NA 100 100 100 100 100 100 99 45
ASV_004 CAATAGCGTATATTGAAGTTGTTGCGGTTAAAAAGCTCGTCGTTGGATTTCTGTTGAGGTCGAACAGTGCCGCCCTATGGGTGCGCAACTGGCTCGGCTTCGGCATCTTCTTGGAGAGCGCATCTGCACTTAACTGTGTGGATCGTACTTCAAGTCAGTTACTTTGAGGAATTGTGAGTGTTTCAAGCAGGCTTACGCCGTTGAATCCGTTAGCATGGAATAATAGAATAGGACCTCGGTTCTATTTTGTTGGTTTTGAGAGCTGGGGCAGCGATTGATAGGGACGGATGGGGGCGTCCGAACTTTACTGTCAGAGGTGAAATTCTTAGATCGGTCAAAGGCGAACTACTGCGAACGCATTCGCCAAGAATGTTTTCTTTAATCAAGAACGAAAGTTAGGGGAT Eukaryota NA NA NA NA NA NA NA 100 36 34 34 27 27 27 27