Closed btupper closed 2 years ago
Just ran through the updated scripts and I get output but its not pretty. Can we have it say that the final columns are bootstrap values?
ASV | Kingdom...2 | Supergroup...3 | Division...4 | Class...5 | Order...6 | Family...7 | Genus...8 | Species...9 | Kingdom...10 | Supergroup...11 | Division...12 | Class...13 | Order...14 | Family...15 | Genus...16 | Species...17 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ASV_001 | Eukaryota | Opisthokonta | Metazoa | Arthropoda | Crustacea | Maxillopoda | NA | NA | 100 | 100 | 100 | 100 | 100 | 100 | 45 | 45 |
ASV_002 | Eukaryota | Stramenopiles | Ochrophyta | Bacillariophyta | Bacillariophyta_X | Polar-centric-Mediophyceae | Cerataulina | Cerataulina_pelagica | 100 | 100 | 100 | 100 | 100 | 100 | 69 | 69 |
ASV_003 | Eukaryota | Stramenopiles | Ochrophyta | Bacillariophyta | Bacillariophyta_X | Raphid-pennate | Pseudo-nitzschia | NA | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 38 |
ASV_004 | Eukaryota | NA | NA | NA | NA | NA | NA | NA | 100 | 49 | 47 | 47 | 39 | 39 | 37 | 37 |
What in the world? What did you do to my nice software?
I'l have to see what your really doing, because this is what I get...
suppressPackageStartupMessages({
library(dada2)
library(dadautils)
library(dplyr)
library(readr)
})
seqs <- getSequences(system.file("extdata", "example_seqs.fa", package="dada2"))
training_fasta <- system.file("extdata", "example_train_set.fa.gz", package="dada2")
taxa <- dadautils::assign_taxonomy(seqs, training_fasta,
outputBootstraps = TRUE,
save_file = TRUE,
filename = "taxa.csv")
z = readr::read_csv("taxa.csv", show_col_types = FALSE)
glimpse(z)
# Rows: 6
# Columns: 15
# $ seq <chr> "TTGAAGAAGAACACATCTGAGAGTAACTGTTCAGGTGTTGACGGTATTTAACCAGA…
# $ tax.Kingdom <chr> "Bacteria", "Bacteria", "Bacteria", "Bacteria", "Bacteria…
# $ tax.Phylum <chr> "Firmicutes", "Cyanobacteria/Chloroplast", "Firmicutes", …
# $ tax.Class <chr> "Bacilli", "Chloroplast", "Bacilli", NA, "Clostridia", NA
# $ tax.Order <chr> "Lactobacillales", "Chloroplast", "Bacillales", NA, "Clos…
# $ tax.Family <chr> "Lactobacillaceae", "Streptophyta", "Bacillaceae_2", NA, …
# $ tax.Genus <chr> "Lactobacillus", NA, "Virgibacillus", NA, "Clostridium_se…
# $ tax.Species <lgl> NA, NA, NA, NA, NA, NA
# $ boot.Kingdom <dbl> 100, 100, 100, 100, 100, 20
# $ boot.Phylum <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Class <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Order <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Family <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Genus <dbl> 100, 100, 100, 100, 100, 18
# $ boot.Species <lgl> NA, NA, NA, NA, NA, NA
Or am I missing the issue?
Ahhh, I see it now. The above works to output taxa.csv but the workflow as currently written takes the output and modifies it. So taxa.csv looks fine but ASV_taxa.csv does not.
charlier::info("writing ASV_taxa")
ttaxa <- dplyr::as_tibble(taxa$tax) %>%
dplyr::mutate(ASV = names(fasta)) %>%
dplyr::relocate(ASV, .before = 1) %>%
dplyr::bind_cols(dplyr::as_tibble(taxa$boot)) %>%
readr::write_csv(file.path(CFG$output_path, "ASV_taxa.csv"))
Oh, wait.
Right. That was my question... do we know what were are doiing here. Isn't taxa.csv the same as ASV_taxa.csv? Just with the ASV prepended? We don't need both I guess. Just the ASV_taxa.csv?
Yeah only difference is stripping out the ASV sequences from taxa.csv and replacing them with ASV_1, ASV_2 names. Dont need taxa.csv at the end of the day.
@btupper I can take a stab at modifying the ttaxa object but if you have time you would likely be faster!
Oh, sorry! Done! But I never pushed the example for https://github.com/BigelowLab/edna-dada2/blob/main/3step/asv_18S_postprocess.R#L130 Be sure to reinstall dadautils
to get merge_taxonomy()
.
Pushed now. I didn't get to 16S or anything else... left as a homework assignment for the PhDs (I'm so funny!)
I did a fresh install of dadautils etc and followed these steps that gave me this error:
Error: Problem with `mutate()` column `ASV`.
ℹ `ASV = names(ASV)`.
✖ object 'ASV' not found
Backtrace:
█
1. ├─global::main(CFG)
2. │ └─dadautils::merge_taxonomy(...)
3. │ └─`%>%`(...)
4. ├─dplyr::mutate(., ASV = names(ASV), .before = 1)
5. ├─dplyr:::mutate.data.frame(., ASV = names(ASV), .before = 1)
6. │ └─dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
7. │ ├─base::withCallingHandlers(...)
8. │ └─mask$eval_all_mutate(quo)
9. └─base::.handleSimpleError(...)
10. └─dplyr:::h(simpleError(msg, call))
Argh! Someone doesn't know his fasta from his ASV. Try again?
Its wonderful! Thanks! ASV | seq | tax.Kingdom | tax.Supergroup | tax.Division | tax.Class | tax.Order | tax.Family | tax.Genus | tax.Species | boot.Kingdom | boot.Supergroup | boot.Division | boot.Class | boot.Order | boot.Family | boot.Genus | boot.Species |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ASV_001 | CAATAGCGTATATTAAAGTTGTTGTGGTTAAAAAGCTCGTAGTTGGATTTCGGCGGGTATTGGTCGGTTTGAATTGCTTCAATACTGACTTTTTTACCCGTGTGTCGTGCCAGAATTTGTCGGGTGATCTTCGCCGATTGTCCGTCAGAACTGGCAGGTTTACTTTGAAAAAATTAGAGTGCTCAAAGCAAGCTTGATTGCTTGAATATTCGTGCATGGAATAATAGAATAGGAAGTCGTTTCTATTTTGTTGGTTTTCGGAAATCGACTTAATGATTAATAGGGATAGTCGGGGGCATTTGTATTCAAACGACAGAGGTGAAATTCTTGGACCGTTTGAAGACAAACTACTGCGAAAGCATTTGCCAAGAATGTTTTCATTAATCAAGAACGAAAGTTAGAGGTT | Eukaryota | Opisthokonta | Metazoa | Arthropoda | Crustacea | Maxillopoda | NA | NA | 100 | 100 | 100 | 100 | 100 | 100 | 39 | 39 |
ASV_002 | CAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAATTTCTGGTGGGAGTGTACAGTCCGACACTGAGTGTTGGTTCTTGTATGTCTCCGGCCATCCTTGGAGTGAATCTGTCTGGCATTCAGTTGTTGGGCAGAGTATCTCCATCGTTTACTGTGAGGAAATGAAAGTGTTCAAAGCAGGCTTATGCCGTTGAATCTTTTAGCATGGAATAATAAGATAGGATCTTGATACTATTTTGTTGGTTTGCGTGTTAAGATAATGATGAATAGGGACAGTTGGGGGTATTTGTATTCCATAGTCAGAGGTGAAATTCTTGGATTTATGGAAGACAAACTAATGCGAAAGCATCTACCAAGGATGTTTTCATTAATCAAGAACGAAAGTGTGGGGAT | Eukaryota | Stramenopiles | Ochrophyta | Bacillariophyta | Bacillariophyta_X | Polar-centric-Mediophyceae | Cerataulina | Cerataulina_pelagica | 100 | 100 | 100 | 100 | 100 | 100 | 72 | 72 |
ASV_003 | CAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATTTGTGGTGTGTCCAGTTGGCCTTTGCTCTTTGAGTGATTGCGCTGTACTGGTCTGCCATGTTTGGGTGGAATCTGTGTGGCATTAAGTTGTCGTGCAGGGGATGCCCATCGTTTACTGTGAAAAAATTAGAGTGTTCAAAGCAGGCTTATGCCGTTGAATATATTAGCATGGAATAATAATATAGGACCTTGGTACTATTTTGTTGGTTTGCGCACTAAGGTAATGATTAAGAGGGACAGTTGGGGGTATTTGTATTCCATTGTCAGAGGTGAAATTCTTGGATTTTTGGAAGACAAACTACTGCGAAAGCATTTACCAAGGATGTTTTCATTAATCAAGAACGAAAGTTAGGGGAT | Eukaryota | Stramenopiles | Ochrophyta | Bacillariophyta | Bacillariophyta_X | Raphid-pennate | Pseudo-nitzschia | NA | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 45 |
ASV_004 | CAATAGCGTATATTGAAGTTGTTGCGGTTAAAAAGCTCGTCGTTGGATTTCTGTTGAGGTCGAACAGTGCCGCCCTATGGGTGCGCAACTGGCTCGGCTTCGGCATCTTCTTGGAGAGCGCATCTGCACTTAACTGTGTGGATCGTACTTCAAGTCAGTTACTTTGAGGAATTGTGAGTGTTTCAAGCAGGCTTACGCCGTTGAATCCGTTAGCATGGAATAATAGAATAGGACCTCGGTTCTATTTTGTTGGTTTTGAGAGCTGGGGCAGCGATTGATAGGGACGGATGGGGGCGTCCGAACTTTACTGTCAGAGGTGAAATTCTTAGATCGGTCAAAGGCGAACTACTGCGAACGCATTCGCCAAGAATGTTTTCTTTAATCAAGAACGAAAGTTAGGGGAT | Eukaryota | NA | NA | NA | NA | NA | NA | NA | 100 | 36 | 34 | 34 | 27 | 27 | 27 | 27 |
https://github.com/BigelowLab/edna-dada2/blob/main/3step/asv_18S_postprocess.R#L148
What are we doing here? And why doesn't it go into ASV_taxa.csv?