adw96 / DivNet

diversity estimation under ecological networks
83 stars 18 forks source link

Update beta_diversity_helper.R #62

Closed AstrobioMike closed 3 years ago

AstrobioMike commented 3 years ago

Hiya! Thanks for DivNet and Breakaway :)

I was hitting a semi-unhelpful error message when running simplifyBeta telling me my columns didn't exist, and after a bit of poking around realized it was because my sample names had dashes in them and the data.frame calls in the function were adjusting them (i.e. due to the default check.names=TRUE), e.g.:

Lee_sub <- tax_glom(Lee, "Phylum")

counts <- otu_table(Lee_sub)
sam <- sample_data(Lee_sub)

# no problems when like this
colnames(counts)
# "BW1"   "BW2"   "R10"   "R11"   "R11BF" "R12"   "R1A"   "R1B"   "R2"    "R3"    "R4"    "R5"    "R6"    "R7"    "R8"    "R9"

# but problem when any sample names will be changed by the data.frame function default check.names
colnames(counts) <- gsub("W", "-", colnames(counts))
colnames(counts)
# "B-1"   "B-2"   "R10"   "R11"   "R11BF" "R12"   "R1A"   "R1B"   "R2"    "R3"    "R4"    "R5"    "R6"    "R7"    "R8"    "R9"

row.names(sam) <- gsub("W", "-", colnames(counts))

phy <- phyloseq(counts, sam)

dv <- divnet(phy)

simplifyBeta(dv, phy, "bray-curtis", "char")

# Error: Can't subset columns that don't exist.
# x Columns `B-1` and `B-2` don't exist.

Adding in the check.names=FALSE works of course like in the adjustment suggested above, so maybe that's ok with you folks. If that's not ideal, then maybe putting in a check would be ok, e.g.:

simplifyBeta <- function (dv, physeq, measure, x) {

    in_sample_names <- physeq %>% sample_names
    changed_sample_names <- dv[[measure]] %>% data.frame %>% names

    if ( ! all(in_sample_names == changed_sample_names) ) {
        stop("Your sample names contain non-standard characters, please change that :)")
    }

    beta_var_matrix <- dv[[paste(measure, "-variance", sep = "")]]
    vars <- physeq %>% sample_data %>% get_variable(x)
    names(vars) <- physeq %>% sample_names
    dv[[measure]] %>% data.frame %>% rownames_to_column("Sample1") %>% 
        gather("Sample2", "beta_est", names(vars)) %>% add_column(beta_var = beta_var_matrix %>% 
        data.frame %>% rownames_to_column("Sample1") %>% gather("Sample2", 
        "var", names(vars)) %>% select("var") %>% c %>% unlist) %>% 
        mutate(Covar1 = vars[Sample1], Covar2 = vars[Sample2]) %>% 
        select(Covar1, Covar2, beta_est, beta_var) %>% dplyr::filter(beta_est > 
        1e-16) %>% unique %>% distinct(beta_est, .keep_all = TRUE) %>% 
        mutate(lower = pmax(0, beta_est - 2 * sqrt(beta_var)), 
            upper = pmax(0, beta_est + 2 * sqrt(beta_var)))
}

Which would instead return this:

simplifyBeta(dv, phy, "bray-curtis", "char")

# Error in mod_simplifyBeta(dv, phy, "bray-curtis", "char") : 
#   Your sample names contain non-standard characters, please change that :)

Not critical of course, but tripped me up for a bit. Maybe I shouldn't be turning off check.names, haha

codecov-io commented 3 years ago

Codecov Report

Merging #62 into master will not change coverage. The diff coverage is 0.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #62   +/-   ##
=======================================
  Coverage   67.42%   67.42%           
=======================================
  Files          13       13           
  Lines         614      614           
=======================================
  Hits          414      414           
  Misses        200      200           
Impacted Files Coverage Δ
R/beta_diversity_helper.R 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 31e04e2...fb1ef38. Read the comment docs.

svteichman commented 3 years ago

This is addressed in pull request #85 (your alternate suggestion, the check that throws a warning if there are non-standard characters is implemented). Thanks Mike!