benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

Error using 'decontam' #75

Open ziabulous-repository opened 4 years ago

ziabulous-repository commented 4 years ago

I'm attempting to use the package decontam to filter out contaminants found in my negative controls. I followed the introductory tutorial, created an OTU table (rows = samples, columns = OTUs), imported it as a data frame, and then converted it to a matrix. I also created a metadata file with two columns: 1) Samples (same order as the OTU table) and 2) sample_or_control, to describe negative control status (values = true_sample or control). Upon running the code in R, I received an error. I suspect it has to do with how I'm using isContaminant. Could anyone help me with this?

Here are the scripts I ran:

decontam_matrix <- data.matrix(decontam_data, rownames.force = NA)
decontam_metadata <- sample_data(Metadata)
sample_data(contam_metadata)$sample_or_control <- sample_data(contam_metadata)$sample_or_control == "control"
contamdf.prev <- isContaminant(decontam_matrix, method="prevalence", neg="contam_metadata$sample_or_control")

Here is the error printed from R:

Error in if (any(rowSums(seqtab) == 0)) { : missing value where TRUE/FALSE needed

As a side question: Does anyone have any experience running the frequency-based method with UV-vis based spectrophotometer readings? The tutorial uses PicoGreen assays. Would NanoVue concentrations work as well?

benjjneb commented 4 years ago

Can you report the output of the following?

class(seqtab)
seqtab[1:3, 1:3]

Does anyone have any experience running the frequency-based method with UV-vis based spectrophotometer readings? The tutorial uses PicoGreen assays. Would NanoVue concentrations work as well?

Sorry, I've never used NanoVue myself, so I can't offer any good guidance. If it is reasonably linear with the real DNA concentration (perfection is not needed), it should work well enough.

ziabulous-repository commented 4 years ago

Here's the output:

> class(seqtab)
Error: object 'seqtab' not found
> seqtab[1:3, 1:3]
Error: object 'seqtab' not found

benjjneb commented 4 years ago

Sorry, should have read:

class(decontam_matrix)
decontam_matrix[1:3, 1:3]
ziabulous-repository commented 4 years ago

No worries, here is the output.

> class(decontam_matrix)
[1] "matrix"
> decontam_matrix[1:3, 1:3]
     id fishgut.001 fishgut.002
[1,] NA         131         104
[2,] NA           0         117
[3,] NA           0          82

benjjneb commented 4 years ago

So on each row you are seeing NA as the first entry. This is the problem, and indicates something has gone wrong in your conversion of your OTU table to a matrix.

You said you "converted it" to a data.frame first, and then to a matrix. How? What was the original format of this OTU table?
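The `id` column in the output above is all NA, which suggests a non-numeric column went into the conversion. As a minimal sketch of how to spot such a column before converting, assuming a made-up data frame shaped like the one in this thread (the names `otu_df`, `otu_a`, and `otu_b` are hypothetical):

```r
# Hypothetical data frame shaped like the one in this thread (values made up).
otu_df <- data.frame(
  id = c("fishgut.001", "fishgut.002", "fishgut.003"),
  otu_a = c(131, 104, 4826),
  otu_b = c(0, 117, 4416),
  stringsAsFactors = FALSE
)

# List every column that is not numeric; these cannot be treated as counts.
non_numeric <- names(otu_df)[!vapply(otu_df, is.numeric, logical(1))]
print(non_numeric)
```

Any column listed here holds labels rather than abundances and should be moved out of the table (for example into the row names) before the matrix is built.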

ziabulous-repository commented 4 years ago

Sorry, bad wording. I originally imported it as a dataframe, then converted it into a matrix using data.matrix(decontam_data, rownames.force = NA)

benjjneb commented 4 years ago

Right, this process is going wrong at some point. What was the exact code you used? And can you inspect the object at each point (after importation as a data.frame, and after being a matrix) in the same way as above?

ziabulous-repository commented 4 years ago

Thank you for your patience with this. I restarted the process from the beginning. After importing both the OTU table and metadata (as excel sheets) using the menu option in R, here is the full code I ran:

> class(decontam_data)
[1] "tbl_df"     "tbl"        "data.frame"
> decontam_data[1:3, 1:3]
# A tibble: 3 x 3
  id          94ac298441f79f785f62dac763225f4d 138e6cd44e675df4c3a964767731851f
1 fishgut.001                              131                                0
2 fishgut.002                              104                              117
3 fishgut.003                             4826                             4416
> decontam_matrix <- data.matrix(decontam_data, rownames.force = NA)
Warning message:
In data.matrix(decontam_data, rownames.force = NA) : NAs introduced by coercion
> class(decontam_matrix)
[1] "matrix"
> decontam_matrix[1:3, 1:3]
     id 94ac298441f79f785f62dac763225f4d 138e6cd44e675df4c3a964767731851f
[1,] NA                              131                                0
[2,] NA                              104                              117
[3,] NA                             4826                             4416
> contam_metadata <- sample_data(Metadata)
> class(contam_metadata)
[1] "sample_data"
attr(,"package")
[1] "phyloseq"
> contam_metadata[1:3, 1:3]
Error in `[.data.frame`(data.frame(x), i, j, drop = FALSE) : undefined columns selected
> sample_data(contam_metadata)$sample_or_control <- sample_data(contam_metadata)$sample_or_control == "control"
Error in phyloseq(x@otu_table, value, x@tax_table, x@phy_tree, x@refseq) : no slot of name "otu_table" for this object of class "sample_data"
> contamdf.prev <- isContaminant(decontam_matrix, method="prevalence", neg="contam_metadata$sample_or_control")
Error in if (any(rowSums(seqtab) == 0)) { : missing value where TRUE/FALSE needed

benjjneb commented 4 years ago

What is the code you ran to import decontam_data initially? I still don't see that.

ziabulous-repository commented 4 years ago

I used the 'Import Dataset > From Excel...' command from the menu options in RStudio.

library(readxl)
decontam_data <- read_excel("Desktop/decontam_data.xlsx",
                            sheet = "decontam_data")

benjjneb commented 4 years ago

So that's probably the problem: that command is likely not importing the data correctly.

That said, I don't know how that command works, and I don't know exactly how you saved your Excel file. In general, saving in complex formats like Excel is bad news when it comes to later using that data in data science platforms like R.

I would suggest saving the data in .csv format from Excel, and then reading it in via the R read.csv function, with some attention paid to the options for that function.

ziabulous-repository commented 4 years ago

Thanks @benjjneb. It seems that the data frame is now imported correctly, but an issue arises with the later scripts:

decontam_data <- read.csv("Desktop/decontam_data.csv")

class(decontam_data)
decontam_data[1:3, 1:3]

decontam_matrix <- data.matrix(decontam_data, rownames.force = NA)

class(decontam_matrix)
decontam_matrix[1:3, 1:3]

contam_metadata <- sample_data(Metadata)

class(contam_metadata)
contam_metadata[1:3, 1:3]

sample_data(contam_metadata)$sample_or_control <- sample_data(contam_metadata)$sample_or_control == "control"
contamdf.prev <- isContaminant(decontam_matrix, method="prevalence", neg="contam_metadata$sample_or_control")
benjjneb commented 4 years ago

In this new post I can't see the output of anything, can you also include the output, not just the commands?

ziabulous-repository commented 4 years ago

Sorry about that. Here's the output:

> decontam_data <- read.csv("Desktop/decontam_data.csv")
> class(decontam_data)
[1] "data.frame"
> decontam_data[1:3, 1:3]
           id X94ac298441f79f785f62dac763225f4d X138e6cd44e675df4c3a964767731851f
1 fishgut.001                               131                                 0
2 fishgut.002                               104                               117
3 fishgut.003                              4826                              4416
> decontam_matrix <- data.matrix(decontam_data, rownames.force = NA)
> class(decontam_matrix)
[1] "matrix"
> decontam_matrix[1:3, 1:3]
     id X94ac298441f79f785f62dac763225f4d X138e6cd44e675df4c3a964767731851f
[1,]  1                               131                                 0
[2,]  2                               104                               117
[3,]  3                              4826                              4416
> contam_metadata <- sample_data(Metadata)
> class(contam_metadata)
[1] "sample_data"
attr(,"package")
[1] "phyloseq"
> contam_metadata[1:3, 1:3]
Error in `[.data.frame`(data.frame(x), i, j, drop = FALSE) : undefined columns selected
> sample_data(contam_metadata)$sample_or_control <- sample_data(contam_metadata)$sample_or_control == "control"
Error in phyloseq(x@otu_table, value, x@tax_table, x@phy_tree, x@refseq) : no slot of name "otu_table" for this object of class "sample_data"
> contamdf.prev <- isContaminant(decontam_matrix, method="prevalence", neg="contam_metadata$sample_or_control")
Error in sum(neg, na.rm = TRUE) : invalid 'type' (character) of argument

benjjneb commented 4 years ago

Again, the problem starts right at the beginning. When you look at the output of decontam_data[1:3, 1:3], what do you see?

I see that the first column does not hold the expected abundances; it holds the row names. The import call needs to be modified to tell it that the row names are in the first column.
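For a CSV export, one way to do that is the `row.names` argument of `read.csv`. The following is a minimal sketch, assuming a temporary file and made-up OTU names (`otu_a`, `otu_b`) in place of the real table:

```r
# Hypothetical three-sample table mirroring the layout shown in this thread.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,otu_a,otu_b",
  "fishgut.001,131,0",
  "fishgut.002,104,117",
  "fishgut.003,4826,4416"
), csv_path)

# row.names = 1 tells read.csv that the first column holds the sample names.
# check.names = FALSE keeps column names as written, so names starting with
# a digit are not given an "X" prefix.
otu_df <- read.csv(csv_path, row.names = 1, check.names = FALSE)

# Every remaining column is a count, so the conversion yields a numeric matrix.
otu_mat <- as.matrix(otu_df)

print(class(otu_mat[1, 1]))
print(rownames(otu_mat))
```

With the sample names stored as row names, the matrix contains only abundances and no `id` column.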

ziabulous-repository commented 4 years ago

So I think the early errors from the data import are all resolved. The only ones that remain have to do with the 'isContaminant' function:

> decontam_matrix[1:3, 1:3]
            X94ac298441f79f785f62dac763225f4d X138e6cd44e675df4c3a964767731851f
fishgut.001                               131                                 0
fishgut.002                               104                               117
fishgut.003                              4826                              4416
            X535aeaa058e5fe4e898ff03a15b3533e
fishgut.001                                 0
fishgut.002                                82
fishgut.003                              5383
> Metadata[1:2, 1:2]
           id sample_or_control
1 fishgut.001             FALSE
2 fishgut.002             FALSE
> Metadata$sample_or_control <- Metadata$sample_or_control == "control"
> contamdf.prev <- isContaminant(decontam_matrix, method="prevalence", neg="control")
Error in sum(neg, na.rm = TRUE) : invalid 'type' (character) of argument

benjjneb commented 4 years ago

The function expects the neg argument to be a vector of TRUE/FALSE values, with TRUE for the negative control samples. That would be Metadata$sample_or_control, not the string "control".
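As a rough sketch of what that looks like, assuming made-up counts and metadata (the sample names `blank.001` and `blank.002` and the OTU names are hypothetical, and the call to isContaminant only runs if decontam is installed):

```r
# Hypothetical metadata and counts; object names mirror the thread, values are made up.
Metadata <- data.frame(
  id = c("fishgut.001", "fishgut.002", "blank.001", "blank.002"),
  sample_or_control = c("true_sample", "true_sample", "control", "control"),
  stringsAsFactors = FALSE
)
decontam_matrix <- matrix(
  c(131, 104, 2, 5,     # otu_a counts, one per sample
    0, 117, 40, 35),    # otu_b counts, one per sample
  nrow = 4,
  dimnames = list(Metadata$id, c("otu_a", "otu_b"))
)

# Put the metadata rows in the same order as the matrix rows.
Metadata <- Metadata[match(rownames(decontam_matrix), Metadata$id), ]
stopifnot(!anyNA(Metadata$id))

# Logical vector: TRUE for negative controls, FALSE for true samples.
is_neg <- Metadata$sample_or_control == "control"
print(is_neg)

# Pass the logical vector itself, not a quoted string.
if (requireNamespace("decontam", quietly = TRUE)) {
  contamdf.prev <- decontam::isContaminant(decontam_matrix,
                                           method = "prevalence",
                                           neg = is_neg)
  print(head(contamdf.prev))
}
```

The comparison against "control" should be applied only once, to the original text labels; running it a second time on a column that is already TRUE/FALSE would turn every value to FALSE.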