benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

Install issues #145

Open: acrowell09 opened this issue 2 months ago

acrowell09 commented 2 months ago

Hello, I am having issues integrating decontam into Qiime2 as well as installing it in R. When integrating into Qiime2, I get an error after running the command "qiime dev refresh-cache". I have attached the error as a .txt file.

In R, I am getting the following error: Warning in install.packages : package ‘decontam’ is not available for this version of R

I have tried installing in R under 4 different versions: 4.0.0, 4.1.1, 4.2.2, and 4.3.3 (all versions others seem to be using while running decontam, per the Qiime forum).

Attachment: decontam-error.txt

Any help is appreciated!

benjjneb commented 2 months ago

decontam is available through Bioconductor (not CRAN, the default repository for install.packages) and you should use the Bioconductor install method, as described here: https://www.bioconductor.org/packages/release/bioc/html/decontam.html

That should get the decontam package installed in R.
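For reference, the Bioconductor route linked above boils down to the following. The `requireNamespace()` guard around the decontam install is a small addition of mine so that rerunning the snippet does not reinstall the package.

```r
# install.packages() only searches CRAN, which is why it reports that
# 'decontam' is "not available for this version of R".
# decontam lives on Bioconductor, so install it via BiocManager instead.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")            # BiocManager itself is on CRAN
}
if (!requireNamespace("decontam", quietly = TRUE)) {
  BiocManager::install("decontam")
}
library(decontam)
packageVersion("decontam")                   # confirm which version was installed
```

BiocManager installs from the Bioconductor release that matches the running version of R, so switching between R versions is not what makes decontam available.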

As for the Q2 errors, you'll have to look for help with that on the QIIME2 forum. Although we support the QIIME2 project, I don't use it and can't really help with errors on that end of things.

acrowell09 commented 2 months ago

Thank you for the help! I was able to get decontam running in R!

Is there a way for me to use the prevalence method to sort through field and extraction controls? I have 10 sites, each with replicates and a negative control. Multiple sites were extracted on the same day, so I have extraction controls that span multiple sites.

benjjneb commented 2 months ago

First, in general, controls that go through as much of the measurement process as possible are preferable to those that go through only part of it. So depending on what exactly "field" and "extraction" controls mean here, it may be better to use one or the other.

Second, the prevalence method relies on there being multiple negative controls, so that the proportion of samples in which a taxon is present (the "prevalence") can be meaningfully compared between the set of negative controls and the set of real samples. So the prevalence method doesn't work with a single negative control.
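To make the "multiple negative controls" point concrete, here is a small base-R sketch. It scores one taxon with a one-sided Fisher's exact test on a 2x2 presence/absence table (controls vs. real samples). This is an illustrative stand-in, not necessarily the exact statistic decontam computes internally, and the function name `prevalence_p` is hypothetical.

```r
# Hypothetical helper: one-sided Fisher's exact test asking whether a taxon
# is more prevalent in negative controls than in real samples.
# Inputs are logical presence vectors (TRUE = taxon detected in that sample).
prevalence_p <- function(present_in_controls, present_in_samples) {
  tab <- matrix(
    c(sum(present_in_controls), sum(present_in_samples),    # "present" column
      sum(!present_in_controls), sum(!present_in_samples)), # "absent" column
    nrow = 2,
    dimnames = list(c("control", "sample"), c("present", "absent"))
  )
  fisher.test(tab, alternative = "greater")$p.value
}

# Most extreme case with ONE control: present in it, absent from all 10 samples.
p_one <- prevalence_p(TRUE, rep(FALSE, 10))
# The same pattern with FIVE controls, all positive.
p_five <- prevalence_p(rep(TRUE, 5), rep(FALSE, 10))
c(one_control = p_one, five_controls = p_five)
```

With one control and 10 samples, even the most extreme pattern gives p = 1/11 (about 0.091); with five controls the same pattern gives 1/3003 (about 0.00033). More generally, under this test a single control and n samples can never produce a p-value below 1/(n + 1), which is why one control carries so little information.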

acrowell09 commented 2 months ago

To provide more info, I am working with an algal-associated microbiome temporal data set with sampling points over the course of a year. Each sampling point has a water sample as a negative control to eliminate any sequences not associated with the alga. This is the field control I described, and there are 10 total field controls.

These samples were extracted across multiple days, so each extraction batch has a negative control. This control is a blank filter (no water filtered through it) and was taken through the extraction process. There are 5 total extraction controls.

Would the batch and batch.combine arguments be best in this case?

benjjneb commented 2 months ago

You can't separate these into individual batches and use decontam, since each batch has just 1 (or 2?) associated controls.

I would perhaps run decontam twice, once with all the extraction controls, and once with all the field controls.
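A minimal sketch of the "run it twice" suggestion, assuming a samples-by-taxa count matrix and a per-sample `type` label (both hypothetical here; the toy counts are random and exist only so the snippet runs). It also assumes each run should contain only the real samples plus one kind of control, so that the other controls are not treated as real samples; that subsetting choice is my reading, not something stated in the thread. `isContaminant()` with `method = "prevalence"` and a logical `neg` vector is decontam's documented interface.

```r
library(decontam)

# --- Hypothetical toy data (random counts; only so the example runs) ---
set.seed(1)
type <- rep(c("sample", "field_control", "extraction_control"), times = c(20, 10, 5))
n <- length(type)
# Sparse integer counts: each cell is zero with probability 0.5, else Poisson(20)
counts <- rpois(n * 8, lambda = 20) * rbinom(n * 8, size = 1, prob = 0.5)
seqtab <- matrix(counts, nrow = n,               # rows = samples, columns = taxa
                 dimnames = list(paste0("smp", seq_len(n)), paste0("ASV", 1:8)))
seqtab[, 1] <- seqtab[, 1] + 1L                 # guarantee no sample has zero total reads

# --- Run 1: real samples + extraction blanks (blanks are the negatives) ---
keep_ext <- type %in% c("sample", "extraction_control")
res_ext <- isContaminant(seqtab[keep_ext, ], method = "prevalence",
                         neg = type[keep_ext] == "extraction_control")

# --- Run 2: real samples + field (water) controls (water samples are the negatives) ---
keep_field <- type %in% c("sample", "field_control")
res_field <- isContaminant(seqtab[keep_field, ], method = "prevalence",
                           neg = type[keep_field] == "field_control")

# Rows of each result correspond to the columns (taxa) of seqtab, in order.
# which() drops NA entries, e.g. for taxa that could not be scored.
flagged <- colnames(seqtab)[which(res_ext$contaminant | res_field$contaminant)]
flagged
```

Combining the two runs with a logical OR flags a taxon if either control set identifies it; requiring both (`&`) would be the more conservative alternative.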

fermat01 commented 1 month ago

Maybe you can check this example: https://forum.qiime2.org/t/tutorial-integrating-qiime2-and-r-for-data-visualization-and-analysis-using-qiime2r/4121