biocore / emp

Code repository of the Earth Microbiome Project.
http://www.earthmicrobiome.org
BSD 3-Clause "New" or "Revised" License

Opening emp_cr_silva_16S_123.qc_filtered.biom in R #131

Closed padpadpadpad closed 1 year ago

padpadpadpad commented 1 year ago

Hi

I have downloaded the taxonomy table emp_cr_silva_16S_123.qc_filtered.biom from the site ftp://ftp.microbio.me/emp/release1. However, when I try to read it into R, I get the error:

d_16s <- phyloseq::import_biom('~/Downloads/emp_cr_silva_16S_123.qc_filtered.biom')
Error in read_biom(biom_file = BIOMfilename) : 
  Both attempts to read input file:
~/Downloads/emp_cr_silva_16S_123.qc_filtered.biom
either as JSON (BIOM-v1) or HDF5 (BIOM-v2).
Check file path, file name, file itself, then try again.

The filename is correct, so I am not sure where to go from here. I am trying to calculate species/ASV diversity of a few different genera, and would like to use the un-rarefied, quality-controlled samples.
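Since the error says the file could not be read as either JSON (BIOM-v1) or HDF5 (BIOM-v2), one quick check is to look at the file's leading bytes. Below is a minimal Python sketch of that check. The function name `sniff_biom_format` is hypothetical (it is not part of phyloseq, rbiom, or biom-format), and it assumes the HDF5 signature sits at byte offset 0, which holds for files written without a user block.

```python
import sys

# Standard 8-byte HDF5 format signature (assumed to sit at offset 0).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_biom_format(path):
    """Guess the container format of a BIOM file from its first bytes.

    Hypothetical helper for illustration only. Returns "hdf5", "json",
    or "unknown".
    """
    with open(path, "rb") as handle:
        head = handle.read(64)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"  # consistent with BIOM-v2
    if head.lstrip().startswith(b"{"):
        return "json"  # consistent with BIOM-v1
    return "unknown"   # truncated download, HTML error page, etc.


if __name__ == "__main__" and len(sys.argv) > 1:
    print(sniff_biom_format(sys.argv[1]))
```

Run it as a script with the path to the .biom file as its only argument. A result of "unknown" would point to a damaged download rather than a problem in the R readers.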

antgonza commented 1 year ago

Could you confirm that your file is OK? I just tested downloading the file and summarizing it via biom and it worked fine:

1. Download:

    $ wget http://ftp.microbio.me/emp/release1/otu_tables/closed_ref_silva/emp_cr_silva_16S_123.qc_filtered.biom
    --2023-05-30 10:43:21--  http://ftp.microbio.me/emp/release1/otu_tables/closed_ref_silva/emp_cr_silva_16S_123.qc_filtered.biom
    Resolving ftp.microbio.me (ftp.microbio.me)... 169.228.46.98
    Connecting to ftp.microbio.me (ftp.microbio.me)|169.228.46.98|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 306752783 (293M)
    Saving to: ‘emp_cr_silva_16S_123.qc_filtered.biom’

    emp_cr_silva_16S_123.qc_filtered.biom 100%[====================>] 292.54M 24.3MB/s in 12s

    2023-05-30 10:43:33 (24.0 MB/s) - ‘emp_cr_silva_16S_123.qc_filtered.biom’ saved [306752783/306752783]


2. Check checksum:

    $ md5 emp_cr_silva_16S_123.qc_filtered.biom
    MD5 (emp_cr_silva_16S_123.qc_filtered.biom) = 4ea617d8e598d1126ed529c7166d55b3


3. Summarize table:

    $ biom summarize-table -i emp_cr_silva_16S_123.qc_filtered.biom | head
    Num samples: 23,323
    Num observations: 126,730
    Total count: 1,737,277,216
    Table density (fraction of non-zero values): 0.016

    Counts/sample summary:
    Min: 10,004.000
    Max: 2,407,345.000
    Median: 56,622.000
    Mean: 74,487.725
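For anyone without the `md5` command (it is not available on every platform), the checksum step can be reproduced in Python. This is a minimal sketch: the helper name `md5_of_file` and the chunk size are illustrative choices, and the expected digest is simply the value reported in this thread.

```python
import hashlib

# Digest reported above for emp_cr_silva_16S_123.qc_filtered.biom.
EXPECTED_MD5 = "4ea617d8e598d1126ed529c7166d55b3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so a ~300 MB table is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected("emp_cr_silva_16S_123.qc_filtered.biom")` on the downloaded file should return `True` if the download is intact.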

padpadpadpad commented 1 year ago

Yep, this all works. So it must be something with how .biom files are being read in by R?

I have tried both phyloseq and rbiom, and neither works...

justinshaffer commented 1 year ago

I suggest trying qiime2R:

Doc: https://rdrr.io/github/jbisanz/qiime2R/

How to install:

    remotes::install_github("jbisanz/qiime2R")
    library(qiime2R)


antgonza commented 1 year ago

... another option is to open an issue directly with those packages; closing for now.

padpadpadpad commented 1 year ago

I will raise an issue with them. It is unclear how qiime2R will help, as the file extension is .biom, not .qza.

It is also somewhat strange as a smaller file such as emp_cr_silva_16S_123.subset_2k.rare_10000.biom reads in successfully.