KasperSkytte / ampvis2

Tools for visualising microbial community amplicon data
https://kasperskytte.github.io/ampvis2/
GNU General Public License v3.0

How to use the metadata from a biom file? #129

Closed bernt-matthias closed 2 years ago

bernt-matthias commented 2 years ago

I was starting to explore the ampvis2 package using the test data from this repo (https://github.com/MadsAlbertsen/ampvis2/blob/main/tests/testdata/rich_sparse_otu_table.biom). If I try to load the data with d <- amp_load("rich_sparse_otu_table.biom")

I get

Warning messages:
1: Could not find a column named OTU/ASV in otutable, using rownames as OTU ID's 
2: No sample metadata provided, creating dummy metadata.

So the metadata from the biom file is not used?

d looks like this:

> d
ampvis2 object with 3 elements. 
Summary of OTU table:
     Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads 
           6            5           27            3            7            4 
   Avg#Reads 
         4.5 

Assigned taxonomy:
Kingdom  Phylum   Class   Order  Family   Genus Species 
5(100%) 5(100%) 5(100%) 5(100%) 5(100%) 5(100%)  1(20%) 

Metadata variables: 2 
 SampleID, DummyVariable
> names(d)
[1] "abund"    "tax"      "metadata"
> names(d$metadata)
[1] "SampleID"      "DummyVariable"
> d$metadata["SampleID"]
        SampleID
Sample1  Sample1
Sample2  Sample2
Sample3  Sample3
Sample4  Sample4
Sample5  Sample5
Sample6  Sample6

bernt-matthias commented 2 years ago

Little update:

For the test data in https://github.com/biocore/biom-format/tree/master/examples taxonomy is loaded for all but min_sparse_otu_table_hdf5.biom.

Metadata is not loaded from any of the test files.

KasperSkytte commented 2 years ago

The idea was to supply a separate metadata sheet. But that either needs to be documented, or it should be made to handle both. I'd go for the latter at some point. ampvis2 is on hold for a while for me, I need to finish up a PhD :P Keep posting, I'll get back to it at some point.

KasperSkytte commented 2 years ago

If metadata is present in the biom file, it will now be loaded, but it is overridden by the metadata argument if one is provided. The min_sparse_otu_table_hdf5.biom file simply doesn't contain any taxonomy.
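
A rough sketch of the two ways of loading described above, not taken from the ampvis2 documentation. The `my_metadata` data frame and its `Treatment` column are made up for illustration; only the sample names match the test file from this thread. The `amp_load()` calls are guarded, so they only run if ampvis2 is installed and the biom file is in the working directory (reading biom files may also need extra packages to be installed).

```r
# Hypothetical metadata sheet (illustration only).
# The first column holds the sample IDs, matching the samples in the biom file.
my_metadata <- data.frame(
  SampleID = paste0("Sample", 1:6),
  Treatment = rep(c("control", "treated"), each = 3),
  stringsAsFactors = FALSE
)

biom_file <- "rich_sparse_otu_table.biom"

# Only attempt to load if ampvis2 is available and the file exists
if (requireNamespace("ampvis2", quietly = TRUE) && file.exists(biom_file)) {
  # Metadata is taken from the biom file, if it contains any
  d_biom <- ampvis2::amp_load(biom_file)
  print(names(d_biom$metadata))

  # Metadata passed explicitly is used instead of what is in the biom file
  d_own <- ampvis2::amp_load(biom_file, metadata = my_metadata)
  print(names(d_own$metadata))
}
```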

bernt-matthias commented 2 years ago

Really cool. One more question: is metadata from biom files loaded as character data, i.e. do I need to fix numeric / date metadata?

Bonus question: do you plan to make a release of the recent changes?

KasperSkytte commented 2 years ago

It's naive. All columns will be character only, as the BIOM file is just a text file. No interpretation or guesswork is done, as that would lead to problems. You have to adjust the ampvis2object$metadata data frame in the ampvis2 object afterwards. The object is essentially just a list of data frames with a "quality stamp" in the form of being an ampvis2 class object created by amp_load, so the format is consistent.
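
A minimal sketch of what adjusting the metadata afterwards could look like, using base R only. The list `d` below is just a stand-in for a loaded ampvis2 object, and the `pH` and `Date` columns and their values are hypothetical; with a real object the same assignments would be applied to its `$metadata` data frame.

```r
# Stand-in for an ampvis2 object: only the metadata data frame matters here.
# Column names and values are hypothetical, all stored as character.
d <- list(
  metadata = data.frame(
    SampleID = c("Sample1", "Sample2", "Sample3"),
    pH = c("6.8", "7.1", "7.4"),
    Date = c("2021-03-01", "2021-03-08", "2021-03-15"),
    stringsAsFactors = FALSE
  )
)

# Check the column types before converting
print(sapply(d$metadata, class))

# Convert each column explicitly to the type it should have
d$metadata$pH <- as.numeric(d$metadata$pH)
d$metadata$Date <- as.Date(d$metadata$Date, format = "%Y-%m-%d")

# Inspect the result
str(d$metadata)
```

Assigning to individual columns like this leaves the rest of the object untouched, so the sample IDs and the other data frames stay as they were loaded.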

And I will make a release now. I always do that manually, so I just waited for GitHub Actions to do its thing first.