biocore / songbird

Vanilla regression methods for microbiome differential abundance analysis
BSD 3-Clause "New" or "Revised" License

confusing ID column names #129

Open redbluewater opened 4 years ago

redbluewater commented 4 years ago

I am posting this here, but it also affects the mmvec examples (and possibly other programs).

The 'red sea' example for songbird uses sampleid as the identifier for the sequence data (feature_metadata.txt) and also uses sampleid as the identifier for the samples (in redsea_metadata.txt). As I am still learning QIIME, I am not sure of the best way around this. However, having only two options (some variant of sampleid or featureid) does not seem sufficient. For example:

How about this idea:

This is more precise than 'sampleid' or 'featureid', especially for us as a mass spectrometry group, since we use 'features' to mean peaks in mass spectrometry data (the opposite of how 'features' is used in the examples here).
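To make the collision concrete, here is a minimal Python sketch that reads the header of the first column of each tab-separated metadata file (QIIME 2-style metadata keeps identifiers in the first column) and reports any header shared by more than one file. The function names `id_header` and `ambiguous_id_headers` are hypothetical helpers for illustration, not part of songbird, mmvec, or QIIME 2; comparing headers case-insensitively is an assumption of this sketch, and the toy file contents below are invented rather than copied from the red sea example.

```python
import csv
import io


def id_header(tsv_text: str) -> str:
    """Return the header of the first column, which holds the IDs in QIIME 2-style metadata."""
    first_row = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return first_row[0]


def ambiguous_id_headers(**tables: str) -> dict:
    """Group table names by ID header; keep only headers shared by two or more tables.

    Headers are compared case-insensitively (an assumption of this sketch).
    """
    groups = {}
    for name, text in tables.items():
        groups.setdefault(id_header(text).strip().lower(), []).append(name)
    return {header: names for header, names in groups.items() if len(names) > 1}


# Toy stand-ins (invented content) for the two files named in the issue.
sample_md = "sampleid\tdepth\nS1\t10\nS2\t50\n"
feature_md = "sampleid\ttaxonomy\nF1\tk__Bacteria\n"

print(ambiguous_id_headers(redsea_metadata=sample_md, feature_metadata=feature_md))
# prints {'sampleid': ['redsea_metadata', 'feature_metadata']}
```

The check only flags the ambiguity; it does not rename anything, since QIIME 2 may reject ID headers outside the set it recognizes.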

Thanks as ever for developing these tools. They are extremely useful and I am excited to use them to dig into my own data.

mortonjt commented 4 years ago

@KujawinskiLaboratory this is a very good point, and one that stems from the traditional definitions of metadata.

There have been a couple of discussions about this in other contexts, in particular https://github.com/biocore/emperor/issues/726 and https://github.com/qiime2/q2-emperor/issues/81.

I suspect it would take a fairly extensive refactor of QIIME 2 to make sure that these types propagate correctly (e.g. what about all of the other omics data types, such as transcriptomics and proteomics?). CC @ElDeveloper @ebolyen for further discussion