clemente-lab / mmeds-meta

A database for storing and analyzing omics data
https://mmeds.org

Metadata subclasses and conversion #451

Open adamcantor22 opened 1 year ago

adamcantor22 commented 1 year ago

Is your feature request related to a problem? Please describe.

The new Metadata class (#432) will need to represent metadata in a variety of different formats. We need to be able to import any of these formats and switch between them at will.

Describe the solution you'd like

We can use subclasses to describe the formats:

At each level, there should be a function going 'up' (e.g. Qiime2 -> Metadata generic) and one going 'down' (e.g. Metadata generic -> MMEDS Full).
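A minimal sketch of the up/down pattern, assuming a generic container keyed by a canonical `SampleID` column. Every class name here (`Metadata`, `FormatMetadata`, `RenamedIdMetadata`, `Qiime2Metadata`, `CsvMetadata`) and the `convert` helper are hypothetical, not existing MMEDS code. The two concrete formats differ only in their ID header purely for illustration; `#SampleID` is an assumed QIIME-style header.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type

# Hypothetical format-neutral representation: column name -> list of values.
Columns = Dict[str, List[str]]


class Metadata:
    """Top of the hierarchy: generic metadata with a canonical 'SampleID' column."""

    def __init__(self, columns: Columns):
        self.columns = columns


class FormatMetadata(ABC):
    """Base for format-specific subclasses; each knows how to go up and down."""

    def __init__(self, columns: Columns):
        self.columns = columns

    @abstractmethod
    def to_generic(self) -> Metadata:
        """Go 'up': this format -> generic Metadata."""

    @classmethod
    @abstractmethod
    def from_generic(cls, generic: Metadata) -> "FormatMetadata":
        """Go 'down': generic Metadata -> this format."""


class RenamedIdMetadata(FormatMetadata):
    """Toy helper for formats that differ from generic only in the ID column name."""

    ID_COLUMN = "SampleID"

    def to_generic(self) -> Metadata:
        # Rename this format's ID column to the canonical name; copy the rest.
        cols = {("SampleID" if k == self.ID_COLUMN else k): list(v)
                for k, v in self.columns.items()}
        return Metadata(cols)

    @classmethod
    def from_generic(cls, generic: Metadata) -> "RenamedIdMetadata":
        # Rename the canonical ID column to this format's name.
        cols = {(cls.ID_COLUMN if k == "SampleID" else k): list(v)
                for k, v in generic.columns.items()}
        return cls(cols)


class Qiime2Metadata(RenamedIdMetadata):
    ID_COLUMN = "#SampleID"  # assumption: QIIME-style ID header


class CsvMetadata(RenamedIdMetadata):
    ID_COLUMN = "sample_id"  # hypothetical format, for illustration only


def convert(src: FormatMetadata, target: Type[FormatMetadata]) -> FormatMetadata:
    """Any format -> any other: go up to generic, then down to the target."""
    return target.from_generic(src.to_generic())
```

Routing every conversion through the generic level means each new format needs only its own up and down functions, rather than a converter for every pair of formats.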

Converting to MMEDS

Converting to MMEDS from another format presents by far the biggest challenge. In MMEDS format, we have 5 header rows: Table Name, Var Name, Opt/Req, Format, and Unit/Length Restriction. If we're trying to get, say, the MMEDS Var 'Weight' from a Qiime2 file, we need to be prepared for multiple situations, for example:

Then, once we have determined that the variable in question is indeed Weight, we also need to infer its units. What if the data doesn't include any units at all and is purely numerical?
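One way to start on the units question is to check whether each value carries a unit token at all. The sketch below uses hypothetical helpers (`split_value`, `column_units`) and assumes values look like `70`, `70kg` or `70.5 lb`. A column with no units, or with mixed units, yields `None`, which would be the cue to ask the user rather than guess.

```python
import re
from typing import List, Optional, Tuple

# Assumed value shape: a number, optionally followed by a unit token.
_VALUE_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z%]+)?\s*$")


def split_value(raw: str) -> Tuple[Optional[float], Optional[str]]:
    """Split '70 kg' into (70.0, 'kg'); a bare number gives (70.0, None)."""
    m = _VALUE_RE.match(raw)
    if m is None:
        return None, None  # not numeric at all
    number, unit = m.groups()
    return float(number), unit


def column_units(values: List[str]) -> Optional[str]:
    """Return the single unit used across a column, or None when the column is
    purely numerical or its units are inconsistent (i.e. 'ask the user')."""
    units = {split_value(v)[1] for v in values}
    if len(units) == 1:
        return units.pop()
    return None
```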

@circlespie and I discussed a two-tiered approach: a first pass using some kind of AI assistance, such as a word-association cluster that could infer that a related word such as 'mass' implies the variable 'Weight'; then a fallback that checks uncertain matches with the user, asking e.g. 'Does 'Subanalysis' match 'SpecimenType'? y/n'.
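The two tiers can be sketched as follows. This is not an AI model: as a stand-in for the word-association cluster, tier 1 uses a hand-written synonym table (`SYNONYMS`, hypothetical) plus `difflib` fuzzy string matching. Tier 2 asks the y/n question through an injectable `ask` callback (defaulting to `input`) so the prompt can be scripted in tests. `first_pass` and `match_label` are illustrative names.

```python
import difflib
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the word-association cluster: MMEDS Var -> synonyms.
SYNONYMS: Dict[str, List[str]] = {
    "Weight": ["weight", "mass", "body_weight", "wt"],
    "SpecimenType": ["specimen", "sample_type", "specimentype"],
}


def first_pass(label: str, cutoff: float = 0.8) -> Optional[str]:
    """Tier 1: return an MMEDS Var if the label matches confidently, else None."""
    norm = label.strip().lower().replace(" ", "_")
    for var, words in SYNONYMS.items():
        if norm in words:
            return var
    # Fuzzy similarity catches typos such as 'weigth'.
    lookup = {w: var for var, words in SYNONYMS.items() for w in words}
    close = difflib.get_close_matches(norm, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


def match_label(label: str, ask: Callable[[str], str] = input) -> Optional[str]:
    """Tier 2: when tier 1 is unsure, ask the user about each candidate Var."""
    var = first_pass(label)
    if var is not None:
        return var
    for candidate in SYNONYMS:
        answer = ask(f"Does '{label}' match '{candidate}'? y/n ")
        if answer.strip().lower().startswith("y"):
            return candidate
    return None  # unresolved; leave for manual handling
```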

Alternatively, a user could provide, as supplementary input, a dictionary explicitly defining what each label maps to. However, this would require user preprocessing, the very thing we're trying to avoid. This issue warrants further discussion.
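If such a dictionary were accepted, applying it is simple. The hypothetical `apply_mapping` below also returns the labels the dictionary did not cover. This is a sketch, assuming a partial dictionary could hand its leftovers to an interactive fallback rather than requiring the user to map every column up front.

```python
from typing import Dict, List, Tuple


def apply_mapping(headers: List[str],
                  mapping: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Rename headers with a user-supplied {source label: MMEDS Var} dictionary.

    Returns (renamed headers, labels not covered by the mapping).
    """
    renamed = [mapping.get(h, h) for h in headers]  # unmapped labels kept as-is
    unmapped = [h for h in headers if h not in mapping]
    return renamed, unmapped
```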

adamcantor22 commented 1 year ago

Note: conversion to Lefse needs to replace spaces in values with _ or remove them entirely.
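A minimal sketch of that step, using a hypothetical `lefse_columns` helper over a column-name -> values dictionary. It touches values only, not column names, and the `replacement` parameter covers both options: `"_"` by default, or `""` to drop spaces.

```python
from typing import Dict, List


def lefse_columns(columns: Dict[str, List[str]],
                  replacement: str = "_") -> Dict[str, List[str]]:
    """Replace every space in each value; column names are left unchanged."""
    return {name: [v.replace(" ", replacement) for v in values]
            for name, values in columns.items()}
```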

adamcantor22 commented 1 year ago

Additional conversion type: REDcap format