VEuPathDB / EdaLoadingIssues


Mbio downloads: should assays have required variables? #42

Open asizemore opened 2 years ago

asizemore commented 2 years ago

Came up while discussing this ticket: https://github.com/VEuPathDB/web-eda/issues/1340

Question for @dpbisme

dmfalke commented 2 years ago

Unsure what repo this should be in. Feel free to change if this is not a loading issue.

danicahelb commented 1 year ago

@asizemore the correct required "key identifier" columns are showing up in the download files (though I only checked MALED 2y, not any of the other studies).

Required columns in the download files are the key identifiers for each row in the data table and the identifiers that map each row to the "upstream" download file.

Required columns for repeated measures also include some measure of time (e.g., age, date, or study timepoint). Currently mbio does NOT properly account for longitudinal sampling (see https://github.com/VEuPathDB/EdaLoadingIssues/issues/49 and https://github.com/VEuPathDB/EdaLoadingIssues/issues/48); this will need to be double-checked for all other studies on the mbio site.
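As a rough illustration of the rule above, the required columns for a download file could be assembled along these lines. This is only a sketch: the function name, its parameters, and the example column names are hypothetical and are not taken from the EDA codebase.

```python
from typing import List, Optional


def required_columns(
    key_ids: List[str],
    upstream_ids: List[str],
    time_column: Optional[str] = None,
    repeated_measures: bool = False,
) -> List[str]:
    """Return the columns a download file must always contain.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    # Key identifiers for each row, plus identifiers that map the row
    # to the "upstream" download file.
    columns = list(key_ids) + list(upstream_ids)
    if repeated_measures:
        # Repeated measures files additionally need some measure of time.
        if time_column is None:
            raise ValueError("repeated measures files need a time column")
        columns.append(time_column)
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(columns))


# Example with made-up column names.
sample_cols = required_columns(["Sample_Id"], ["Participant_Id"])
repeated_cols = required_columns(
    ["Observation_Id"],
    ["Participant_Id"],
    time_column="Age_days",
    repeated_measures=True,
)
```

Raising an error when a repeated measures file has no time column is a design choice made for this sketch; a real loader might prefer to log a warning instead.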

Once mbio is able to handle repeated measures properly, we will need to require "time" columns in the download files.

We may also want to require "time" columns in the sample files when the studies contain longitudinal data (for both mbio and ClinEpi).

Also, the UI needs to be updated to include the "required columns" box on the download modal for microbiomeDB.

Here is how it looks for ClinEpi:

image

mbio sample modal looks fine:

image

but mbio assay modal is missing the required columns box:

image

asizemore commented 1 year ago

Thanks for the detail, @danicahelb!

My guess is that the Required Columns box for Assays isn't showing up because no variables in that entity are marked as required. Where do we mark variables as required?
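To make the guess concrete, here is a minimal sketch of the behavior I have in mind. It is not the actual web-eda logic: the `Variable` class, the `is_required` flag, and the variable names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variable:
    name: str
    is_required: bool = False  # hypothetical flag, not a confirmed property


def show_required_columns_box(variables: List[Variable]) -> bool:
    """Show the box only if at least one variable in the entity is required."""
    return any(v.is_required for v in variables)


# Made-up entities: one sample variable is required, no assay variable is.
sample_vars = [Variable("Sample_Id", is_required=True), Variable("Body_site")]
assay_vars = [Variable("Relative_abundance"), Variable("Read_count")]

box_for_samples = show_required_columns_box(sample_vars)
box_for_assays = show_required_columns_box(assay_vars)
```

If the modal works roughly like this, an entity with no required variables would simply render without the box, which would match what we see for Assays.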

> Once mbio is able to handle repeated measures properly, we will need to require "time" columns in the download files.

Sounds good. To clarify, do we need to require "time" in all the download files or just the repeated measures ones?

danicahelb commented 1 year ago

nope, the time variable is only required for repeated measures files

danicahelb commented 1 year ago

Variables annotated with mergeKey should be included in the repeated measures download files

mergeKey is the identifier for some measure of time (e.g., age, date, or study timepoint).
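Something like the following sketch shows the selection I mean. It assumes annotation properties can be read as a plain dictionary per variable and that any non-empty mergeKey value counts; both the data layout and the variable names are hypothetical, not the real annotation format.

```python
from typing import Dict, List


def merge_key_variables(annotations: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of variables carrying a non-empty mergeKey annotation."""
    return [
        name
        for name, props in annotations.items()
        if props.get("mergeKey")  # a missing or empty value is skipped
    ]


# Made-up annotation properties for a repeated measures entity.
annotations = {
    "Age_days": {"mergeKey": "1"},
    "Weight_kg": {},
    "Study_timepoint": {"mergeKey": "2"},
    "Notes": {"mergeKey": ""},
}

time_columns = merge_key_variables(annotations)
```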

I believe fixing the annotation properties (see https://github.com/VEuPathDB/EdaLoadingIssues/issues/35) will fix this issue (automatically?)