VEuPathDB / EdaLoadingIssues


Cross-silo: studies in both mbio & clinepi should use the same format for key identifiers #47

Closed danicahelb closed 1 year ago

danicahelb commented 1 year ago

We previously added linkages on the ClinEpi and mbio sites for MAL-ED samples with microbiome data (see https://github.com/VEuPathDB/EdaLoadingIssues/issues/44).

The idea is that users can download data from both sites and merge them on their own.

Currently, the key identifiers (i.e., PID, SID) differ between the two sites. This can confuse users, who will not understand why the samples are attached to different participant IDs.

In the future, I expect this will also cause issues when we implement cross-silo queries on the websites.

For example, here is what ClinEpi data from MAL-ED diarrhea samples look like:

image

And here is what mbio data from MAL-ED diarrhea samples look like:

image

When a study appears in both ClinEpi and mbio, we should ensure that the key identifiers use the same format on both sites.
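
A minimal sketch of how such a consistency check might look, assuming the identifiers from each site's download have already been read into lists. The function name and the example IDs are hypothetical and do not reflect the real identifier formats on either site.

```python
def compare_identifiers(clinepi_ids, mbio_ids):
    """Report which identifiers appear on both sites and which on only one.

    Hypothetical helper: the inputs are plain lists of ID strings taken
    from each site's download file.
    """
    clinepi_set, mbio_set = set(clinepi_ids), set(mbio_ids)
    return {
        "shared": sorted(clinepi_set & mbio_set),
        "clinepi_only": sorted(clinepi_set - mbio_set),
        "mbio_only": sorted(mbio_set - clinepi_set),
    }


# Made-up IDs: the second sample is formatted differently on the two sites.
report = compare_identifiers(["P001_S1", "P001_S2"], ["P001_S1", "S2"])
print(report)
```

If the two sites used the same format for every shared sample, `clinepi_only` and `mbio_only` would both be empty for a study that uses the same samples on both sites.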

danicahelb commented 1 year ago

Also, in the 16S rRNA (V4) assay download files, the key identifier matches the sample ID key identifier in the mbio sample file, but the files do NOT contain a column for the clinepidb sample ID.

If a user wants to analyze the 16S mbio data in the context of ClinEpi participant-level data, the only way to map the 16S data to the ClinEpi participant data is to:

  1. Download the 16S rRNA file for the data of interest.
  2. Download the ClinEpi participant file for the data of interest.
  3. Download the mbio sample file, just to be able to map the mbio sample key identifiers to the ClinEpi sample key identifiers.
  4. Download the ClinEpi sample file to map the ClinEpi sample identifier to the ClinEpi participant identifier.
  5. Only once the mbio sample key identifier is mapped to the ClinEpi sample key identifier, and the ClinEpi sample identifier to the ClinEpi participant identifier, can the mbio sample key identifier be mapped to the ClinEpi participant identifier and the 16S rRNA file merged with the ClinEpi participant file.
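
The chain of lookups in the steps above can be sketched as follows. This is only an illustration under stated assumptions: every column name (`mbio_sample_id`, `clinepi_sample_id`, `clinepi_participant_id`, `abundance`, `age_months`) and every value is a made-up placeholder, not the actual header or content of any download file, and small inline CSV strings stand in for the four downloaded files.

```python
import csv
import io

# Inline stand-ins for the four downloads. All headers and values are
# hypothetical placeholders, not the real file contents.
ASSAY_16S = """mbio_sample_id,abundance
MB_S1,0.42
MB_S2,0.17
MB_S3,0.05
"""
MBIO_SAMPLES = """mbio_sample_id,clinepi_sample_id
MB_S1,CE_S1
MB_S2,CE_S2
"""
CLINEPI_SAMPLES = """clinepi_sample_id,clinepi_participant_id
CE_S1,CE_P1
CE_S2,CE_P1
"""
CLINEPI_PARTICIPANTS = """clinepi_participant_id,age_months
CE_P1,12
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def lookup(rows, key, value):
    """Build a dict mapping rows[key] -> rows[value]."""
    return {row[key]: row[value] for row in rows}


def merge_16s_with_participants(assay, mbio_samples, clinepi_samples, participants):
    # Step 3: mbio sample ID -> ClinEpi sample ID
    mbio_to_ce_sample = lookup(mbio_samples, "mbio_sample_id", "clinepi_sample_id")
    # Step 4: ClinEpi sample ID -> ClinEpi participant ID
    ce_sample_to_pid = lookup(clinepi_samples, "clinepi_sample_id", "clinepi_participant_id")
    participant_by_id = {row["clinepi_participant_id"]: row for row in participants}

    merged = []
    for row in assay:
        # Step 5: follow the chain; rows that cannot be mapped are skipped.
        ce_sample = mbio_to_ce_sample.get(row["mbio_sample_id"])
        participant = participant_by_id.get(ce_sample_to_pid.get(ce_sample))
        if participant is None:
            continue
        merged.append({**row, "clinepi_sample_id": ce_sample, **participant})
    return merged


merged_rows = merge_16s_with_participants(
    read_rows(ASSAY_16S),
    read_rows(MBIO_SAMPLES),
    read_rows(CLINEPI_SAMPLES),
    read_rows(CLINEPI_PARTICIPANTS),
)
for merged_row in merged_rows:
    print(merged_row)
```

If the 16S file carried the ClinEpi sample or participant identifier directly, the two intermediate lookups would not be needed.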
danicahelb commented 1 year ago

Discussed with @dpbisme on 12/15/2022.

To do:

danicahelb commented 1 year ago

Confirmed fixed for GEMS, MAL-ED healthy, and MAL-ED diarrhea.

danicahelb commented 1 year ago

MORDOR does not use the same samples on ClinEpi and mbio.