dbujold opened this issue 5 years ago
Result of the Banff discussion: this issue would be resolved by unifying submissions to EpiRR and the IHEC Data Portal. Renaming the issue accordingly.
To document: the proposal is to cross-reference all metadata fields repeated in the Data Hub schema from the EpiRR registry.
Current pipeline:
- Spec for EpiRR JSON: https://github.com/Ensembl/EpiRR/blob/master/README.md
- Spec for Portal JSON: https://github.com/IHEC/ihec-ecosystems/tree/master/IHEC_Data_Hub
Desired result: a single point of contact. The team submits a JSON to EpiRR, which returns a template Portal JSON that the team then fills in.
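A minimal sketch of the template step, assuming a hypothetical, simplified EpiRR record of the form `{"experiments": [{"experiment_id": ..., "sample_id": ...}]}` (this is not the real EpiRR schema). The function name `make_portal_template` is also hypothetical. It pre-fills the identifiers EpiRR already knows and leaves the processing-specific blocks empty for the team:

```python
import json


def make_portal_template(epirr_record: dict) -> dict:
    """Build a Portal JSON skeleton from a hypothetical, simplified EpiRR record.

    Assumed input shape (illustration only, not the real EpiRR schema):
        {"experiments": [{"experiment_id": ..., "sample_id": ...}, ...]}
    """
    datasets = {}
    for exp in epirr_record["experiments"]:
        # Assumption: datasets are keyed by experiment ID instead of a free-form name.
        datasets[exp["experiment_id"]] = {
            "sample_id": exp["sample_id"],
            "experiment_id": exp["experiment_id"],
            # Left empty for the submitting team to fill in.
            "analysis_attributes": {},
            "browser": {},
        }
    return {"hub_description": {}, "datasets": datasets}


if __name__ == "__main__":
    record = {"experiments": [{"experiment_id": "EXP1", "sample_id": "SAMP1"}]}
    print(json.dumps(make_portal_template(record), indent=2))
```

Keying each dataset by its experiment ID is a design choice made here for illustration; it avoids relying on experiment names to pair records.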
TODO:
What info can be dropped from the Portal JSON (assuming the portal can read it from EpiRR)?
```
{
  "datasets": {
    "experiment_1": {
      "experiment_attributes": { ... }  // convert to ID
    },
    "experiment_2": {
      ...
    }
  },
  "samples": { ... }
}
```
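One way to express the "drop" side in code, as a sketch: it assumes the fields EpiRR would supply are exactly the top-level `samples` block and each dataset's `experiment_attributes`, as in the structure above. Here "convert to ID" is modeled simply as removing `experiment_attributes`; the `experiment_id` is assumed to be set separately. The function name `drop_epirr_fields` is hypothetical.

```python
import copy


def drop_epirr_fields(portal_json: dict) -> dict:
    """Return a copy of a Portal JSON without the metadata EpiRR could supply.

    Assumption for illustration: the top-level "samples" block and each
    dataset's "experiment_attributes" are the fields provided by EpiRR.
    """
    trimmed = copy.deepcopy(portal_json)  # leave the caller's dict untouched
    trimmed.pop("samples", None)
    for dataset in trimmed.get("datasets", {}).values():
        dataset.pop("experiment_attributes", None)
    return trimmed
```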
What info needs to be retained:
```
{
  "hub_description": { ... },
  "datasets": {
    "experiment_1": {
      "sample_id": "...",
      "experiment_id": "...",
      "analysis_attributes": { ... },
      "browser": { ... }
    },
    "experiment_2": {
      ...
    }
  }
}
```
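A sketch of a presence check for the retained fields listed above. It is not the actual IHEC Data Hub validator; the constant `REQUIRED_DATASET_KEYS` and the function `missing_retained_fields` are hypothetical names, and only key presence is checked, not value types.

```python
# Hypothetical list mirroring the retained per-dataset fields above.
REQUIRED_DATASET_KEYS = ("sample_id", "experiment_id", "analysis_attributes", "browser")


def missing_retained_fields(portal_json: dict) -> list[str]:
    """Return dotted paths of retained fields absent from a trimmed Portal JSON."""
    problems = []
    if "hub_description" not in portal_json:
        problems.append("hub_description")
    for name, dataset in portal_json.get("datasets", {}).items():
        for key in REQUIRED_DATASET_KEYS:
            if key not in dataset:
                problems.append(f"datasets.{name}.{key}")
    return problems
```

An empty list means every retained field is present; each entry otherwise points at one missing field.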
@dzerbino They use the same schemas, so everything can be dropped as long as we keep the identifiers.
Basically, what's needed is a way to link sample and experiment metadata, which would be obtained from EpiRR, to processed data (bigWigs, bigBeds) and data processing metadata (analysis_attributes), which would be provided to the IHEC Data Portal.
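A minimal sketch of that link as an ID-based join. It assumes, hypothetically, that the EpiRR side is available as two dictionaries keyed by `experiment_id` and `sample_id`; the function name `link_metadata` is also hypothetical. Each Portal dataset keeps its own `analysis_attributes` and `browser` tracks and gains the EpiRR metadata it points to.

```python
def link_metadata(hub: dict, epirr_experiments: dict, epirr_samples: dict) -> list[dict]:
    """Attach EpiRR experiment and sample metadata to each Portal dataset by ID.

    Hypothetical inputs: epirr_experiments maps experiment_id -> metadata and
    epirr_samples maps sample_id -> metadata. An unknown ID raises KeyError,
    so a dangling reference is reported instead of silently skipped.
    """
    linked = []
    for name, dataset in hub["datasets"].items():
        linked.append({
            "dataset": name,
            "experiment": epirr_experiments[dataset["experiment_id"]],
            "sample": epirr_samples[dataset["sample_id"]],
            "analysis_attributes": dataset.get("analysis_attributes", {}),
            "browser": dataset.get("browser", {}),  # processed data tracks
        })
    return linked
```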
Currently, the Data Hub validator relies on an exact match between the experiment name in an EpiRR record and the corresponding name in the IHEC Data Hub. This is problematic because:
1. An EpiRR record can have more than one experiment of the same type.
2. The name used to describe the experiment can differ between the two sources.
3. The Experiment Type property will soon become optional, and can be replaced by an ontology URI.
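To illustrate the first two problems, here is a sketch comparing matching on a name field with matching on an identifier. The record shapes, the `experiment_type` and `experiment_id` keys used for both sides, and the function `match_experiments` are hypothetical simplifications. Exactly one match per dataset is an unambiguous link; zero or several matches cannot be resolved.

```python
def match_experiments(epirr_exps: list[dict], hub_datasets: dict, key: str) -> dict[str, list[str]]:
    """For each hub dataset, list the EpiRR experiment IDs whose `key` value equals the dataset's."""
    matches = {}
    for name, ds in hub_datasets.items():
        value = ds.get(key)
        # A missing value (e.g. an omitted, optional experiment type) matches nothing.
        matches[name] = [
            e["experiment_id"] for e in epirr_exps if value is not None and e.get(key) == value
        ]
    return matches


if __name__ == "__main__":
    epirr = [
        {"experiment_id": "EXP1", "experiment_type": "H3K4me3"},
        {"experiment_id": "EXP2", "experiment_type": "H3K4me3"},  # same type twice
        {"experiment_id": "EXP3", "experiment_type": "RNA-Seq"},
    ]
    hub = {
        "a": {"experiment_id": "EXP1", "experiment_type": "H3K4me3"},
        "b": {"experiment_id": "EXP3", "experiment_type": "mRNA-Seq"},  # named differently
    }
    print(match_experiments(epirr, hub, "experiment_type"))  # {'a': ['EXP1', 'EXP2'], 'b': []}
    print(match_experiments(epirr, hub, "experiment_id"))    # {'a': ['EXP1'], 'b': ['EXP3']}
```

Matching on the name yields two candidates for `a` and none for `b`, while matching on the identifier resolves both datasets to a single experiment.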