IHEC / ihec-ecosystems

This repo is for code and documentation associated with the ihec-ecosystems working group

Apache License 2.0

Unified submission #50

Open dbujold opened 5 years ago

dbujold commented 5 years ago

Currently, the data hub validator relies on an exact match between an EpiRR record's experiment name and the name used in the IHEC Data Hub. This is problematic because:
1. An EpiRR record can have more than one experiment of the same type.
2. The name used to describe the experiment can differ between the two sources.
3. The Experiment Type property will soon be optional, and can be replaced by an ontology URI.
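
To make the first two points concrete, here is a minimal sketch of exact-name matching going wrong. The records, the `name` field and the function `match_by_exact_name` are all made up for illustration; they are not taken from the EpiRR or Data Hub schemas or from the actual validator.

```python
# Hypothetical, simplified records: the real EpiRR and Data Hub layouts differ.
epirr_experiments = [
    {"name": "ChIP-Seq Input"},
    {"name": "ChIP-Seq Input"},   # second experiment of the same type
    {"name": "Histone H3K4me3"},
]
# Names as they might appear on the Data Hub side (also made up).
hub_dataset_names = ["ChIP-Seq Input", "H3K4me3"]


def match_by_exact_name(epirr_records, hub_names):
    """Map each hub name to the EpiRR records carrying an identical name."""
    return {
        hub_name: [r for r in epirr_records if r["name"] == hub_name]
        for hub_name in hub_names
    }


matches = match_by_exact_name(epirr_experiments, hub_dataset_names)
# More than one hit: the validator cannot tell which experiment is meant.
ambiguous = [name for name, hits in matches.items() if len(hits) > 1]
# No hit: same experiment, but named differently in the two sources.
unmatched = [name for name, hits in matches.items() if not hits]
print("ambiguous:", ambiguous)
print("unmatched:", unmatched)
```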

dzerbino commented 4 years ago

Result of the Banff discussion: this issue would be resolved by unifying submissions to EpiRR and the IHEC Portal. Renaming the issue.

sitag commented 4 years ago

To document: the proposal is to cross-reference all repeated metadata fields in the Data Hub schema from the EpiRR registry.

dzerbino commented 4 years ago

Current pipeline: 201710_IHEC_Ecosystem_MK

Spec for EpiRR JSON: https://github.com/Ensembl/EpiRR/blob/master/README.md
Spec for Portal JSON: https://github.com/IHEC/ihec-ecosystems/tree/master/IHEC_Data_Hub

dzerbino commented 4 years ago

Desired result: a single point of contact. A JSON document is submitted to EpiRR, which sends back a template Portal JSON that is then filled in by the team.

TODO:

What info can be dropped from the Portal JSON (assuming the portal can read it from EpiRR)?
Create Portal JSON generator from EpiRR submission
Ensure validators still function properly
Set up SOP
Create joint RT system
Update documentation
Test
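
As a starting point for the "Portal JSON generator" item, a rough sketch might look like the following. It assumes the EpiRR submission has already been reduced to a list of `sample_id` / `experiment_id` pairs; that input shape and the function name `make_portal_template` are hypothetical. The output keys follow the "what needs to be retained" structure discussed in this thread.

```python
import json


def make_portal_template(epirr_experiments):
    """Build a Portal JSON skeleton: identifiers prefilled, the rest left empty."""
    datasets = {}
    for index, exp in enumerate(epirr_experiments, start=1):
        datasets[f"experiment_{index}"] = {
            "sample_id": exp["sample_id"],
            "experiment_id": exp["experiment_id"],
            "analysis_attributes": {},  # to be filled in by the submitting team
            "browser": {},              # to be filled in by the submitting team
        }
    return {"hub_description": {}, "datasets": datasets}


# Placeholder identifiers, not real accessions.
template = make_portal_template([
    {"sample_id": "SAMPLE_A", "experiment_id": "EXP_1"},
    {"sample_id": "SAMPLE_A", "experiment_id": "EXP_2"},
])
print(json.dumps(template, indent=4))
```

The team would then only have to complete `hub_description`, `analysis_attributes` and `browser` before submitting to the portal.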

dzerbino commented 4 years ago

What info can be dropped from the Portal JSON (assuming the portal can read it from EpiRR)?

{
    "datasets": {
        "experiment_1": {
            "experiment_attributes" // convert to ID
        },
        "experiment_2": {
            ...
        },
    },
    "samples": { ... }
}

What info needs to be retained:

{
    "hub_description": { ... },
    "datasets": {
        "experiment_1": {
            "sample_id": "...",
            "experiment_id": "...",
            "analysis_attributes": { ... },
            "browser": { ... }
        },
        "experiment_2": {
            ...
        },
    }
}
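
For the "ensure validators still function properly" item, a check on the slimmed-down Portal JSON could be sketched as below. This is not the existing validator: `check_slim_hub` and the two key sets are assumptions derived only from the two structures above.

```python
# Keys taken from the "retained" and "dropped" structures in this thread.
REQUIRED_DATASET_KEYS = {"sample_id", "experiment_id", "analysis_attributes", "browser"}
DROPPED_DATASET_KEYS = {"experiment_attributes"}


def check_slim_hub(hub):
    """Return a list of problems; an empty list means the hub looks consistent."""
    problems = []
    if "hub_description" not in hub:
        problems.append("missing hub_description")
    if "samples" in hub:
        problems.append("samples section should now come from EpiRR")
    for name, dataset in hub.get("datasets", {}).items():
        for key in sorted(REQUIRED_DATASET_KEYS - set(dataset)):
            problems.append(f"{name}: missing {key}")
        for key in sorted(DROPPED_DATASET_KEYS & set(dataset)):
            problems.append(f"{name}: {key} should be replaced by experiment_id")
    return problems


# An old-style hub (placeholder values) that still embeds metadata.
old_style = {
    "datasets": {"experiment_1": {"sample_id": "SAMPLE_A", "experiment_attributes": {}}},
    "samples": {},
}
for problem in check_slim_hub(old_style):
    print(problem)
```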

sitag commented 4 years ago

@dzerbino They use the same schemas, so everything can be dropped as long as we keep the identifiers.

dbujold commented 4 years ago

Basically, what's needed is a way to link sample and experiment metadata, which would be obtained from EpiRR, to processed data (bigwigs, bigbeds) and data processing metadata (analysis_attributes), which would be provided to the IHEC Data Portal.
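
A minimal sketch of that link, assuming the EpiRR metadata is already available as two dictionaries keyed by identifier. The function `link_dataset`, the lookup tables, the output key `sample_attributes` and all values are hypothetical; how the portal would actually fetch records from EpiRR is not covered here.

```python
def link_dataset(dataset, epirr_samples, epirr_experiments):
    """Attach EpiRR sample and experiment metadata to one portal dataset entry."""
    sample_id = dataset["sample_id"]
    experiment_id = dataset["experiment_id"]
    if sample_id not in epirr_samples:
        raise KeyError(f"unknown sample_id: {sample_id}")
    if experiment_id not in epirr_experiments:
        raise KeyError(f"unknown experiment_id: {experiment_id}")
    return {
        # Metadata looked up in EpiRR by identifier.
        "sample_attributes": epirr_samples[sample_id],
        "experiment_attributes": epirr_experiments[experiment_id],
        # Processing metadata and tracks provided directly to the portal.
        "analysis_attributes": dataset["analysis_attributes"],
        "browser": dataset["browser"],
    }


# Placeholder data: identifiers, attributes and file name are invented.
epirr_samples = {"SAMPLE_A": {"description": "placeholder sample metadata"}}
epirr_experiments = {"EXP_1": {"description": "placeholder experiment metadata"}}
portal_dataset = {
    "sample_id": "SAMPLE_A",
    "experiment_id": "EXP_1",
    "analysis_attributes": {"description": "placeholder analysis metadata"},
    "browser": {"track": "placeholder.bigWig"},
}
linked = link_dataset(portal_dataset, epirr_samples, epirr_experiments)
```

Because the join uses identifiers rather than experiment names, it does not depend on the two sources naming an experiment the same way.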