fair-research / deriva-action-provider

Globus Automate action provider for DERIVA
Apache License 2.0

reference code to load latest C2M2 datapackages #3

Closed: rpwagner closed this issue 5 years ago

rpwagner commented 5 years ago

@jgaff, take a look at setup_c2m2_catalog.py. It goes through all of Karl's steps to create the schema and load the data files. It imports cfde_datapackage.py, which does the work. cfde_datapackage.py in turn imports tableschema2erm.py, which converts between tableschema and ERM. tableschema_to_deriva.py is a test script for that conversion.
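As a rough illustration of the kind of conversion described here (this is not the code in tableschema2erm.py; the type mapping and the helper names `field_to_column` and `schema_to_columns` are assumptions made up for this sketch), a Table Schema field descriptor could be turned into an ERMrest-style column definition roughly like this:

```python
# Hypothetical mapping from Table Schema field types to ERMrest type names.
# The real tableschema2erm.py may map these differently.
TYPE_MAP = {
    "string": "text",
    "integer": "int8",
    "number": "float8",
    "boolean": "boolean",
    "date": "date",
    "datetime": "timestamptz",
}


def field_to_column(field):
    """Convert one Table Schema field descriptor (a dict) to a column dict."""
    required = field.get("constraints", {}).get("required", False)
    return {
        "name": field["name"],
        # Unknown or missing types fall back to plain text in this sketch.
        "type": {"typename": TYPE_MAP.get(field.get("type", "string"), "text")},
        "nullok": not required,
        "comment": field.get("description"),
    }


def schema_to_columns(schema):
    """Convert every field of a Table Schema descriptor, preserving order."""
    return [field_to_column(f) for f in schema.get("fields", [])]
```

The point is only that the conversion is a field-by-field translation of names, types, and constraints; the actual script should be treated as the reference.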

From here you should be able to use Mike's examples, bdbag --materialize https://examples.fair-research.org/public/CFDE/metadata/CFDE-all.v4.C2M2.bdbag.tgz and bdbag --materialize https://examples.fair-research.org/public/CFDE/metadata/CFDE-GTEx-v7.v0.C2M2.bdbag.tgz, or whatever you have in place to pull the BDBags, and then run the steps in setup_c2m2_catalog.py on the JSON files.
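Once a bag is materialized, a quick sanity check before running the setup script is to list which TSV files the JSON file points at. The sketch below assumes the JSON file is a Frictionless-style data package with a top-level `resources` list whose entries carry `name` and `path`; the function name `list_resources` is made up for this example and is not part of the repo.

```python
import json
from pathlib import Path


def list_resources(datapackage_path):
    """Return (table_name, tsv_path) pairs for each resource in a data package.

    Paths are resolved relative to the directory holding the JSON file.
    """
    datapackage_path = Path(datapackage_path)
    with datapackage_path.open(encoding="utf-8") as f:
        package = json.load(f)
    base = datapackage_path.parent
    pairs = []
    for resource in package.get("resources", []):
        path = resource.get("path")
        # Skip resources with inline data or a list of paths.
        if isinstance(path, str):
            pairs.append((resource["name"], base / path))
    return pairs
```

For example, calling `list_resources("CFDE-GTEx-v7.v1.C2M2.bdbag/data/GTEx_C2M2_instance.json")` and checking `path.exists()` on each result would show whether the bag was fully materialized before any catalog is created.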

jgaff commented 5 years ago

Is this repo going to be the canonical place for these tools, so Karl will update them here? That's my biggest concern; the scripts have been updated a couple of times and I've had to port the changes over. I do have (probably less nice) code that does these steps already, based on the same scripts, but it's out of sync with Karl's latest changes.

rpwagner commented 5 years ago

For this week, I'll port changes over. If you can keep the wrapper scripts similar and import what's needed, that would help. Mike and I both want to consolidate the code examples, so we can start on that next week.

rpwagner commented 5 years ago

For reference, here's what loading the latest GTEx serialization looks like, after logging in on the CLI:

 ./deriva-action-provider/examples/setup_c2m2_catalog.py CFDE-GTEx-v7.v1.C2M2.bdbag/data/GTEx_C2M2_instance.json 
New catalog has catalog_id=133
Don't forget to delete it if you are done with it!
Model deployed for CFDE-GTEx-v7.v1.C2M2.bdbag/data/GTEx_C2M2_instance.json.
Table Dataset data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Dataset.tsv.
Table Organization data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Organization.tsv.
Table SubjectGranularity data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SubjectGranularity.tsv.
Table Platform data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Platform.tsv.
Table Method data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Method.tsv.
Table SampleType data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SampleType.tsv.
Table InformationType data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/InformationType.tsv.
Table FileFormat data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/FileFormat.tsv.
Table Anatomy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Anatomy.tsv.
Table Protocol data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Protocol.tsv.
Table SubjectGroup data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SubjectGroup.tsv.
Table NCBI_Taxonomy_DB data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/NCBI_Taxonomy_DB.tsv.
Table AuxiliaryData data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/AuxiliaryData.tsv.
Table File data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/File.tsv.
Table Subject data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Subject.tsv.
Table DataEvent data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/DataEvent.tsv.
Table FilesInDatasets data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/FilesInDatasets.tsv.
Table DatasetsInDatasets data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/DatasetsInDatasets.tsv.
Table ProducedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/ProducedBy.tsv.
Table AnalyzedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/AnalyzedBy.tsv.
Table SubjectsInSubjectGroups data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SubjectsInSubjectGroups.tsv.
Table ObservedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/ObservedBy.tsv.
Table SubjectTaxonomy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SubjectTaxonomy.tsv.
Table GeneratedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/GeneratedBy.tsv.
Table SponsoredBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SponsoredBy.tsv.
Table BioSample data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/BioSample.tsv.
Table AssayedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/AssayedBy.tsv.
All data packages loaded.
Try visiting 'https://demo.derivacloud.org/chaise/recordset/#133/CFDE:Dataset'
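In the output above, the entity and vocabulary tables (Dataset, File, Subject, ...) load before the association tables (FilesInDatasets, ProducedBy, ...). I haven't checked how setup_c2m2_catalog.py picks that order, so the following is only a sketch of one way to get a safe order, assuming each resource declares its foreign keys in the Table Schema `foreignKeys` form; `load_order` is a hypothetical helper, not something in the repo.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def load_order(resources):
    """Order table names so that referenced tables come before referrers.

    `resources` is a list of data package resource dicts, each with a
    "name" and an optional "schema" containing "foreignKeys".
    """
    names = {r["name"] for r in resources}
    graph = {}
    for r in resources:
        deps = set()
        for fk in r.get("schema", {}).get("foreignKeys", []):
            target = fk["reference"]["resource"]
            # An empty resource name means a self-reference in Table Schema;
            # ignore those, and any target not present in this package.
            if target and target != r["name"] and target in names:
                deps.add(target)
        graph[r["name"]] = deps
    # static_order() yields each table only after all of its dependencies.
    return list(TopologicalSorter(graph).static_order())
```

If the foreign keys form a cycle, `TopologicalSorter` raises `graphlib.CycleError`, which is a reasonable signal that the tables cannot be loaded in a single pass.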