rpwagner closed this 5 years ago
Is this repo going to be the canonical place for these tools, i.e., will Karl make his updates here? That's my biggest concern: the scripts have been updated a couple of times, and each time I've had to port the changes over. I already have code (probably less tidy) that does these steps, based on the same scripts, but it's out of sync with Karl's latest changes.
For this week, I'll port the changes over. If you can keep the wrapper scripts similar and import what's needed, that would help. Mike and I both want to consolidate our code examples, so we can start on that next week.
For reference, here's what loading the latest GTEx serialization looks like, after logging in on the CLI:
./deriva-action-provider/examples/setup_c2m2_catalog.py CFDE-GTEx-v7.v1.C2M2.bdbag/data/GTEx_C2M2_instance.json
New catalog has catalog_id=133
Don't forget to delete it if you are done with it!
Model deployed for CFDE-GTEx-v7.v1.C2M2.bdbag/data/GTEx_C2M2_instance.json.
Table Dataset data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Dataset.tsv.
Table Organization data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Organization.tsv.
Table SubjectGranularity data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SubjectGranularity.tsv.
Table Platform data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Platform.tsv.
Table Method data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Method.tsv.
Table SampleType data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SampleType.tsv.
Table InformationType data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/InformationType.tsv.
Table FileFormat data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/FileFormat.tsv.
Table Anatomy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Anatomy.tsv.
Table Protocol data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Protocol.tsv.
Table SubjectGroup data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SubjectGroup.tsv.
Table NCBI_Taxonomy_DB data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/NCBI_Taxonomy_DB.tsv.
Table AuxiliaryData data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/AuxiliaryData.tsv.
Table File data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/File.tsv.
Table Subject data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/Subject.tsv.
Table DataEvent data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/DataEvent.tsv.
Table FilesInDatasets data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/FilesInDatasets.tsv.
Table DatasetsInDatasets data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/DatasetsInDatasets.tsv.
Table ProducedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/ProducedBy.tsv.
Table AnalyzedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/AnalyzedBy.tsv.
Table SubjectsInSubjectGroups data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SubjectsInSubjectGroups.tsv.
Table ObservedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/ObservedBy.tsv.
Table SubjectTaxonomy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SubjectTaxonomy.tsv.
Table GeneratedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/GeneratedBy.tsv.
Table SponsoredBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/SponsoredBy.tsv.
Table BioSample data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/BioSample.tsv.
Table AssayedBy data loaded from CFDE-GTEx-v7.v1.C2M2.bdbag/data/AssayedBy.tsv.
All data packages loaded.
Try visiting 'https://demo.derivacloud.org/chaise/recordset/#133/CFDE:Dataset'
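Since the script reminds you to delete the catalog when you're done, here's a minimal standard-library sketch that builds the Chaise link for a table and prepares (but does not send) a DELETE for the catalog. The helper names are hypothetical, and the ERMrest path (/ermrest/catalog/<id>) and bearer-token header are assumptions about the deployment, not something the setup script documents. If you already use deriva-py, ErmrestCatalog.delete_ermrest_catalog(really=True) should do the same cleanup with your CLI login.

```python
import urllib.request
from typing import Optional


def chaise_recordset_url(host: str, catalog_id: int, schema: str, table: str) -> str:
    """Build a Chaise recordset link in the same shape as the one the script prints."""
    return f"https://{host}/chaise/recordset/#{catalog_id}/{schema}:{table}"


def delete_catalog_request(host: str, catalog_id: int,
                           bearer_token: Optional[str] = None) -> urllib.request.Request:
    """Prepare, but do not send, a DELETE for a whole catalog.

    ASSUMPTION: the catalog lives at https://<host>/ermrest/catalog/<id> and the
    server accepts a bearer token; adjust auth to match your deployment.
    """
    url = f"https://{host}/ermrest/catalog/{catalog_id}"
    headers = {}
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"
    return urllib.request.Request(url, headers=headers, method="DELETE")


if __name__ == "__main__":
    print(chaise_recordset_url("demo.derivacloud.org", 133, "CFDE", "Dataset"))
    req = delete_catalog_request("demo.derivacloud.org", 133)
    print(req.get_method(), req.full_url)
    # Only when you really mean it: urllib.request.urlopen(req)
```

The first print reproduces the link from the log above; keeping the DELETE as a separate, unsent request makes it harder to wipe a catalog by accident.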
@jgaff look at setup_c2m2_catalog.py; it goes through all of Karl's steps to create the schema and load the data files. It imports cfde_datapackage.py, which does the work. cfde_datapackage.py in turn imports tableschema2erm.py, which converts between tableschema and erm; tableschema_to_deriva.py is a test script for that.
From here you should be able to use Mike's example of
bdbag --materialize https://examples.fair-research.org/public/CFDE/metadata/CFDE-all.v4.C2M2.bdbag.tgz
and
bdbag --materialize https://examples.fair-research.org/public/CFDE/metadata/CFDE-GTEx-v7.v0.C2M2.bdbag.tgz
or whatever you have in place to pull the BDBags, and then run the steps in setup_c2m2_catalog.py on the JSON files.
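A rough sketch of that loop, driving the two CLI tools from Python. It assumes that bdbag --materialize leaves each bag in the current directory, named after the archive minus ".tgz", and that the descriptors to load are the *.json files under the bag's data/ directory (as with GTEx_C2M2_instance.json in the run above). The helper names are made up for illustration, and the script path is the one from that run. It defaults to a dry run that only prints commands.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

# Bag URLs from Mike's example; swap in whatever you actually pull.
BAG_URLS = [
    "https://examples.fair-research.org/public/CFDE/metadata/CFDE-all.v4.C2M2.bdbag.tgz",
    "https://examples.fair-research.org/public/CFDE/metadata/CFDE-GTEx-v7.v0.C2M2.bdbag.tgz",
]

# Script path used in the GTEx loading example.
SETUP_SCRIPT = "./deriva-action-provider/examples/setup_c2m2_catalog.py"


def bag_dir_for(url: str) -> Path:
    """ASSUMPTION: materializing foo.bdbag.tgz leaves ./foo.bdbag behind."""
    name = url.rsplit("/", 1)[-1]
    if name.endswith(".tgz"):
        name = name[: -len(".tgz")]
    return Path(name)


def materialize_cmd(url: str) -> List[str]:
    return ["bdbag", "--materialize", url]


def setup_cmds(bag_dir: Path) -> List[List[str]]:
    """One setup_c2m2_catalog.py run per JSON descriptor in the bag's data/ dir.

    ASSUMPTION: every *.json in data/ is a C2M2 descriptor. Returns [] if the
    bag has not been materialized yet.
    """
    return [[SETUP_SCRIPT, str(p)] for p in sorted((bag_dir / "data").glob("*.json"))]


def run(cmd: List[str], dry_run: bool) -> None:
    print("$", shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


def main(dry_run: bool = True) -> None:
    for url in BAG_URLS:
        run(materialize_cmd(url), dry_run)
        for cmd in setup_cmds(bag_dir_for(url)):
            run(cmd, dry_run)


if __name__ == "__main__":
    main(dry_run=True)
```

In a dry run nothing is downloaded, so only the bdbag commands are listed; with dry_run=False each materialized bag's JSON files are then passed to setup_c2m2_catalog.py one at a time, and each run creates a new catalog.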