Common data structures and APIs.
This repo contains the ga4gh, bmeg, and gdc schemas.
The schemas are packaged into a Python module, bioschemas.
The justification for the packaging is threefold:
Install with pip:
pip install git+https://github.com/ohsu-computational-biology/bioschemas
To regenerate the snapshot and run the tests:
cd bin
./package-all.sh
... generates schema snapshot ...
... runs setup tests ...
----------------------------------------------------------------------
Ran 4 tests in 0.100s
OK
$ bioschemas-snapshot --help
usage: bioschemas-snapshot [-h] [-o OUTPUT] [-v]

Extract bioschemas schema directory [ga4gh,bmeg,gdc]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Extract to this directory name. Must not already
                        exist; it will be created as well as missing parent
                        directories.
  -v, --version         Print git hashes
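The -o contract (the target must not already exist; it and any missing parent directories are created) matches the default behavior of Python's shutil.copytree. The sketch below is a hypothetical helper, extract_snapshot, that copies a snapshot directory under the same rules. It illustrates the contract only and is not the implementation behind bioschemas-snapshot.

```python
import shutil
from pathlib import Path


def extract_snapshot(src, dest):
    """Copy a schema snapshot directory to dest (hypothetical helper).

    Follows the documented -o contract: dest must not already exist,
    and dest plus any missing parent directories are created.
    """
    dest = Path(dest)
    if dest.exists():
        # Refuse to overwrite an existing directory, as the CLI help describes.
        raise FileExistsError(f"{dest} already exists")
    # copytree creates dest and any missing parents (via os.makedirs).
    shutil.copytree(src, dest)
    return dest
```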
The snapshot can be used from any language and has the following structure:
.
├── cerberus
│   ├── bmeg
│   ├── ga4gh
│   │   ├── ga4gh
│   │   └── google
│   │       ├── api
│   │       └── protobuf
│   └── gdc
├── jsonschema
│   ├── bmeg
│   ├── ga4gh
│   │   ├── ga4gh
│   │   └── google
│   │       ├── api
│   │       └── protobuf
│   └── gdc
└── proto
    ├── bmeg
    └── ga4gh
        ├── ga4gh
        └── google
            └── api
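Because the snapshot is plain directories, a consumer in any language can discover its contents by walking the tree. A minimal Python sketch, assuming the three top-level format directories shown above; namespaces is a hypothetical helper, not part of the bioschemas API.

```python
from pathlib import Path

# Top-level format directories, as shown in the snapshot tree.
FORMATS = ("cerberus", "jsonschema", "proto")


def namespaces(snapshot_root, fmt):
    """Return the sorted namespace directories (e.g. bmeg, ga4gh, gdc) for one format."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {FORMATS}")
    base = Path(snapshot_root) / fmt
    # Only immediate subdirectories count as namespaces; files are skipped.
    return sorted(p.name for p in base.iterdir() if p.is_dir())
```

For the installed package, snapshot_root would be the path returned by bioschemas.schema_path().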
From Python, the bioschemas module locates the snapshot and loads individual schemas:
import bioschemas
bioschemas.schema_path()
>> '/home/someuser/bioschemas/bioschemas/snapshot'
bioschemas.json_schema('Resource')
>> {u'properties': {u'checksum': {u'type': u'string'}, u'class': {u'type': u'string'}, u'created': {u'type': u'string'}, u'datasetID': {u'type': u'string'}, u'description': {u'type': u'string'}, u'format': {u'type': u'string'}, u'gid': {u'type': u'string'}, u'id': {u'type': u'string'}, u'info': {u'type': u'object'}, u'location': {u'type': u'string'}, u'mimeType': {u'type': u'string'}, u'name': {u'type': u'string'}, u'size': {u'type': u'integer'}, u'type': {u'type': u'string'}}, u'type': u'object'}
bioschemas.cerberus_schema('Resource')
>> {u'checksum': {u'type': u'string'}, u'class': {u'type': u'string'}, u'created': {u'type': u'string'}, u'datasetID': {u'type': u'string'}, u'description': {u'type': u'string'}, u'format': {u'type': u'string'}, u'gid': {u'type': u'string'}, u'id': {u'type': u'string'}, u'info': {u'type': {u'type': u'dict'}}, u'location': {u'type': u'string'}, u'mimeType': {u'type': u'string'}, u'name': {u'type': u'string'}, u'size': {u'type': u'integer'}, u'type': {u'type': u'string'}}
bioschemas.git_hashes()
>> {u'bioschemas': u'f40f653', u'bmeg': u'537f94a', u'created_at': u'2016-11-18T17:47:56.858397Z', u'gdc': u'288f042'}
bioschemas.gdc_submission_template('file')
>> {u'aliquots': {u'submitter_id': None}, u'analytes': {u'submitter_id': None}, u'archives': {u'submitter_id': None}, u'cases': {u'submitter_id': None}, u'centers': {u'code': None}, u'data_formats': {u'name': None}, u'data_subtypes': {u'name': None}, u'derived_files': {u'submitter_id': None}, u'described_cases': {u'submitter_id': None}, u'experimental_strategies': {u'name': None}, u'file_name': None, u'file_size': None, u'md5sum': None, u'platforms': {u'name': None}, u'portions': {u'submitter_id': None}, u'project_id': None, u'related_files': {u'submitter_id': None}, u'samples': {u'submitter_id': None}, u'slides': {u'submitter_id': None}, u'state_comment': None, u'submitter_id': None, u'tags': {u'name': None}, u'type': u'file'}
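The submission template returns every field unset (None), with linked entities such as cases given as nested {'submitter_id': None} stubs. One hypothetical way to fill a template and check for remaining gaps is sketched below. fill_template and missing_fields are illustrative names, not part of bioschemas, and the sketch has no knowledge of which GDC fields are actually required.

```python
import copy


def fill_template(template, values):
    """Return a copy of a submission template with values merged in (hypothetical helper).

    Nested link stubs such as {'submitter_id': None} are updated from nested dicts in values.
    """
    filled = copy.deepcopy(template)  # leave the caller's template untouched
    for key, value in values.items():
        if isinstance(filled.get(key), dict) and isinstance(value, dict):
            filled[key].update(value)
        else:
            filled[key] = value
    return filled


def missing_fields(doc, prefix=""):
    """List dotted paths whose value is still None."""
    missing = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if value is None:
            missing.append(path)
        elif isinstance(value, dict):
            missing.extend(missing_fields(value, path + "."))
    return missing
```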
The ga4gh and bmeg canonical schemas are maintained in protobuf. The script bin/custom-plugin.py
processes the schemas for alternate uses (jsonschema, Cerberus). The bioschemas/snapshot
directory contains output from protoc.
Please do not edit the snapshot by hand; instead, change custom-plugin.py
or json-to-cerberus.py.
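The two Resource outputs in the Python session show what a JSON Schema to Cerberus conversion involves: JSON Schema nests each field under properties, while Cerberus maps field names directly to rules. The following is a minimal sketch for flat schemas only, built on a hypothetical type table; it is not the contents of json-to-cerberus.py. It maps object to Cerberus's dict type, whereas the snapshot output renders info as {'type': {'type': 'dict'}}.

```python
# Hypothetical mapping from JSON Schema primitive types to Cerberus type names.
JSON_TO_CERBERUS = {
    "string": "string",
    "integer": "integer",
    "number": "number",
    "boolean": "boolean",
    "object": "dict",
    "array": "list",
}


def to_cerberus(json_schema):
    """Convert the top-level properties of a flat JSON Schema object to a Cerberus schema.

    Raises KeyError for a JSON Schema type missing from the table.
    """
    rules = {}
    for name, prop in json_schema.get("properties", {}).items():
        rules[name] = {"type": JSON_TO_CERBERUS[prop["type"]]}
    return rules
```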