SCHEMATIC is an acronym for Schema Engine for Manifest Ingress and Curation. This Python-based infrastructure provides a novel, schema-based metadata ingress ecosystem that is meant to streamline biomedical dataset annotation, metadata validation, and submission to a data repository for various data contributors.
synapse.org
Note: To create Google Sheets files from Schematic, please follow our credential policy for Google credentials; see the tutorial 'HERE'. If you plan to use config.yml, please ensure that the path of schematic_service_account_creds.json is indicated there (see the google_sheets > service_acct_creds section).
Create and activate a virtual environment within which you can install the package:
python3 -m venv .venv
source .venv/bin/activate
Note: Python 3 has built-in support for virtual environments through venv, so you no longer need to install virtualenv.
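If you are unsure whether the virtual environment is actually active, a quick check from Python can help. The snippet below is a minimal sketch; the function name in_virtualenv is only illustrative and is not part of schematic.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the same python3 you intend to use for pip install; if it reports False, re-run source .venv/bin/activate first.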
Install and update the package using pip:
python3 -m pip install schematicpy
If you run into the error "Failed building wheel for numpy", upgrading pip may resolve it. Please try upgrading pip with:
pip3 install --upgrade pip
When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.
Please note that we have a code of conduct; please follow it in all your interactions with the project.
Clone the schematic package repository.
git clone https://github.com/Sage-Bionetworks/schematic.git
Install poetry (version 1.3.0 or later) using either the official installer or pipx. If you have an older installation of Poetry, we recommend uninstalling it first.
Start the virtual environment by doing:
poetry shell
Install the dependencies by doing:
poetry install
This command installs the dependencies specified in poetry.lock. If this step is taking a long time, go back to step 2 and check your version of poetry. Alternatively, you could try deleting the lock file and regenerating it by running poetry install. (Please note that this method should be used as a last resort because it would force other developers to change their development environments.)
If you want to install the API you will need to install those dependencies as well:
poetry install --extras "api"
If you want to install the uwsgi:
poetry install --extras "api"
There are two main configuration files that need to be edited: config.yml and .synapseConfig
Configure .synapseConfig File
Download a copy of the .synapseConfig file, open the file in the editor of your choice, and edit the username and authtoken attributes under the authentication section.
Note: You could also visit the configparser doc to see the format that .synapseConfig must have. For instance:
[authentication]
username = ABC
authtoken = abc
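To confirm that an edited .synapseConfig is well formed, you can parse it with Python's standard configparser module. The following is a minimal sketch that parses the sample above from a string; the helper name read_authentication is an assumption made for illustration and is not a schematic or synapseclient function.

```python
import configparser

# The same sample shown above, kept inline so the sketch is self-contained.
SAMPLE = """\
[authentication]
username = ABC
authtoken = abc
"""


def read_authentication(text: str) -> dict:
    """Parse INI text and return the username and authtoken values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # A KeyError here means the section or one of the keys is missing.
    section = parser["authentication"]
    return {"username": section["username"], "authtoken": section["authtoken"]}


if __name__ == "__main__":
    print(read_authentication(SAMPLE)["username"])
```

To check a real file, read its contents and pass them to the same function; never print or commit the authtoken itself.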
Configure config.yml File
There are some defaults in schematic that can be configured. These fields are in config_example.yml:
# This is an example config for Schematic.
# All listed values are those that are the default if a config is not used.
# Save this as config.yml, this will be gitignored.
# Remove any fields in the config you don't want to change
# Change the values of any fields you do want to change
# This describes where assets such as manifests are stored
asset_store:
  # This is when assets are stored in a synapse project
  synapse:
    # Synapse ID of the file view listing all project data assets.
    master_fileview_id: "syn23643253"
    # Path to the synapse config file, either absolute or relative to this file
    config: ".synapseConfig"
    # Base name that manifest files will be saved as
    manifest_basename: "synapse_storage_manifest"
# This describes information about manifests as it relates to generation and validation
manifest:
  # Location where manifests will be saved to
  manifest_folder: "manifests"
  # Title or title prefix given to generated manifest(s)
  title: "example"
  # Data types of manifests to be generated or data type (singular) to validate manifest against
  data_type:
    - "Biospecimen"
    - "Patient"
# Describes the location of your schema
model:
  # Location of your schema jsonld, it must be a path relative to this file or absolute
  location: "tests/data/example.model.jsonld"
# This section is for using google sheets with Schematic
google_sheets:
  # Path to the Google service account credentials file, either absolute or relative to this file
  service_acct_creds: "schematic_service_account_creds.json"
  # When doing google sheet validation (regex match) with the validation rules.
  #   true is alerting the user and not allowing entry of bad values.
  #   false is warning but allowing the entry on to the sheet.
  strict_validation: true
If you want to change any of these, copy config_example.yml to config.yml, change any fields you want to, and remove any fields you don't.
For example, if you wanted to change the folder where manifests are downloaded, your config should look like:
manifest:
  manifest_folder: "my_manifest_folder_path"
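The idea that a trimmed config.yml only overrides the fields it lists can be pictured as a recursive merge over the defaults. The sketch below illustrates that idea with plain dictionaries; it is not schematic's actual configuration loader, and merge_config, DEFAULTS, and USER_CONFIG are names made up for this example.

```python
import copy

# A subset of the defaults from config_example.yml, written as a dict.
DEFAULTS = {
    "manifest": {
        "manifest_folder": "manifests",
        "title": "example",
        "data_type": ["Biospecimen", "Patient"],
    },
    "model": {"location": "tests/data/example.model.jsonld"},
}

# Equivalent of a config.yml that only changes the manifest folder.
USER_CONFIG = {"manifest": {"manifest_folder": "my_manifest_folder_path"}}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with values from overrides applied on top."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections are merged key by key, not replaced wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


effective = merge_config(DEFAULTS, USER_CONFIG)
```

In this sketch, effective keeps the default title and data_type while manifest_folder takes the user's value, which is why fields you do not want to change can simply be removed from config.yml.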
Note: config.yml is ignored by git.
Note: Paths can be specified relative to the config.yml file or as absolute paths.
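The rule for relative paths can be sketched with pathlib: an absolute value is used as-is, and a relative value is joined to the directory that contains config.yml. This is an illustration of the rule stated above rather than schematic's own code; resolve_config_path is a hypothetical helper name.

```python
from pathlib import Path


def resolve_config_path(config_file: str, value: str) -> Path:
    """Resolve a path from config.yml relative to the config file's folder."""
    path = Path(value)
    if path.is_absolute():
        # Absolute paths are returned unchanged.
        return path
    # Relative paths are anchored at the directory holding config.yml,
    # not at the current working directory.
    return Path(config_file).parent / path


if __name__ == "__main__":
    print(resolve_config_path("/home/user/project/config.yml", ".synapseConfig"))
```

Under this rule, moving config.yml to another directory also moves where relative entries such as .synapseConfig are looked up.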
Login to Synapse by using the command line
On the CLI in your virtual environment, run the following command:
synapse login -u <synapse username> -p <synapse password> --rememberMe
Obtain Google credential Files
Running schematic init is no longer supported due to security concerns. To obtain schematic_service_account_creds.json, please follow the instructions here.
As of v22.12.1 of schematic, the token mode of authentication (in other words, using token.pickle and credentials.json) is no longer supported due to Google's decision to move away from the OAuth out-of-band (OOB) flow. Click here to learn more.
Notes: Use the schematic_service_account_creds.json file for the service account mode of authentication (for Google services/APIs). Service accounts are special Google accounts that can be used by applications to access Google APIs programmatically via OAuth2.0, with the advantage being that they do not require human authorization.
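Before pointing config.yml at a credentials file, it can be useful to sanity-check that the file looks like a service account key. The sketch below uses only the standard library and checks a few fields that Google service account key files normally contain (type, client_email, private_key); the function name and the choice of fields are assumptions for illustration, not a schematic API, and passing the check does not prove the key is valid.

```python
import json
from pathlib import Path

# Fields assumed to be present in a service account key file.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_service_account_file(path: str) -> dict:
    """Load a key file and verify that it looks like a service account key."""
    info = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"missing fields in {path}: {', '.join(missing)}")
    if info["type"] != "service_account":
        # Other credential types, such as OAuth client secrets, fail here.
        raise ValueError(f"expected type 'service_account', got {info['type']!r}")
    return info
```

A call such as check_service_account_file("schematic_service_account_creds.json") raises ValueError with a readable message when the file is some other kind of credential.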
Background: schematic uses Google’s API to generate Google Sheets templates that users fill in to provide (meta)data. Most Google Sheets functionality can be authenticated with a service account. However, more complex Google Sheets functionality requires token-based authentication. As browser support for token-based authentication diminishes, we are hoping to deprecate token-based authentication and keep only service account authentication in the future.
This repository is configured to utilize pre-commit hooks as part of the development process. To enable these hooks, please run the following command and look for the following success message:
$ pre-commit install
pre-commit installed at .git/hooks/pre-commit
For new features, bugs, enhancements
Note: Make sure you have the latest version of the develop branch on your local machine.
Run docker pull sagebionetworks/schematic:latest from the CLI, or run docker compose up after cloning the schematic GitHub repo. sagebionetworks/schematic:latest is the name of the image chosen.
Run schematic commands with docker run <flags> <schematic command and args>.
Use config.yml to run API endpoints:
docker run --rm -p 3001:3001 \
-v $(pwd):/schematic -w /schematic --name schematic \
-e SCHEMATIC_CONFIG=/schematic/config.yml \
-e GE_HOME=/usr/src/app/great_expectations/ \
sagebionetworks/schematic \
python /usr/src/app/run_api.py
Use config.yml and schematic_service_account_creds.json as environment variables to run API endpoints:
Save the content of config.yml to the environment variable SCHEMATIC_CONFIG_CONTENT by doing: export SCHEMATIC_CONFIG_CONTENT=$(cat /path/to/config.yml)
Similarly, save the content of schematic_service_account_creds.json as SERVICE_ACCOUNT_CREDS by doing: export SERVICE_ACCOUNT_CREDS=$(cat /path/to/schematic_service_account_creds.json)
Pass SCHEMATIC_CONFIG_CONTENT and SERVICE_ACCOUNT_CREDS as environment variables by using docker run:
docker run --rm -p 3001:3001 \
-v $(pwd):/schematic -w /schematic --name schematic \
-e GE_HOME=/usr/src/app/great_expectations/ \
-e SCHEMATIC_CONFIG_CONTENT="$SCHEMATIC_CONFIG_CONTENT" \
-e SERVICE_ACCOUNT_CREDS="$SERVICE_ACCOUNT_CREDS" \
sagebionetworks/schematic \
python /usr/src/app/run_api.py
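To illustrate what passing credentials through an environment variable means on the receiving side, the sketch below reads SERVICE_ACCOUNT_CREDS and parses it as JSON. It only shows the general pattern and is not the code that run_api.py actually uses; load_service_account_info is a hypothetical name.

```python
import json
import os


def load_service_account_info(env_var: str = "SERVICE_ACCOUNT_CREDS") -> dict:
    """Parse service account JSON stored in an environment variable."""
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set")
    # The variable holds the full file content, so it parses like the file.
    return json.loads(raw)


if __name__ == "__main__":
    info = load_service_account_info()
    # Print only a non-secret field; never log the private key.
    print(info.get("client_email"))
```

Because the variable holds multi-line content, the value must reach the container intact, which is why the shell variable should be quoted when it is handed to docker run.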
To run the example below, first clone schematic into your home directory:
git clone https://github.com/sage-bionetworks/schematic ~/schematic
Then update .synapseConfig with your credentials.
docker run \
-v ~/schematic:/schematic \
-w /schematic \
-e SCHEMATIC_CONFIG=/schematic/config.yml \
-e GE_HOME=/usr/src/app/great_expectations/ \
sagebionetworks/schematic schematic model \
-c /schematic/config.yml validate \
-mp /schematic/tests/data/mock_manifests/Valid_Test_Manifest.csv \
-dt MockComponent \
-js /schematic/tests/data/example.model.jsonld
docker run -v %cd%:/schematic \
-w /schematic \
-e GE_HOME=/usr/src/app/great_expectations/ \
sagebionetworks/schematic \
schematic model \
-c config.yml validate -mp tests/data/mock_manifests/inValid_Test_Manifest.csv -dt MockComponent -js /schematic/tests/data/example.model.jsonld
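The two examples above differ mainly in how the current directory is spelled ($(pwd) in a POSIX shell, %cd% on Windows). One way to sidestep that difference is to assemble the same command from Python. The following is a sketch under that assumption: it reuses only the flags shown above, and build_validate_command is a helper invented for this example.

```python
import os
import shlex

IMAGE = "sagebionetworks/schematic"


def build_validate_command(manifest_path, data_type, schema_path, workdir=None):
    """Build the argument list for a docker run ... schematic model validate call."""
    # os.getcwd() works on every platform, unlike $(pwd) or %cd%.
    host_dir = workdir or os.getcwd()
    return [
        "docker", "run",
        "-v", f"{host_dir}:/schematic",
        "-w", "/schematic",
        "-e", "GE_HOME=/usr/src/app/great_expectations/",
        IMAGE,
        "schematic", "model", "-c", "config.yml", "validate",
        "-mp", manifest_path,
        "-dt", data_type,
        "-js", schema_path,
    ]


if __name__ == "__main__":
    command = build_validate_command(
        "tests/data/mock_manifests/Valid_Test_Manifest.csv",
        "MockComponent",
        "tests/data/example.model.jsonld",
    )
    # Show the command; pass the list to subprocess.run(command, check=True)
    # to execute it.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems entirely when the command is later executed with subprocess.run.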
Change into the docs directory and run the make html command to re-generate the build folder:
cd docs
make html
Other helpful resources:
If you install external libraries by using poetry add <name of library>, please make sure that you include the pyproject.toml and poetry.lock files in your commit.
You can create bug reports and feature requests through Sage Bionetworks' FAIR Data service desk. Providing enough details to the developers to verify and troubleshoot your issue is paramount:
Please visit more documentation here
All code added to the client must have tests. The Python client uses pytest to run tests. The test code is located in the tests subdirectory.
You can run the test suite in the following way:
pytest -vs tests/
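For contributors new to pytest, a test is simply a function whose name starts with test_ and that uses plain assert statements. The sketch below shows the shape of such a test; manifest_filename is a hypothetical helper written for this example, not a function from schematic, and in a real contribution you would import the code under test from the package instead.

```python
def manifest_filename(basename: str, component: str) -> str:
    """Hypothetical helper: build a manifest file name for a component."""
    return f"{basename}_{component}.csv"


def test_manifest_filename_joins_basename_and_component():
    # pytest collects functions named test_* and reports failed asserts.
    result = manifest_filename("synapse_storage_manifest", "Biospecimen")
    assert result == "synapse_storage_manifest_Biospecimen.csv"
```

Saved as a file under tests/ with a name starting with test_, this would be picked up by the pytest -vs tests/ command shown above.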
develop before your update.
develop branch into their feature branches for their tests to work.
Main contributors and developers: