Data Explorer lets you explore a dataset. The code (in this repo and in the data-explorer-indexers repo) is dataset-agnostic; all dataset configuration happens in config files.
Run local Data Explorer with the 1000 Genomes dataset:
- If `~/.config/gcloud/application_default_credentials.json` doesn't exist, create it by running `gcloud auth application-default login`.
- Run `docker-compose up --build`.
- Navigate to `localhost:4400`.
Run local Data Explorer with a custom dataset:

- Index your dataset into Elasticsearch. Before you can run the servers in this repo to display a Data Explorer UI, your dataset must be indexed into Elasticsearch. Use an indexer from https://github.com/DataBiosphere/data-explorer-indexers. (A quick way to verify the index is sketched after this list.)
- Create `dataset_config/<my dataset>`. (Note that `ui.json` is not in the data-explorer-indexers repo.) `gcs.json` must be filled out. (A sketch of the expected layout appears after this list.)
- If you want to use the Save in Terra feature, do the one-time Save in Terra setup described below.
- If `~/.config/gcloud/application_default_credentials.json` doesn't exist, create it by running `gcloud auth application-default login`.
- Run `DATASET_CONFIG_DIR=dataset_config/<my dataset> docker-compose up --build -t 0`
  - `-t 0` makes Kibana stop more quickly after Ctrl-C.
  - If you hit an error like `ui_1 | Module not found: Can't resolve 'superagent' in '/ui/src/api/src'`, add `-V`: `DATASET_CONFIG_DIR=dataset_config/<my dataset> docker-compose up --build -t 0 -V`. `-V` is only needed for the next invocation of docker-compose, not all future invocations.
  - If Elasticsearch needs more memory, raise its JVM heap, e.g. `ES_JAVA_OPTS="-Xms10g -Xmx10g" docker-compose up --build -t 0`
- Navigate to `localhost:4400`.
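
To verify the indexing step, you can query Elasticsearch directly. This is a sketch; it assumes Elasticsearch is on its default `localhost:9200` and that `<my dataset>` is the index name your indexer created:

```
# List all indices with document counts; your dataset's index should appear.
curl 'localhost:9200/_cat/indices?v'

# Spot-check a couple of documents (replace <my dataset> with your index name).
curl 'localhost:9200/<my dataset>/_search?size=2&pretty'
```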
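
For orientation, here is a sketch of what a filled-out `dataset_config/<my dataset>` directory might contain, based only on the files this README mentions; your dataset may not need every file:

```
dataset_config/<my dataset>/
├── ui.json           # UI configuration (not in the data-explorer-indexers repo)
├── gcs.json          # must be filled out
├── deploy.json       # names the GCP project used by Save in Terra
└── private-key.json  # service account key for Save in Terra (see one-time setup)
```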
The basic flow:

*(architecture diagram)*

GCP deployment:

*(deployment diagram)*

For local development, an nginx reverse proxy is used to get around CORS. Here's one possible flow:

*(diagram)*
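
The proxy configuration itself isn't reproduced here, so the following is only an illustrative sketch of the reverse-proxy idea, not the repo's actual config; the service names and ports (`ui:3000`, `api:5000`, listening on 4400) are assumptions:

```
# Sketch: serve UI and API from a single origin so the browser
# never issues a cross-origin request (no CORS preflight needed).
server {
    listen 4400;

    # API requests are forwarded to the API server container.
    location /api/ {
        proxy_pass http://api:5000/;
    }

    # Everything else is forwarded to the UI dev server.
    location / {
        proxy_pass http://ui:3000/;
    }
}
```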
If your dataset includes sample files (VCF, BAM, etc.), then Data Explorer will have:

- A Samples Overview facet, which gives an overview of your sample files.
- Sample file facets, which display the number of sample files instead of the number of participants. For example, if your dataset has 100 participants and each participant has 5 files, then for a facet like "Raw coverage" the number in the upper right of the facet can be 0-500, and represents how many sample files are in the current selection.
If your dataset has longitudinal data, then Data Explorer will show time-series visualizations.
We use swagger-codegen to automatically implement the API, as defined in `api/api.yaml`, for the API server and the UI. Whenever the API is updated, follow these steps to update the generated implementations:
On Linux (using the downloaded `swagger-codegen-cli.jar`; see the one-time setup below):

```
rm ui/src/api/src/model/*
rm api/data_explorer/models/*
java -jar ~/swagger-codegen-cli.jar generate -i api/api.yaml -l python-flask -o api -DsupportPython2=true,packageName=data_explorer
java -jar ~/swagger-codegen-cli.jar generate -i api/api.yaml -l javascript -o ui/src/api -DuseES6=true
yapf -ir . --exclude ui/node_modules --exclude api/.tox
```

On macOS (using the brew-installed `swagger-codegen`; see the one-time setup below):

```
swagger-codegen generate -i api/api.yaml -l python-flask -o api -DsupportPython2=true,packageName=data_explorer
swagger-codegen generate -i api/api.yaml -l javascript -o ui/src/api -DuseES6=true
yapf -ir . --exclude ui/node_modules
```
`docker-compose` should be at least version 1.21.0. The data-explorer-indexers repo refers to the network created by docker-compose in this repo; prior to 1.21.0 the network name was `dataexplorer_default`, and starting with 1.21.0 it is `data-explorer_default`.
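
To see which network name your version created (a sketch; `docker network ls` is standard Docker, and the grep pattern simply matches both spellings):

```
# Look for data-explorer_default (docker-compose >= 1.21.0)
# or dataexplorer_default (older versions).
docker network ls | grep -i 'data.*explorer'
```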
Install `swagger-codegen-cli.jar`. This is only needed if you modify `api.yaml`:

```
# Linux
wget https://repo1.maven.org/maven2/io/swagger/swagger-codegen-cli/2.3.1/swagger-codegen-cli-2.3.1.jar -O ~/swagger-codegen-cli.jar
# macOS
brew install swagger-codegen
```
In `ui/`, run `npm install`. This will install tools used during git precommit, such as formatting tools.
The Save in Terra feature temporarily stores data in a GCS bucket. `deploy.json` will still need to be filled out; a temporary file will be written to a GCS bucket in the project named in `deploy.json`, even for a local deployment of Data Explorer. Choose a project where you have at least Project Editor permissions.

- Run `deploy/create-export-url-bucket.sh DATASET` from the root of the repo, where `DATASET` is the name of the directory in `dataset_config`.
- Create a key for the App Engine default service account: App Engine default service account -> Create Key -> CREATE. Save the downloaded key as `dataset_config/DATASET/private-key.json`.
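
If you prefer the command line to the console flow above, the same key can be created with gcloud. This is a sketch; the App Engine default service account is named `PROJECT_ID@appspot.gserviceaccount.com`, so substitute your own project ID and dataset directory:

```
# Create a key for the App Engine default service account and save it
# where Data Explorer expects it (replace PROJECT_ID and DATASET).
gcloud iam service-accounts keys create dataset_config/DATASET/private-key.json \
    --iam-account PROJECT_ID@appspot.gserviceaccount.com
```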
Every commit on a remote branch kicks off all tests on CircleCI.
API server unit tests use pytest and tox. To run locally:
```
virtualenv ~/virtualenv/tox
source ~/virtualenv/tox/bin/activate
pip install tox
cd api && tox -e py35
```
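
If the tox configuration forwards extra arguments to pytest (tox's `{posargs}` convention; check `api/tox.ini` to confirm), a single test can be selected by name:

```
# Run only tests whose names match the -k expression (test name is hypothetical).
cd api && tox -e py35 -- -k test_facets
```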
End-to-end tests use Puppeteer and jest-puppeteer. To run locally:
```
# Optional: ensure the elasticsearch index is clean
docker-compose up --build -d elasticsearch
curl -XDELETE localhost:9200/_all
# Start the rest of the services
docker-compose up --build
cd ui && npm test
```
Troubleshooting tips for end-to-end tests:

- To run a single test, pass (part of) its name to jest, e.g. `npm test -- -t Participant`
Code in `ui/` is formatted with Prettier; husky is used to automatically format files upon commit. To fix formatting, run `npm run fix` in `ui/`.
Python files are formatted with YAPF.
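
To fix Python formatting from the repo root, the same yapf invocation used in the codegen steps above works:

```
# Reformat Python files in place, skipping generated/vendored directories.
yapf -ir . --exclude ui/node_modules --exclude api/.tox
```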