This service handles uploaded summary statistics files: it validates them, reports errors to the deposition app, and puts valid files in the queue for sumstats file harmonisation and HDF5 loading.
The Flask app handles POST and GET requests via the endpoints below. Celery worker(s) perform the validation tasks in the background. They can run anywhere the app is installed, provided they can reach the RabbitMQ queue.
brew install libmagic
git clone https://github.com/EBISPOT/gwas-sumstats-service.git
cd gwas-sumstats-service
virtualenv --python=python3.6 .env
source .env/bin/activate
pip install .
pip install -r requirements.txt
docker-compose up
rm -rf .tox
tox
Start RabbitMQ on the port (BROKER_PORT) specified in the config, e.g.
rabbitmq-server
Start the app with gunicorn from gwas-sumstats-service:
gunicorn -b 0.0.0.0:8000 sumstats_service.app:app --log-level=debug
Start a Celery worker on the postval queue from gwas-sumstats-service:
celery -A sumstats_service.app.celery worker --loglevel=debug --queues=postval
Start a Celery worker on the preval queue from gwas-sumstats-service:
celery -A sumstats_service.app.celery worker --loglevel=debug --queues=preval
This section guides you through using Docker Compose to set up and run the gwas-sumstats-service with all necessary services, including Flask, RabbitMQ, Celery, and MongoDB.
git clone [repository-url]
Replace the local Dockerfile and docker-compose file with Dockerfile and docker-compose.yaml, respectively.
Build the Docker Containers
Navigate to the cloned directory and build the Docker containers:
docker-compose build
Start the Docker Containers
Spin up the Flask, RabbitMQ, Celery, and MongoDB containers:
docker-compose up
Set the CONTAINERISE environment variable to adapt the application's behavior accordingly if you require Singularity.
To debug locally using Docker, update the Dockerfile and the local executor configuration in the config file as follows.
...
NEXTFLOW_CONFIG = (
# "executor.name = 'slurm'\n"
# "process.executor = 'slurm'\n"
"executor.name = 'local'\n"
...
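The switch between executors can be sketched as a small Python helper that produces the executor lines of the config string. This is only an illustration under stated assumptions: `nextflow_executor_config` is a hypothetical name that does not come from the service's config file, the `process.executor` line for the local case is assumed by analogy with the commented-out slurm lines, and the settings elided in the excerpt above are not reproduced.

```python
def nextflow_executor_config(executor: str = "local") -> str:
    """Return the executor lines of a Nextflow config string.

    Hypothetical helper: it mirrors only the two settings visible in the
    excerpt above. The real config may contain further settings.
    """
    allowed = {"local", "slurm"}
    if executor not in allowed:
        raise ValueError(f"unsupported executor: {executor!r}")
    return (
        f"executor.name = '{executor}'\n"
        f"process.executor = '{executor}'\n"
    )


if __name__ == "__main__":
    # Print the lines used for local debugging with Docker.
    print(nextflow_executor_config("local"), end="")
```

Keeping the executor name in one argument avoids commenting lines in and out by hand when moving between a cluster and a local debug run.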
helm install --name rabbitmq --namespace rabbitmq --set rabbitmq.username=<user>,service.type=NodePort,service.nodePort=<port> stable/rabbitmq
kubectl --kubeconfig=<path to config> -n <namespace> create secret generic ssh-keys --from-file=id_rsa=<path/to/id_rsa> --from-file=id_rsa.pub=<path/to/id_rsa.pub> --from-file=known_hosts=<path/to/known_hosts>
kubectl --kubeconfig=<path to config> -n gwas create secret generic globus --from-file=refresh-tokens.json=<path/to/refresh-tokens.json>
helm install --name gwas-sumstats k8chart/ --wait
docker run -it -d --name sumstats -v /path/to/data/:$INSTALL_PATH/sumstats_service/data -e CELERY_USER=<user> -e CELERY_PASSWORD=<pwd> -e QUEUE_HOST=<host ip> -e QUEUE_PORT=<port> gwas-sumstats-service:latest /bin/bash
docker exec sumstats celery -A sumstats_service.app.celery worker --loglevel=debug --queues=preval
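Because a worker is only useful where it can see the RabbitMQ queue, it can help to check connectivity before starting one. The following is a minimal sketch, not part of the service: `broker_reachable` is a hypothetical helper, it reads the `QUEUE_HOST` and `QUEUE_PORT` variables passed to `docker run` above, and the fallback values (`localhost`, and 5672 as RabbitMQ's default AMQP port) are assumptions. It only tests that a TCP connection can be opened; it does not verify credentials.

```python
import os
import socket


def broker_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # Fallback values are assumptions, not settings taken from the service.
    host = os.environ.get("QUEUE_HOST", "localhost")
    port = int(os.environ.get("QUEUE_PORT", "5672"))
    state = "reachable" if broker_reachable(host, port) else "not reachable"
    print(f"RabbitMQ at {host}:{port} is {state}")
```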
This section provides instructions on how to test the gwas-sumstats-service using Postman. The Postman collection for this service includes requests for submitting summary statistics and retrieving their validation status. Please find the collection here.
Import the collection gwas-sumstats-service (ID: e03dcb59-01cb-411b-a8d0-b216e2860c9f) into your Postman application.
Submit Summary Statistics
Use the POST {{protocol}}://{{host}}:{{port}}/v1/sum-stats request to submit summary statistics.
Fill in the id field in the request body with a unique identifier. Example body for a valid file submission:
{
  "requestEntries": [
    {
      "id": "{{callbackId}}",
      "filePath": "test_sumstats_file.tsv",
      "md5": "9b5f307016408b70cde2c9342648aa9b",
      "assembly": "GRCh38",
      "readme": "optional text",
      "entryUUID": "ABC1234",
      "minrows": "2"
    }
  ]
}
Modify filePath and other relevant fields accordingly.
Note the callbackID from the response for the next step.
Retrieve Validation Status
Use the GET {{protocol}}://{{host}}:{{port}}/v1/sum-stats/<callbackID> request to retrieve the status of your submission.
Replace <callbackID> with the ID obtained from the previous POST request.
Inside the container, list the output directory for the submission:
root@container-id:/sumstats_service# ls depo_ss_validated/<callbackID>/
Check nextflow.log for detailed execution logs:
root@container-id:/sumstats_service# cat depo_ss_validated/<callbackID>/logs/nextflow.log
The collection includes POST sum-stats for submission and GET sum-stats for status retrieval.
{{protocol}}, {{host}}, and {{port}} are pre-defined in the collection for ease of use.
curl -i -H "Content-Type: application/json" -X POST -d '{"requestEntries":[{"id":"abc123","filePath":"https://raw.githubusercontent.com/EBISPOT/gwas-sumstats-service/master/tests/test_sumstats_file.tsv","md5":"a1195761f082f8cbc2f5a560743077cc","assembly":"GRCh38", "readme":"optional text", "entryUUID": "globusdir"},{"id":"bcd234","filePath":"https://raw.githubusercontent.com/EBISPOT/gwas-sumstats-service/master/tests/test_sumstats_file.tsv","md5":"a1195761f082f8cbc","assembly":"GRCh38", "entryUUID": "globusdir"}]}' http://localhost:8000/v1/sum-stats
HTTP/1.0 201 CREATED
Content-Type: application/json
Content-Length: 26
Server: Werkzeug/0.15.4 Python/3.6.5
Date: Wed, 17 Jul 2019 15:15:23 GMT
{"callbackID": "TiQS2yxV"}
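The request body used in the POST example can also be assembled programmatically, which avoids hand-typing the md5 value that the service checks. The sketch below is an illustration only: `md5_of_file` and `build_request_body` are hypothetical helpers, and the set of keys treated as required is an assumption based on the fields that every example entry in this document has in common.

```python
import hashlib
import json

# Assumption: these keys appear in every example entry in this document.
REQUIRED_KEYS = ("id", "filePath", "md5", "assembly", "entryUUID")


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_request_body(entries) -> str:
    """Serialise entries into the {"requestEntries": [...]} JSON shape."""
    entries = list(entries)
    for entry in entries:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"entry {entry.get('id')!r} is missing {missing}")
    return json.dumps({"requestEntries": entries})


if __name__ == "__main__":
    # The md5 here is copied from the curl example above; for a local
    # file it could be computed with md5_of_file(path) instead.
    entry = {
        "id": "abc123",
        "filePath": "https://raw.githubusercontent.com/EBISPOT/gwas-sumstats-service/master/tests/test_sumstats_file.tsv",
        "md5": "a1195761f082f8cbc2f5a560743077cc",
        "assembly": "GRCh38",
        "entryUUID": "globusdir",
    }
    print(build_request_body([entry]))
```

The printed string can be passed to `curl -d` or to any HTTP client in place of the inline JSON.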
curl http://localhost:8000/v1/sum-stats/TiQS2yxV
{
  "callbackID": "TiQS2yxV",
  "completed": false,
  "statusList": [
    {
      "id": "abc123",
      "status": "VALID",
      "error": null
    },
    {
      "id": "bcd234",
      "status": "INVALID",
      "error": "md5sum did not match the one provided"
    }
  ]
}
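A status response like the one above can be reduced to a short summary when many entries are submitted together. The following sketch assumes only the response shape shown in this document (`callbackID`, `completed`, and a `statusList` of `id`/`status`/`error` items); `summarise_status` is a hypothetical helper and makes no network call.

```python
import json


def summarise_status(payload: dict) -> dict:
    """Group submission ids by status and collect their error messages."""
    by_status = {}
    errors = {}
    for item in payload.get("statusList", []):
        by_status.setdefault(item["status"], []).append(item["id"])
        if item.get("error"):
            errors[item["id"]] = item["error"]
    return {
        "callbackID": payload.get("callbackID"),
        "completed": bool(payload.get("completed")),
        "by_status": by_status,
        "errors": errors,
    }


if __name__ == "__main__":
    # Sample response copied from the GET example above.
    response_text = """
    {"callbackID": "TiQS2yxV", "completed": false, "statusList": [
      {"id": "abc123", "status": "VALID", "error": null},
      {"id": "bcd234", "status": "INVALID",
       "error": "md5sum did not match the one provided"}]}
    """
    print(json.dumps(summarise_status(json.loads(response_text)), indent=2))
```

In practice the payload would come from the GET request; the function is kept separate from the HTTP call so it can be checked against saved responses.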
Follow these steps to set up FormatLint:
Create a new virtual environment for the project to manage dependencies separately from your global Python setup:
python -m venv formatlint
Activate the virtual environment:
source formatlint/bin/activate
Install the required Python packages:
pip install -r requirements.dev.txt
Execute the formatting and linting script:
./format-lint