ga4gh / htsget-refserver

Reference server implementation for the GA4GH HTSget API standard.
Apache License 2.0

htsget Reference Server


Reference server implementation of the htsget API protocol for securely streaming genomic data. For more information about htsget, see the paper or specification.

A GA4GH-hosted instance of this server is running at https://htsget.ga4gh.org/. To use, see the OpenAPI documentation.

Quickstart - Docker

We suggest running the reference server as a Docker container, as the image comes pre-installed with all dependencies.

With Docker installed, run:

docker image pull ga4gh/htsget-refserver:${TAG}

to pull the image, and:

docker container run -d -p 3000:3000 ga4gh/htsget-refserver:${TAG}

to spin up a containerized server. Custom config files can also be passed to the application by first mounting the directory containing the config, and specifying the path to config in the run command:

docker container run -d -p ${PORT}:${PORT} -v /directory/to/config:/usr/src/app/config ga4gh/htsget-refserver:${TAG} ./htsget-refserver -config /usr/src/app/config/config.json

Additional BAM/CRAM/VCF/BCF directories you wish to serve via htsget can also be mounted into the container. See the Configuration section below for instructions on how to serve custom datasets.

The full list of tags/versions is available on the Docker Hub repository page.

Setup - Native

To run and/or develop the server natively on your OS, the following dependencies are required:

This project uses Go modules to manage packages and dependencies.

With the above dependencies installed, run:

git clone https://github.com/ga4gh/htsget-refserver.git
cd htsget-refserver

to clone and enter the repository, and:

go build -o ./htsget-refserver ./cmd

to build the application binary. To start, run:

./htsget-refserver

A custom config file can also be specified with -config:

./htsget-refserver -config /path/to/config.json

Configuration

The htsget web service can be configured with runtime parameters via a JSON config file, specified with -config. For example:

./htsget-refserver -config /path/to/config.json
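As an illustration of how a Go program can accept a single-dash option such as -config, here is a minimal sketch using the standard flag package. It is not the server's actual startup code; the function name parseConfigPath and the printed messages are assumptions made for this example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseConfigPath extracts the value of a -config flag from args.
// It returns an empty string when the flag is absent.
// (Hypothetical helper, not part of htsget-refserver.)
func parseConfigPath(args []string) (string, error) {
	fs := flag.NewFlagSet("htsget-refserver", flag.ContinueOnError)
	configPath := fs.String("config", "", "path to a JSON config file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *configPath, nil
}

func main() {
	path, err := parseConfigPath(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if path == "" {
		fmt.Println("no -config given")
		return
	}
	fmt.Println("config file:", path)
}
```

Go's flag package accepts both -config and --config, so either spelling would reach the same value in this sketch.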

Examples of valid JSON config files are available in this repository:

In the JSON file, the root object must have a single "htsgetConfig" property, containing all sub-properties, i.e.:

{
    "htsgetConfig": {}
}
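The root-object rule can be checked before a config file is handed to the server. The following is a minimal sketch, assuming that "single" means no other root properties are allowed; checkRoot is a hypothetical helper, not the server's own loader.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// checkRoot verifies that raw is a JSON object whose only top-level
// property is "htsgetConfig", and returns that property's raw value.
// (Hypothetical helper; the strictness about extra keys is an assumption.)
func checkRoot(raw []byte) (json.RawMessage, error) {
	var root map[string]json.RawMessage
	if err := json.Unmarshal(raw, &root); err != nil {
		return nil, fmt.Errorf("config is not a JSON object: %w", err)
	}
	cfg, ok := root["htsgetConfig"]
	if !ok {
		return nil, errors.New(`missing root property "htsgetConfig"`)
	}
	if len(root) != 1 {
		return nil, errors.New(`"htsgetConfig" must be the only root property`)
	}
	return cfg, nil
}

func main() {
	docs := []string{
		`{"htsgetConfig": {}}`,
		`{"htsget": {}}`,
	}
	for _, doc := range docs {
		if _, err := checkRoot([]byte(doc)); err != nil {
			fmt.Printf("%s -> invalid: %v\n", doc, err)
		} else {
			fmt.Printf("%s -> ok\n", doc)
		}
	}
}
```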

Configuration - "props" object

Under the htsgetConfig property, the props object overrides application-wide settings. The following table indicates the attributes of props and what settings they affect.

| Name | Description | Default Value |
|------|-------------|---------------|
| port | The port on which the service will run. | 3000 |
| host | Web service hostname. The JSON ticket returned by the server references other endpoints, using this hostname/base URL to build complete URLs. | http://localhost:3000/ |
| docsDir | Path to a static file directory containing server documentation (e.g. OpenAPI). The server serves its contents at the /docs/ endpoint. | NONE |
| tempDir | Directory to which temporary files used in request processing are written. | . |
| logFile | File to which application logs are written. | htsget-refserver.log |
| corsAllowedOrigins | Origins allowed to make CORS requests. Separate multiple origins with commas. | http://localhost |
| corsAllowedMethods | HTTP methods allowed in CORS requests. | GET, POST, PUT, DELETE, OPTIONS |
| corsAllowedHeaders | Headers allowed in CORS requests. | * |
| corsAllowCredentials | Whether CORS requests may include credentials. | false |
| corsMaxAge | CORS max age in seconds. | 300 |
| awsAssumeRole | Turns on the awsAssumeRole middleware. See the Private Bucket section below. | false |
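Because corsAllowedOrigins packs several origins into one comma-separated string, it can help to see how such a value breaks apart. This is a minimal sketch, assuming whitespace around each entry is ignored; splitOrigins is a hypothetical helper and may not match how the server parses the value.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOrigins turns a comma-separated origins string into a slice,
// trimming surrounding whitespace and dropping empty entries.
// (Hypothetical helper; the trimming behaviour is an assumption.)
func splitOrigins(value string) []string {
	var origins []string
	for _, part := range strings.Split(value, ",") {
		if origin := strings.TrimSpace(part); origin != "" {
			origins = append(origins, origin)
		}
	}
	return origins
}

func main() {
	value := "https://portal.ga4gh.org, http://intranet.ga4gh.org"
	for i, origin := range splitOrigins(value) {
		fmt.Printf("%d: %s\n", i, origin)
	}
}
```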

Example props object:

{
    "htsgetConfig": {
        "props": {
            "port": "80",
            "host": "https://htsget.ga4gh.org/",
            "tempDir": "/tmp/",
            "logFile": "/usr/src/app/htsget-refserver.log",
            "corsAllowedOrigins": "https://portal.ga4gh.org, http://intranet.ga4gh.org"
        }
    }
}
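To make the override behaviour concrete, here is a minimal sketch of overlaying a props object onto the documented defaults. It is an illustration only: defaultProps and effectiveProps are hypothetical names, only a subset of the attributes is listed, and every value is treated as a string for simplicity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultProps mirrors a subset of the documented default values.
func defaultProps() map[string]string {
	return map[string]string{
		"port":               "3000",
		"host":               "http://localhost:3000/",
		"tempDir":            ".",
		"logFile":            "htsget-refserver.log",
		"corsAllowedOrigins": "http://localhost",
	}
}

// effectiveProps overlays the "props" object found in raw onto the defaults.
// Attributes absent from the config keep their default value.
func effectiveProps(raw []byte) (map[string]string, error) {
	var doc struct {
		HtsgetConfig struct {
			Props map[string]interface{} `json:"props"`
		} `json:"htsgetConfig"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	props := defaultProps()
	for name, value := range doc.HtsgetConfig.Props {
		// Store every value as text, whatever its JSON type.
		props[name] = fmt.Sprint(value)
	}
	return props, nil
}

func main() {
	raw := []byte(`{"htsgetConfig": {"props": {"port": "80", "tempDir": "/tmp/"}}}`)
	props, err := effectiveProps(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("port:", props["port"])       // overridden by the config
	fmt.Println("logFile:", props["logFile"]) // falls back to the default
}
```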

Configuration - "reads" object

Under the htsgetConfig property, the reads object overrides settings for reads-related data and endpoints. The following properties can be set:

Example reads object:

{
    "htsgetConfig": {
        "reads": {
            "enabled": true,
            "dataSourceRegistry": {
                "sources": [
                    {
                        "pattern": "^tabulamuris\\.(?P<accession>10X.*)$",
                        "path": "https://s3.amazonaws.com/czbiohub-tabula-muris/10x_bam_files/{accession}_possorted_genome.bam"
                    },
                    {
                        "pattern": "^tabulamuris\\.(?P<accession>.*)$",
                        "path": "https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/{accession}.mus.Aligned.out.sorted.bam"
                    }
                ]
            },
            "serviceInfo": {
                "id": "demo.reads",
                "name": "htsget demo reads",
                "description": "serve alignment data via htsget",
                "organization": {
                    "name": "Example Org",
                    "url": "https://exampleorg.com"
                },
                "contactUrl": "mailto:nobody@exampleorg.com",
                "documentationUrl": "https://htsget.exampleorg.com/docs",
                "createdAt": "2021-01-01T09:00:00Z",
                "updatedAt": "2021-01-01T09:00:00Z",
                "environment": "test",
                "version": "1.0.0"
            }
        }
    }
}
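The pattern and path pairs under dataSourceRegistry can be read as follows: a requested ID is matched against each pattern, and named capture groups fill the {name} placeholders in the path. The sketch below assumes the first matching source wins, which the ordering of the example (specific 10X pattern before the catch-all) suggests but which is not confirmed here. The names source, resolve and exampleSources, and the sample accessions, are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// source pairs an ID pattern with a path template, as in "sources".
type source struct {
	pattern *regexp.Regexp
	path    string
}

// exampleSources reproduces the two reads sources from the example config.
func exampleSources() []source {
	return []source{
		{regexp.MustCompile(`^tabulamuris\.(?P<accession>10X.*)$`),
			"https://s3.amazonaws.com/czbiohub-tabula-muris/10x_bam_files/{accession}_possorted_genome.bam"},
		{regexp.MustCompile(`^tabulamuris\.(?P<accession>.*)$`),
			"https://s3.amazonaws.com/czbiohub-tabula-muris/facs_bam_files/{accession}.mus.Aligned.out.sorted.bam"},
	}
}

// resolve returns the path for the first source whose pattern matches id,
// replacing each {name} placeholder with the matching named capture group.
// (First-match-wins is an assumption of this sketch.)
func resolve(sources []source, id string) (string, bool) {
	for _, src := range sources {
		match := src.pattern.FindStringSubmatch(id)
		if match == nil {
			continue
		}
		path := src.path
		for i, name := range src.pattern.SubexpNames() {
			if name != "" {
				path = strings.ReplaceAll(path, "{"+name+"}", match[i])
			}
		}
		return path, true
	}
	return "", false
}

func main() {
	// "10X_SAMPLE" and "SAMPLE" are made-up accessions for illustration.
	for _, id := range []string{"tabulamuris.10X_SAMPLE", "tabulamuris.SAMPLE", "other.SAMPLE"} {
		if path, ok := resolve(exampleSources(), id); ok {
			fmt.Println(id, "->", path)
		} else {
			fmt.Println(id, "-> no matching source")
		}
	}
}
```

Note that in the JSON config the backslash is doubled (\\.) because of JSON string escaping; the regular expression itself contains a single escaped dot.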

Configuration - "variants" object

Under the htsgetConfig property, the variants object overrides settings for variants-related data and endpoints. The following properties can be set:

Example variants object:

{
    "htsgetConfig": {
        "variants": {
            "enabled": true,
            "dataSourceRegistry": {
                "sources": [
                    {
                        "pattern": "^1000genomes\\.(?P<accession>.*)$",
                        "path": "https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase1/analysis_results/integrated_call_sets/{accession}.vcf.gz"
                    }
                ]
            },
            "serviceInfo": {
                "id": "demo.variants",
                "name": "htsget demo variants",
                "description": "serve variant data via htsget",
                "organization": {
                    "name": "Example Org",
                    "url": "https://exampleorg.com"
                },
                "contactUrl": "mailto:nobody@exampleorg.com",
                "documentationUrl": "https://htsget.exampleorg.com/docs",
                "createdAt": "2021-01-01T09:00:00Z",
                "updatedAt": "2021-01-01T09:00:00Z",
                "environment": "test",
                "version": "1.0.0"
            }
        }
    }
}
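Once a variants source is configured, a client requests a genomic interval through the variants endpoint using the referenceName, start and end query parameters defined by the htsget specification. The following sketch builds such a URL; variantsURL is a hypothetical helper, the host is the documented default, and "1000genomes.EXAMPLE" is a made-up ID that would match the pattern above.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// variantsURL builds a request URL for the variants endpoint, limited to
// the half-open interval [start, end) on referenceName (0-based).
// (Hypothetical helper for illustration.)
func variantsURL(host, id, referenceName string, start, end int) string {
	query := url.Values{}
	query.Set("referenceName", referenceName)
	query.Set("start", strconv.Itoa(start))
	query.Set("end", strconv.Itoa(end))
	// Values.Encode sorts parameters by key, so the result is deterministic.
	return strings.TrimRight(host, "/") + "/variants/" + url.PathEscape(id) + "?" + query.Encode()
}

func main() {
	fmt.Println(variantsURL("http://localhost:3000/", "1000genomes.EXAMPLE", "22", 100, 200))
}
```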

Private Bucket

Suppose you have data in a private bucket as follows:

s3://my-primary-data-prod/Project/PID00115/WGS/PID00115-final.bam

Example configuration:

{
  "htsgetConfig": {
    "props": {
      ...
      "awsAssumeRole": true
    },
    "reads": {
      ...
      "dataSourceRegistry": {
        "sources": [
          ...
          {
            "pattern": "^my-primary-data(?P<accession>.*)$",
            "path": "s3://my-primary-data{accession}"
          }
        ]
      },
      "serviceInfo": {
        ...
      }
    },
    "variants": {
      ...
      "dataSourceRegistry": {
        "sources": [
          ...
          {
            "pattern": "^my-primary-data(?P<accession>.*)$",
            "path": "s3://my-primary-data{accession}"
          }
        ]
      },
      "serviceInfo": {
        ...
      }
    }
  }
}

You can then query the htsget server as follows:

curl -s http://localhost:3000/reads/my-primary-data-prod/Project/PID00115/WGS/PID00115-final.bam | jq

Testing

To execute unit and end-to-end tests on the entire package, run:

go test ./... -coverprofile=cp.out

The Go coverage profile will be written to ./cp.out. To execute tests for a specific package (for example, the htsrequest package), run:

go test ./internal/htsrequest -coverprofile=cp.out

Changelog

v1.5.0

v1.4.0

v1.3.0

v1.2.0

v1.1.0

v1.0.0

Roadmap

Maintainers

Issues

Bugs and issues can be submitted via the GitHub issue tracker.