ga4gh-discovery / data-connect

Standard for describing and searching biomedical data developed by the Global Alliance for Genomics & Health.
Apache License 2.0

Data Connect API

Data Connect is a standard for discovery and search of biomedical data, developed by the Discovery Work Stream of the Global Alliance for Genomics & Health.

The standard provides a mechanism for:

It is not in the scope of the standard to:

For more information:

Background

GA4GH has previously developed two standards for discovery. Beacon is a standard for discovery of genomic variants, while Matchmaker is a standard for discovery of subjects with certain genomic and phenotypic features. Implementations of these standards have been linked into federated networks (e.g. Beacon Network and Matchmaker Exchange).

Both standards (and the corresponding networks) have been successful in their own right, but they have a lot in common. It was acknowledged that it would be broadly useful to develop standards that abstract common infrastructure for building searchable, federated networks for a variety of applications in genomics and health.

Data Connect, formerly known as GA4GH Search, is this general-purpose middleware for building federated, search-based applications. The name of the API reflects its purpose of:

Benefits

Intended Audience

The intended audience of this standard includes:

Use cases

Data Connect is an intentionally general-purpose middleware meant to enable the development of a diverse ecosystem of applications.

The community has built versions of the following applications on top of Data Connect:

We're looking forward to seeing things we haven’t yet imagined!

The community has also connected data through the following data sources:

Examples of queries on the data that can be answered via Data Connect include:

A full summary of use cases can be found in USECASES.md.
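As a rough illustration of how a client might pose such a query, the sketch below builds a search request and walks a paginated result. It assumes the query is sent as a JSON body with a `query` field to a `/search` path, and that each response page carries `data` plus an optional `pagination.next_page_url`; check the specification for the authoritative shapes. The function names and the `fetch_page` callable are hypothetical helpers, not part of any published client.

```python
import json


def build_search_request(base_url, sql):
    """Return (url, body, headers) for a search request (assumed shape)."""
    url = base_url.rstrip("/") + "/search"
    body = json.dumps({"query": sql}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return url, body, headers


def iter_rows(first_page, fetch_page):
    """Yield rows from every page, following next_page_url links.

    `fetch_page` is any callable that takes a URL and returns the parsed
    JSON page, so the transport (and authentication) stays up to the caller.
    """
    page = first_page
    while page is not None:
        yield from page.get("data", [])
        next_url = (page.get("pagination") or {}).get("next_page_url")
        page = fetch_page(next_url) if next_url else None
```

Passing the transport in as a callable keeps the paging logic independent of any particular HTTP library.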

Implementations

Server implementations

Several open-source implementations are available:

Tables-in-a-bucket (no-code implementation)

The specification allows for a no-code implementation as a collection of files served statically (e.g. in a cloud bucket or a Git repository). To do this, you need the following JSON files:

A concrete example test implementation is available here.
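To make the static layout more concrete, here is a minimal sketch that writes a one-table, one-page "bucket" to a local directory. It assumes each file sits at the path of the GET request it answers (`tables`, `table/<name>/info`, `table/<name>/data`) and that the JSON bodies use `name`, `description`, `data_model`, and `data` fields; the specification remains the authority on the exact files and fields. The function name, table name, and rows are hypothetical.

```python
import json
from pathlib import Path


def write_static_tables(root, name, description, schema, rows):
    """Write a one-table, one-page static layout under `root`.

    Returns the relative paths of the files written.
    """
    root = Path(root)
    table_dir = root / "table" / name
    table_dir.mkdir(parents=True, exist_ok=True)

    table = {"name": name, "description": description, "data_model": schema}

    # Assumed layout: each file sits at the path of the request it answers.
    files = {
        root / "tables": {"tables": [table]},
        table_dir / "info": table,
        table_dir / "data": {"data_model": schema, "data": rows},
    }
    for path, payload in files.items():
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return sorted(p.relative_to(root).as_posix() for p in files)
```

The resulting directory can then be uploaded as-is to a bucket or committed to a Git repository. Larger tables would need additional page files linked from the first data file, which this sketch does not cover.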

Google Sheets implementation

A Google Sheets spreadsheet can also be exposed via the Tables API using the sheets adapter, located here.

Implementation based on Trino

DNAstack has provided an implementation of Data Connect on top of Trino. This implementation includes examples of data stored in the FHIR and Phenopackets formats.

Client implementations

Several open-source implementations based on different technology stacks are available:

Security

Sensitive information transmitted over public networks, such as access tokens and human genomic data, MUST be protected using Transport Layer Security (TLS) version 1.2 or later, as specified in RFC 5246.
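On the client side, the TLS floor can be enforced explicitly rather than left to library defaults. The following is a minimal sketch using Python's standard `ssl` module; the function name is hypothetical, and a real deployment may have additional requirements (cipher suites, certificate pinning) that are not shown.

```python
import ssl


def make_tls_context():
    """Create a client-side SSL context that refuses anything below TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to `urllib.request.urlopen(..., context=...)` or to any HTTP client that accepts an `ssl.SSLContext`.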

If the data holder requires client authentication and/or authorization, then the client’s HTTPS API request MUST present an OAuth 2.0 bearer access token as specified in RFC 6750, in the Authorization request header field with the Bearer authentication scheme:

Authorization: Bearer [access_token]
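As a small illustration of the header above, the sketch below attaches a bearer token to a request using Python's standard library. The function name and the refusal to send tokens over plain HTTP are choices made for this example, and how the token is obtained is out of scope here, as it is for the specification.

```python
import urllib.request


def build_authorized_request(url, access_token):
    """Return a Request carrying the token in the Authorization header."""
    # Tokens are sensitive, so this sketch refuses to send them without TLS.
    if not url.lower().startswith("https://"):
        raise ValueError("access tokens must only be sent over HTTPS")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Bearer {access_token}")
    return request
```

The returned object can be passed to `urllib.request.urlopen` to perform the call.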

The policies and processes used to perform user authentication and authorization, and the means through which access tokens are issued, are beyond the scope of this API specification. GA4GH recommends the use of the OpenID Connect and OAuth 2.0 framework (RFC 6749) for authentication and authorization.

A stand-alone security review has been performed on the API. Nevertheless, GA4GH cannot guarantee the security of any implementation to which the API documentation links. If you integrate this code into your application it is AT YOUR OWN RISK AND RESPONSIBILITY to arrange for an audit to ensure compliance with any applicable regulatory and security requirements, especially where personal data may be at issue.

To report security issues with the specification, please send an email to security-notification@ga4gh.org.

CORS

Cross-origin resource sharing (CORS) is an essential technique for relaxing the same-origin policy enforced by browsers. This policy restricts a webpage from making requests to another website, which could otherwise leak potentially sensitive information. However, the same-origin policy is also a barrier to using open APIs. GA4GH open API implementers should enable CORS to an acceptable level as defined by their internal policy. All public API implementations should allow requests from any server.

GA4GH has provided a CORS best practices document, which implementers should refer to for guidance when enabling CORS on public API instances.
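The two situations described above (a public API open to everyone, and an API restricted by internal policy) could be handled along these lines. This is only a sketch: the function name, the allow-list parameter, and the chosen methods and request headers are assumptions for illustration, and the GA4GH CORS best practices document takes precedence.

```python
def cors_headers(request_origin=None, allowed_origins=None):
    """Return the CORS response headers to attach to a response.

    allowed_origins=None models a public API that accepts any origin;
    otherwise only origins in the given collection are granted access.
    """
    common = {
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        "Access-Control-Allow-Headers": "Authorization, Content-Type",
    }
    if allowed_origins is None:
        return {"Access-Control-Allow-Origin": "*", **common}
    if request_origin in allowed_origins:
        # Echo the origin and tell caches that the response varies by it.
        return {"Access-Control-Allow-Origin": request_origin,
                "Vary": "Origin", **common}
    return {}  # no CORS headers: the browser blocks the cross-origin read
```

Most web frameworks ship a CORS middleware that covers the same ground; the function only spells out the decision being made.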

Development

Validating

The API is specified in OpenAPI 3. To validate the YAML file, use Swagger Validator Badge or its OAS Validator wrapper.
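For a publicly reachable copy of the YAML file, the hosted Swagger validator takes the file's location as a `url` query parameter. The sketch below only builds the links; the function name is hypothetical, and the spec URL must be replaced with the real location of the file.

```python
from urllib.parse import urlencode

VALIDATOR_BASE = "https://validator.swagger.io/validator"


def validator_urls(spec_url):
    """Build the badge and debug links for a publicly hosted spec file."""
    query = urlencode({"url": spec_url})
    return {
        "badge": f"{VALIDATOR_BASE}?{query}",        # image: valid / invalid
        "debug": f"{VALIDATOR_BASE}/debug?{query}",  # list of validation messages
    }
```

Opening the debug link in a browser is a quick way to see why a badge reports the file as invalid.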

Documentation

Documentation is sourced from the hugo/ directory. Building the docs requires the Hugo framework with the Clyde theme. Edit the markdown files under hugo/content/ for content changes.

Run the docs locally using make run; the site is served at http://localhost:1313/data-connect/. Clean up before committing using make clean.

To manually inspect the build artifacts, use make build. Clean up before committing using make clean.

Contributing

The GA4GH is an open community that strives for inclusivity. Guidelines for contributing to this repository are listed in CONTRIBUTING.md. Teleconferences and corresponding meeting minutes are open to the public. To learn how to contribute to this effort, please contact us.