Data Connect is a standard for discovery and search of biomedical data, developed by the Discovery Work Stream of the Global Alliance for Genomics & Health.
The standard provides a mechanism for:
It is not in the scope of the standard to:
For more information:
GA4GH has previously developed two standards for discovery. Beacon is a standard for discovery of genomic variants, while Matchmaker is a standard for discovery of subjects with certain genomic and phenotypic features. Implementations of these standards have been linked into federated networks (e.g. Beacon Network and Matchmaker Exchange).
Both standards (and the corresponding networks) have been successful in their own right, but they have a lot in common. It was acknowledged that it would be broadly useful to develop standards that abstract common infrastructure for building searchable, federated networks for a variety of applications in genomics and health.
Data Connect, formerly known as GA4GH Search, is this general-purpose middleware for building federated, search-based applications. The name of the API reflects its purpose of:
The intended audience of this standard includes:
Data Connect is an intentionally general-purpose middleware meant to enable the development of a diverse ecosystem of applications.
The community has built versions of the following applications on top of Data Connect:
We're looking forward to seeing things we haven’t yet imagined!
The community has also connected data through the following data sources:
Examples of queries on the data that can be answered via Data Connect include:
A full summary of use cases can be found in USECASES.md.
Several open-source implementations are available:
The specification allows for a no-code implementation as a collection of files served statically (e.g. in a cloud bucket or a Git repository). To do this, you need the following JSON files:
tables: served in response to GET /tables.
table/{table_name}/info: served in response to GET /table/{table_name}/info. For example, a table named mytable should have a corresponding file table/mytable/info.
table/{table_name}/data: served in response to GET /table/{table_name}/data. For example, a table named mytable should have a corresponding file table/mytable/data.
table/{table_name}/data_{pageNumber}: a subsequent page of data, linked from the next_page_url of the first page of the table (e.g. mytable). The first page must be served at /table/{table_name}/data; after that, you can use any naming scheme you like for subsequent pages.
table/{table_name}/data_models/{schemaFile}
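The file layout above can be sketched as a small generator script. This is a minimal illustration, not a reference implementation: the JSON field names other than next_page_url (tables, name, data_model, data, pagination) are assumptions about the response shapes and should be checked against the OpenAPI specification, and the helper names build_static_site and read_all_rows are hypothetical. The reader resolves each next_page_url relative to the current file, which mimics relative URL resolution only for this simple case.

```python
import json
import tempfile
from pathlib import Path


def write_json(path: Path, payload: dict) -> None:
    """Write payload as JSON, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2))


def build_static_site(root: Path, table_name: str, data_model: dict,
                      rows: list, page_size: int) -> None:
    """Write tables, info, and paged data files for a single table."""
    table_dir = root / "table" / table_name
    entry = {"name": table_name, "data_model": data_model}  # assumed shape
    write_json(root / "tables", {"tables": [entry]})
    write_json(table_dir / "info", entry)

    # Split rows into pages; an empty table still gets one (empty) first page.
    pages = [rows[i:i + page_size] for i in range(0, len(rows), page_size)] or [[]]
    for number, page_rows in enumerate(pages, start=1):
        # The first page is always "data"; later pages follow data_{pageNumber}.
        file_name = "data" if number == 1 else f"data_{number}"
        payload = {"data_model": data_model, "data": page_rows}
        if number < len(pages):
            # Relative link to the next page; omitted on the last page.
            payload["pagination"] = {"next_page_url": f"data_{number + 1}"}
        write_json(table_dir / file_name, payload)


def read_all_rows(root: Path, table_name: str) -> list:
    """Follow next_page_url links from the first page and collect all rows."""
    path = root / "table" / table_name / "data"
    rows = []
    while path is not None:
        page = json.loads(path.read_text())
        rows.extend(page["data"])
        next_url = (page.get("pagination") or {}).get("next_page_url")
        path = path.parent / next_url if next_url else None
    return rows


if __name__ == "__main__":
    # Hypothetical data model and rows, for illustration only.
    model = {"type": "object", "properties": {"id": {"type": "integer"}}}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_static_site(root, "mytable", model,
                          [{"id": 1}, {"id": 2}, {"id": 3}], page_size=2)
        for file in sorted(p for p in root.rglob("*") if p.is_file()):
            print(file.relative_to(root).as_posix())
        print(read_all_rows(root, "mytable"))
```

The resulting directory can then be uploaded as-is to whatever static host is used; how that host maps request paths to files is outside the scope of this sketch.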
A concrete example of a test implementation is available here.
A Google Sheets spreadsheet can also be exposed via the Tables API using the sheets adapter, located here.
DNAstack has provided an implementation of Data Connect on top of Trino. This implementation includes examples of data stored in the FHIR and Phenopackets formats.
Several open-source implementations based on different technology stacks are available:
Sensitive information transmitted over public networks, such as access tokens and human genomic data, MUST be protected using Transport Layer Security (TLS) version 1.2 or later, as specified in RFC 5246.
If the data holder requires client authentication and/or authorization, then the client’s HTTPS API request MUST present an OAuth 2.0 bearer access token as specified in RFC 6750, in the Authorization request header field with the Bearer authentication scheme:
Authorization: Bearer [access_token]
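As a minimal client-side sketch of the header above, the following builds a request with Python's standard library and refuses non-HTTPS URLs so the token is never sent in the clear. The function name build_authorized_request, the host data-connect.example.org, and the token value are placeholders introduced for illustration; how the token is obtained is out of scope here, as it is for the specification.

```python
import urllib.request
from urllib.parse import urlsplit


def build_authorized_request(url: str, access_token: str) -> urllib.request.Request:
    """Return a GET request carrying the token in the Authorization header."""
    if urlsplit(url).scheme != "https":
        # Bearer tokens are sensitive, so only HTTPS (TLS) URLs are accepted.
        raise ValueError("refusing to send a bearer token over a non-HTTPS URL")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder URL and token; neither refers to a real server or credential.
    request = build_authorized_request(
        "https://data-connect.example.org/tables", "example-token"
    )
    print(request.get_header("Authorization"))
    # Sending the request would then be: urllib.request.urlopen(request)
```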
The policies and processes used to perform user authentication and authorization, and the means through which access tokens are issued, are beyond the scope of this API specification. GA4GH recommends the use of the OpenID Connect and OAuth 2.0 framework (RFC 6749) for authentication and authorization.
A stand-alone security review has been performed on the API. Nevertheless, GA4GH cannot guarantee the security of any implementation to which the API documentation links. If you integrate this code into your application it is AT YOUR OWN RISK AND RESPONSIBILITY to arrange for an audit to ensure compliance with any applicable regulatory and security requirements, especially where personal data may be at issue.
To report security issues with the specification, please send an email to security-notification@ga4gh.org.
Cross-origin resource sharing (CORS) is an essential technique used to relax the same-origin policy enforced by browsers. This policy restricts a webpage from making a request to another website and leaking potentially sensitive information. However, the same-origin policy is a barrier to using open APIs. GA4GH open API implementers should enable CORS to an acceptable level as defined by their internal policy. All public API implementations should allow requests from any origin.
GA4GH has provided a CORS best practices document, which implementers should refer to for guidance when enabling CORS on public API instances.
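To make the two cases concrete (a public API that accepts any origin, and a deployment restricted by internal policy), here is a small sketch of how a server might choose its CORS response headers. It is illustrative only and is not taken from the GA4GH best practices document; the function name cors_headers and the example origins are hypothetical.

```python
from typing import Dict, Iterable, Optional


def cors_headers(request_origin: Optional[str],
                 allowed_origins: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Choose CORS response headers for a request.

    allowed_origins=None models a public API that accepts any origin;
    otherwise only the listed origins are granted access.
    """
    if request_origin is None:
        # No Origin header: not a cross-origin browser request.
        return {}
    if allowed_origins is None:
        return {"Access-Control-Allow-Origin": "*"}
    if request_origin in set(allowed_origins):
        # Echo the specific origin and tell caches the response varies by it.
        return {"Access-Control-Allow-Origin": request_origin, "Vary": "Origin"}
    # Origin not permitted: send no CORS headers, so the browser blocks access.
    return {}


if __name__ == "__main__":
    print(cors_headers("https://portal.example.org"))
    print(cors_headers("https://portal.example.org", ["https://portal.example.org"]))
    print(cors_headers("https://other.example.net", ["https://portal.example.org"]))
```

A real deployment would also need to answer preflight (OPTIONS) requests with the permitted methods and request headers, which this sketch leaves out.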
The API is specified in OpenAPI 3. To validate the YAML file, use the Swagger Validator Badge or its OAS Validator wrapper.
Documentation is sourced from the hugo/ directory. Building the docs requires the Hugo framework with the Clyde theme. Edit the markdown files under hugo/content/ for content changes.
Run the docs locally using make run, which serves them at http://localhost:1313/data-connect/. To manually inspect the build artifacts, use make build. In either case, clean up before committing using make clean.
The GA4GH is an open community that strives for inclusivity. Guidelines for contributing to this repository are listed in CONTRIBUTING.md. Teleconferences and corresponding meeting minutes are open to the public. To learn how to contribute to this effort, please contact us.