ga4gh-discovery / data-connect

Standard for describing and searching biomedical data developed by the Global Alliance for Genomics & Health.
Apache License 2.0

Set rules for gzip content encoding that's friendly to tables-in-a-bucket #108

Open jfuerth opened 3 years ago

jfuerth commented 3 years ago

This issue was raised in a discussion on #98.

When data is stored as (potentially paginated) Search Tables, the JSON files can not only be gzip-compressed at rest, they can also be served as-is with a content-encoding: gzip header. Because nothing is compressed on the fly, this adds no burden to the web server, and only a minimal burden to the client receiving the data.
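To make the at-rest side concrete, here is a minimal sketch in Python. The helper name `make_bucket_object` and the page contents are hypothetical, not part of the spec; the point is only that the page is compressed once at upload time and stored together with the headers the bucket should send back.

```python
import gzip
import json
from typing import Dict, Tuple


def make_bucket_object(page: dict) -> Tuple[bytes, Dict[str, str]]:
    """Compress one table page once, at upload time.

    Returns the bytes to store and the HTTP metadata the bucket should
    attach so it can serve the stored bytes without re-encoding them.
    """
    raw = json.dumps(page).encode("utf-8")
    body = gzip.compress(raw)
    metadata = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
    }
    return body, metadata


# Hypothetical page; the field names here are illustrative only.
page = {"data": [{"id": i, "name": "sample"} for i in range(100)]}
body, metadata = make_bucket_object(page)
```

How the metadata gets attached to the stored object depends on the storage provider and is not shown here.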

The vast majority of real-world HTTP clients understand content-encoding: gzip and transparently decompress data as it is received. However, according to the HTTP specification, we should only serve gzip-compressed data if the client requests it with an accept-encoding: gzip header.

If we specify that Search clients MUST be capable of dealing with content-encoding: gzip and MUST send an accept-encoding: gzip header, this will make it easy for tables-in-a-bucket Search implementations to serve compressed responses, speeding up transfers and reducing storage costs.
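For illustration, the client-side requirement could look roughly like the Python sketch below. `REQUEST_HEADERS` and `decode_body` are hypothetical names, and the sketch works on headers and bytes that have already been received, so it is independent of any particular HTTP library.

```python
import gzip
import json

# Header the proposal would require every Search client to send.
REQUEST_HEADERS = {"Accept-Encoding": "gzip"}


def decode_body(headers: dict, body: bytes) -> dict:
    """Decode a JSON response, undoing gzip if the server applied it."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    encoding = lowered.get("content-encoding", "identity").strip().lower()
    if encoding == "gzip":
        body = gzip.decompress(body)
    elif encoding != "identity":
        raise ValueError(f"unsupported content-encoding: {encoding}")
    return json.loads(body.decode("utf-8"))
```

Many HTTP libraries already do this decoding transparently, so in practice the MUST mostly matters for clients built on lower-level APIs that hand back the raw bytes.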

Notes on cloud support:

ifokkema commented 3 years ago

Sounds great to me!