Specify an optional page index to help clients get data more efficiently

ga4gh-discovery / data-connect

Standard for describing and searching biomedical data developed by the Global Alliance for Genomics & Health.

Apache License 2.0

```json
"index": {
  "ordered_by": [ "id" ],
  "pages": [
    {
      "url": "https://storage.googleapis.com/ga4gh-phenopackets-example/flat/table/hpo_phenopackets/data_1",
      "partitions": {
        "id": {
          "min": "PMID:27435956-Naz_Villalba-2016-NLRP3-proband",
          "max": "PMID:29174093-Szczałuba-2018-GNB1-proband"
        }
      }
    },
    {
      "url": "https://storage.googleapis.com/ga4gh-phenopackets-example/flat/table/hpo_phenopackets/data_2",
      "partitions": {
        "id": {
          "min": "PMID:26833330-Jansen-2016-TMEM199-F1-II2",
          "max": "PMID:27974811-Haliloglu-2017-PIEZO2-Patient"
        }
      }
    }
  ]
}
```
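
A minimal sketch of how a client might use this index to skip straight to the relevant page(s), assuming the `partitions` values compare as plain strings. `pages_for_key` is a hypothetical helper for illustration, not part of the Data Connect spec.

```python
from typing import Any

# The "index" member from the example above, as a Python dict.
INDEX: dict[str, Any] = {
    "ordered_by": ["id"],
    "pages": [
        {
            "url": "https://storage.googleapis.com/ga4gh-phenopackets-example/flat/table/hpo_phenopackets/data_1",
            "partitions": {"id": {"min": "PMID:27435956-Naz_Villalba-2016-NLRP3-proband",
                                  "max": "PMID:29174093-Szczałuba-2018-GNB1-proband"}},
        },
        {
            "url": "https://storage.googleapis.com/ga4gh-phenopackets-example/flat/table/hpo_phenopackets/data_2",
            "partitions": {"id": {"min": "PMID:26833330-Jansen-2016-TMEM199-F1-II2",
                                  "max": "PMID:27974811-Haliloglu-2017-PIEZO2-Patient"}},
        },
    ],
}


def pages_for_key(index: dict[str, Any], column: str, key: str) -> list[str]:
    """Return the URLs of pages whose [min, max] range for `column` may contain `key`.

    Hypothetical helper. Values are compared as plain strings; a page with no
    partition info for `column` is kept, because it cannot be ruled out.
    """
    urls = []
    for page in index["pages"]:
        bounds = page.get("partitions", {}).get(column)
        if bounds is None or bounds["min"] <= key <= bounds["max"]:
            urls.append(page["url"])
    return urls


# Only data_1's range covers this key, so a client can skip data_2 entirely.
print(pages_for_key(INDEX, "id", "PMID:28000000"))
```

Because the two example ranges overlap, some keys (e.g. `"PMID:27500000"`) match both pages; a client then has to scan each candidate.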

This looks great! It'll allow me to indicate how to skip ahead to other chromosomes so that clients don't need to paginate through the entire dataset. Three suggestions and one question:

Indicate that data producers may provide additional pages not represented in the index (e.g., my index will have an entry for each chromosome, but each chromosome's data can still span further pages that are reached through normal pagination).
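
A minimal sketch of what that means for a client: the index only tells it where to *start*, and unindexed pages are still reached by following the regular pagination links. It assumes each page carries a Data Connect-style `pagination.next_page_url`; the in-memory `PAGES` dict, its `example.org` URLs, and the `rows_from` helper are hypothetical stand-ins for real HTTP responses.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical in-memory pages standing in for HTTP responses. The index would
# list only the first page of each chromosome; chr1/2 is reachable only by paging.
PAGES: dict[str, dict[str, Any]] = {
    "https://example.org/chr1/1": {"data": [{"chromosome": "chr1", "pos": 100}],
                                   "pagination": {"next_page_url": "https://example.org/chr1/2"}},
    "https://example.org/chr1/2": {"data": [{"chromosome": "chr1", "pos": 900}],
                                   "pagination": {"next_page_url": "https://example.org/chr2/1"}},
    "https://example.org/chr2/1": {"data": [{"chromosome": "chr2", "pos": 50}],
                                   "pagination": {}},
}


def rows_from(start_url: str, fetch: Callable[[str], dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield rows from an indexed page onward, following next_page_url until it runs out."""
    url: Optional[str] = start_url
    while url:
        page = fetch(url)
        yield from page.get("data", [])
        url = page.get("pagination", {}).get("next_page_url")


# Jump to chr1 via the index, then page through everything that follows.
for row in rows_from("https://example.org/chr1/1", PAGES.__getitem__):
    print(row)
```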

Related: make `partitions` optional, or at least its `min`/`max` values, so that producers don't need to pre-sort their data to work out every page boundary in advance. To still indicate what each URL points to, you could allow something like this (using your example data):

```json
"index": [
  {
    "ordered_by": [ "id" ],
    "pages": [
      {
        "url": "https://storage.googleapis.com/ga4gh-phenopackets-example/flat/table/hpo_phenopackets/data_1",
        "id": "PMID:26833330-Jansen-2016-TMEM199-F1-II2"
      },
      {
        "url": "https://storage.googleapis.com/ga4gh-phenopackets-example/flat/table/hpo_phenopackets/data_2",
        "id": "PMID:27435956-Naz_Villalba-2016-NLRP3-proband"
      },
      {
        "url": "https://storage.googleapis.com/ga4gh-phenopackets-example/flat/table/hpo_phenopackets/data_3",
        "id": "PMID:27974811-Haliloglu-2017-PIEZO2-Patient"
      }
    ]
  }
]
```
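
With only a start value per page, a client can still locate the page where a given key would begin by binary-searching the start values. A minimal sketch, assuming the pages are listed in ascending order of their start value and that values compare as plain strings; `start_page` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Any, Optional

BASE = "https://storage.googleapis.com/ga4gh-phenopackets-example/flat/table/hpo_phenopackets/"

# The proposed index from above: each page records only the first value it holds.
PROPOSED: list[dict[str, Any]] = [
    {
        "ordered_by": ["id"],
        "pages": [
            {"url": BASE + "data_1", "id": "PMID:26833330-Jansen-2016-TMEM199-F1-II2"},
            {"url": BASE + "data_2", "id": "PMID:27435956-Naz_Villalba-2016-NLRP3-proband"},
            {"url": BASE + "data_3", "id": "PMID:27974811-Haliloglu-2017-PIEZO2-Patient"},
        ],
    }
]


def start_page(index: dict[str, Any], key: str) -> Optional[str]:
    """Return the URL of the last page whose start value is <= key, or None.

    Assumes pages are listed in ascending order of their start value and that
    values compare as plain strings.
    """
    column = index["ordered_by"][0]
    starts = [page[column] for page in index["pages"]]
    i = bisect_right(starts, key)  # number of pages starting at or before key
    return index["pages"][i - 1]["url"] if i > 0 else None


print(start_page(PROPOSED[0], "PMID:27000000"))  # a key that falls on data_1
```

The page's end is implied by the next page's start, so producers only need to know the first value of each page.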

In my case, the `id` field would be the chromosome, which would probably make a better example.
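
On the producer side, the chromosome variant can be built in one streaming pass: the producer never computes per-page min/max, it only notes the page on which each chromosome starts. A minimal sketch, assuming rows arrive grouped by chromosome and are written out in fixed-size pages; `build_index`, the `chromosome` field name, and the `url_for_page` URL scheme are hypothetical.

```python
from typing import Any, Callable, Iterable


def build_index(rows: Iterable[dict[str, Any]], page_size: int,
                url_for_page: Callable[[int], str]) -> list[dict[str, Any]]:
    """Emit one index entry per chromosome in a single streaming pass.

    Assumes rows arrive grouped by chromosome and are written out in pages of
    `page_size` rows. Each entry points at the page where that chromosome
    starts; that page may still begin with the previous chromosome's tail.
    """
    entries: list[dict[str, Any]] = []
    last_chrom = None
    for i, row in enumerate(rows):
        chrom = row["chromosome"]
        if chrom != last_chrom:
            entries.append({"url": url_for_page(i // page_size), "chromosome": chrom})
            last_chrom = chrom
    return [{"ordered_by": ["chromosome"], "pages": entries}]


rows = [{"chromosome": "chr1"}] * 3 + [{"chromosome": "chr2"}] * 2
print(build_index(rows, page_size=2, url_for_page=lambda n: f"https://example.org/page/{n}"))
```

Note that natural chromosome order (chr2 before chr10) does not match plain string order, so a client of such an index would look up entries by value rather than binary-searching them.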

Make `index` an array of objects rather than a single object, allowing for multiple sort orders. Clients that don't care about ordering can simply pick the first index, while those that do can choose among the options provided. I don't need this myself, but it would make the format more flexible for others.
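
The client-side selection rule described above is small. A minimal sketch, assuming `index` is an array whose entries each carry an `ordered_by` list; `choose_index` and the two sort orders in `INDEXES` are hypothetical.

```python
from typing import Any, Optional


def choose_index(indexes: list[dict[str, Any]],
                 wanted: Optional[list[str]] = None) -> Optional[dict[str, Any]]:
    """Pick an index from an `index` array.

    With no `wanted` ordering, take the first index; otherwise return the index
    whose `ordered_by` equals `wanted`, or None if the producer offers none.
    """
    if wanted is None:
        return indexes[0] if indexes else None
    return next((idx for idx in indexes if idx.get("ordered_by") == wanted), None)


# Hypothetical producer offering two sort orders over the same table.
INDEXES = [
    {"ordered_by": ["chromosome", "position"], "pages": []},
    {"ordered_by": ["gene"], "pages": []},
]
print(choose_index(INDEXES)["ordered_by"])
print(choose_index(INDEXES, ["gene"])["ordered_by"])
```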

And a question: in your example (probably just because it's a mockup), the min/max values don't seem to be sorted: data_2's range starts before data_1's, and the two ranges overlap. They should be sorted, right?
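
A producer (or a client that wants to trust the index) could check this mechanically. A minimal sketch, assuming plain string comparison and that adjacent pages may share a boundary value; `pages_are_sorted` is a hypothetical helper. Reordering the example pages alone would not make it pass, since the two ranges also overlap.

```python
from typing import Any


def pages_are_sorted(pages: list[dict[str, Any]], column: str) -> bool:
    """True if each page has min <= max and the pages appear in ascending,
    non-overlapping order for `column` (plain string comparison)."""
    bounds = [(p["partitions"][column]["min"], p["partitions"][column]["max"]) for p in pages]
    if any(lo > hi for lo, hi in bounds):
        return False
    return all(prev_hi <= next_lo for (_, prev_hi), (next_lo, _) in zip(bounds, bounds[1:]))


# Partitions copied from the example index in the proposal.
EXAMPLE_PAGES = [
    {"partitions": {"id": {"min": "PMID:27435956-Naz_Villalba-2016-NLRP3-proband",
                           "max": "PMID:29174093-Szczałuba-2018-GNB1-proband"}}},
    {"partitions": {"id": {"min": "PMID:26833330-Jansen-2016-TMEM199-F1-II2",
                           "max": "PMID:27974811-Haliloglu-2017-PIEZO2-Patient"}}},
]
print(pages_are_sorted(EXAMPLE_PAGES, "id"))  # False: data_2's range starts before data_1's
```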

Specify an optional page index to help clients get data more efficiently #109