ga4gh / refget

GA4GH Refget specifications docs
https://ga4gh.github.io/refget

Should there be endpoints to test existence in database? #27

Open nsheff opened 3 years ago

nsheff commented 3 years ago

As I was implementing the /collection and /comparison endpoints we discussed, I thought of a few other possible uses. I would like to know if people think these should be part of the spec.

Overview

Given a POST request, the service could report whether the collection is present in the database, and at which level.

Level 1 input

If the input is a level 1 representation, it looks like this:

{
  "lengths": "4925cdbd780a71e332d13145141863c1",
  "names": "ce04be1226e56f48da55b6c130d45b94",
  "sequences": "3b379221b4d6ea26da26cec571e5911c"
}

And the response looks like:

{
  "exists": {
    "0": "true",
    "1": {
      "lengths": "true",
      "names": "true",
      "sequences": "true"
    }
  }
}
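
A minimal sketch of how a server might answer a level 1 existence query, assuming the database can be viewed as a mapping from level 0 digest to level 1 representation. `STORE` and `check_exists` are hypothetical names, not part of the spec, and the sketch returns JSON booleans rather than the string "true" used in the example response. The single stored entry reuses the digests that appear in this thread.

```python
from typing import Dict

Level1 = Dict[str, str]

# Hypothetical stand-in for the database: level 0 digest -> level 1 representation.
STORE: Dict[str, Level1] = {
    "a6748aa0f6a1e165f871dbed5e54ba62": {
        "lengths": "4925cdbd780a71e332d13145141863c1",
        "names": "ce04be1226e56f48da55b6c130d45b94",
        "sequences": "3b379221b4d6ea26da26cec571e5911c",
    }
}


def check_exists(level1: Level1, store: Dict[str, Level1] = STORE) -> dict:
    """Report whether a level 1 representation is known, overall and per attribute."""
    # Level 1: an attribute digest "exists" if any stored collection carries it.
    per_attribute = {
        attr: any(stored.get(attr) == digest for stored in store.values())
        for attr, digest in level1.items()
    }
    # Level 0: the collection "exists" if some stored collection matches exactly.
    whole = any(stored == level1 for stored in store.values())
    return {"exists": {"0": whole, "1": per_attribute}}
```

With this layout, a query that changes only the `names` digest would come back with `"0"` and `names` false while `lengths` and `sequences` stay true, which is the "at which level" information the endpoint is meant to give.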

Level 2 input

To do this for a level 2 representation, the input is:

{
  "lengths": [
    "1216",
    "970",
    "1788"
  ],
  "names": [
    "A",
    "B",
    "C"
  ],
  "sequences": [
    "76f9f3315fa4b831e93c36cd88196480",
    "d5171e863a3d8f832f0559235987b1e5",
    "b9b1baaa7abf206f6b70cf31654172db"
  ]
}

Response:

{
  "digests": {
    "0": "a6748aa0f6a1e165f871dbed5e54ba62",
    "1": {
      "lengths": "4925cdbd780a71e332d13145141863c1",
      "names": "ce04be1226e56f48da55b6c130d45b94",
      "sequences": "3b379221b4d6ea26da26cec571e5911c"
    }
  },
  "exists": {
    "0": "true",
    "1": {
      "lengths": "true",
      "names": "true",
      "sequences": "true"
    }
  }
}

For this to work with level 2 inputs, we have to compute the digests on the server. That means this endpoint could actually be used as a digest computing service for local collections. Thoughts?
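
To illustrate the digest-computing idea, here is a rough sketch that derives level 1 and level 0 digests from a level 2 input. The hash and serialization (MD5 over compact, key-sorted JSON) are placeholders I picked for illustration, and `placeholder_digest` and `digest_collection` are hypothetical names. The real algorithm is whatever the spec defines, so this will not necessarily reproduce the digests shown above.

```python
import hashlib
import json
from typing import Dict, List


def placeholder_digest(value) -> str:
    """Placeholder digest: MD5 over compact, key-sorted JSON (an assumption, not the spec)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def digest_collection(level2: Dict[str, List[str]]) -> dict:
    """Compute level 1 digests per attribute, then a level 0 digest over them."""
    # Each attribute array (order preserved) is digested on its own.
    level1 = {attr: placeholder_digest(values) for attr, values in level2.items()}
    # The level 0 digest is taken over the whole level 1 representation.
    level0 = placeholder_digest(level1)
    return {"0": level0, "1": level1}


example = {
    "lengths": ["1216", "970", "1788"],
    "names": ["A", "B", "C"],
    "sequences": [
        "76f9f3315fa4b831e93c36cd88196480",
        "d5171e863a3d8f832f0559235987b1e5",
        "b9b1baaa7abf206f6b70cf31654172db",
    ],
}
digests = digest_collection(example)
```

The resulting `digests` dict has the same shape as the `"digests"` part of the response, so the existence lookup could then run on `digests["1"]` exactly as it would for a level 1 input.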

This is starting to get into the idea of the search function, but this is actually much simpler than the search function we envisioned.

sveinugu commented 2 years ago

@nsheff Seems useful. However, I believe the more common query would be to retrieve the level 0 digests of all the seqcols that contain the submitted arrays, not just information that at least one such seqcol exists. Would this be the search feature you mentioned (I don't remember...)?
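
One way such a query could work is a reverse index from each (attribute, level 1 digest) pair to the level 0 digests that contain it. This is only a sketch under that assumption: `build_index` and `find_collections` are hypothetical names, and the collection and digest strings are made-up placeholders, not real digests.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Level1 = Dict[str, str]

# Made-up toy store: level 0 digest -> level 1 representation.
store: Dict[str, Level1] = {
    "col-A": {"lengths": "len-1", "names": "names-1", "sequences": "seq-1"},
    "col-B": {"lengths": "len-1", "names": "names-2", "sequences": "seq-1"},
    "col-C": {"lengths": "len-2", "names": "names-1", "sequences": "seq-2"},
}


def build_index(store: Dict[str, Level1]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (attribute, level 1 digest) pair to the level 0 digests containing it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for level0, level1 in store.items():
        for attr, digest in level1.items():
            index[(attr, digest)].add(level0)
    return index


def find_collections(query: Level1, index: Dict[Tuple[str, str], Set[str]]) -> List[str]:
    """Return level 0 digests of all collections containing every submitted digest."""
    matches = None
    for attr, digest in query.items():
        hits = index.get((attr, digest), set())
        # Intersect across attributes so every submitted array must be present.
        matches = set(hits) if matches is None else matches & hits
    return sorted(matches or [])


index = build_index(store)
same_sequences = find_collections({"sequences": "seq-1"}, index)
```

Here `same_sequences` lists every collection sharing the submitted sequences array, regardless of names, which is more than a yes/no existence answer.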

nsheff commented 2 years ago

> Would this be the search feature you mentioned (I don't remember...)?

Yes.