EGA-archive / beacon2-ri-api

Beacon v2 Reference Implementation (API)
Apache License 2.0

Join queries #97

Open mustafahsyed opened 1 year ago

mustafahsyed commented 1 year ago

Description

Perform a join query between “individual” and “biosample”, or between any other two or more entities.

Proposed solution

New API endpoint to support queries involving multiple entities.

Definition of Done

Availability of new API endpoint

mustafahsyed commented 9 months ago

Hi @costero-e

Please let me know if join queries are possible.

Cheers Mustafa

costero-e commented 9 months ago

Hi @mustafahsyed, by join queries do you mean aggregating data from two endpoints and returning it in a single record, or applying filters to multiple endpoints at once? We are planning to develop multi-endpoint filters, e.g. "show me variants at a specific position for individuals that are male" (the specification does accept this), but we are not planning to return data aggregated from different endpoints.

mustafahsyed commented 7 months ago

Hi @costero-e, ideally both filtering and aggregating data, but we can begin with just the option to filter data across multiple endpoints. Please let me know when such multi-endpoint join filters will be available. Excellent work! Thanks

costero-e commented 7 months ago

Hi @mustafahsyed, today I finished implementing "cross queries", which let you apply a filter to a collection other than the one you want to get the documents from. Here is one example you can try:

curl \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "meta": {
        "apiVersion": "2.0"
    },
    "query": {
        "requestParameters": {
        },
        "filters": [
            {"id": "NCIT:C20197", "scope": "individual"}
        ],
        "includeResultsetResponses": "HIT",
        "pagination": {
            "skip": 0,
            "limit": 0
        },
        "testMode": false,
        "requestedGranularity": "count"
    }
}' \
  http://localhost:5050/api/g_variants

Aggregating data is not on our roadmap right now, though, as it is not part of the spec. Let me know whether this new implementation fits your needs and is what you expected. Best, Oriol
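For anyone scripting these requests, the body from the curl example above can also be assembled in code. The following is a minimal sketch, not part of the reference implementation: `build_query` is a hypothetical helper name, and its defaults simply mirror the values shown in the example.

```python
import json


def build_query(filters, granularity="count", skip=0, limit=0):
    """Assemble a POST body shaped like the curl example (hypothetical helper)."""
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "requestParameters": {},
            "filters": filters,
            "includeResultsetResponses": "HIT",
            "pagination": {"skip": skip, "limit": limit},
            "testMode": False,  # a JSON boolean, not the string "false"
            "requestedGranularity": granularity,
        },
    }


# Cross query: the filter is scoped to individuals, while the request itself
# would be sent to the g_variants endpoint, as in the curl example.
payload = build_query([{"id": "NCIT:C20197", "scope": "individual"}])
body = json.dumps(payload, indent=2)
print(body)
```

The printed JSON can be passed to curl with `-d`, or posted with any HTTP client, to the endpoint whose documents you want back.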

mustafahsyed commented 5 months ago

@costero-e Join queries now work great! I appreciate you adding this very useful update.

One more related question: if I would like to pass multiple filter IDs for the same data element, e.g. query diseases.diseaseCode for NCIT:C3270 or NCIT:C16576, what should my payload look like?

My payload for the POST query looks like this, but it does not work:

{"query": { "filters": [{"id":"NCIT:C3270,NCIT:C16576"}], "includeResultsetResponses": "HIT", "pagination": { "skip": 0, "limit": 10000 }, "testMode": "false", "requestedGranularity": "record" } }

Cheers Mustafa

costero-e commented 5 months ago

Hi @mustafahsyed, I'm glad to hear the appreciation, thanks. As for the payload for multiple filters: provided that both filters are present in the filtering terms endpoint, it should look like this:

curl \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "meta": {
        "apiVersion": "2.0"
    },
    "query": {
        "requestParameters": {
        },
        "filters": [
            {"id": "NCIT:C16576"},
            {"id": "NCIT:C3270"}
        ],
        "includeResultsetResponses": "HIT",
        "pagination": {
            "skip": 0,
            "limit": 1000
        },
        "testMode": false,
        "requestedGranularity": "record"
    }
}' \
  http://localhost:5050/api/individuals
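The difference from the earlier attempt is that each term is its own object in `filters`, rather than several IDs joined by commas inside one `id` string. A small sketch of that conversion follows; `split_filter_ids` is a hypothetical helper name, not something the API provides.

```python
def split_filter_ids(joined, scope=None):
    """Turn 'A,B' into [{'id': 'A'}, {'id': 'B'}], optionally adding a scope."""
    filters = []
    for term in joined.split(","):
        term = term.strip()
        if not term:
            continue  # skip empty pieces such as a trailing comma
        entry = {"id": term}
        if scope is not None:
            entry["scope"] = scope
        filters.append(entry)
    return filters


filters = split_filter_ids("NCIT:C3270,NCIT:C16576")
print(filters)
# [{'id': 'NCIT:C3270'}, {'id': 'NCIT:C16576'}]
```

The resulting list goes under `"filters"` in the request body.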

If you want to state more specifically which scope each filtering term applies to, or to run a join query, you can do it like this:

curl \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "meta": {
        "apiVersion": "2.0"
    },
    "query": {
        "requestParameters": {
        },
        "filters": [
            {"id": "NCIT:C16576", "scope": "individual"},
            {"id": "NCIT:C3270", "scope": "biosample"}
        ],
        "includeResultsetResponses": "HIT",
        "pagination": {
            "skip": 0,
            "limit": 1000
        },
        "testMode": false,
        "requestedGranularity": "record"
    }
}' \
  http://localhost:5050/api/analyses

Best,

Oriol

mustafahsyed commented 5 months ago

@costero-e The payload above returns only records that have both terms, NCIT:C3270 and NCIT:C16576. What I am looking for is all records with a diseaseCode of either NCIT:C3270 or NCIT:C16576.

The payload you suggested performs an "AND" operation, whereas I am looking for an "OR" operation. Please let me know how I can run an OR query.

costero-e commented 5 months ago

Hi @mustafahsyed, this is not yet in the standard, but I know there is a plan to include it in the near future. Once it is accepted, I will develop "OR" operations for the RI as well and I will let you know.

Best, Oriol
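Until OR is supported, one possible client-side workaround is to send one single-filter request per term and merge the results. The sketch below is only an illustration under stated assumptions: `or_query` and `fetch` are hypothetical names, the records are assumed to carry a unique `id` field, and the sample records are made up so that the code runs without a server.

```python
def or_query(terms, fetch, key="id"):
    """Emulate OR by running one single-filter query per term and merging.

    `fetch` is a caller-supplied function (hypothetical) that takes a list of
    filter objects, POSTs them to the endpoint, and returns the matching
    records as a list of dicts.
    """
    merged = {}
    for term in terms:
        for record in fetch([{"id": term}]):
            # Keep the first copy of each record; later duplicates are dropped.
            merged.setdefault(record[key], record)
    return list(merged.values())


def fake_fetch(filters):
    """Offline stand-in for the HTTP call, returning made-up records."""
    sample = {
        "NCIT:C3270": [{"id": "ind-1"}, {"id": "ind-2"}],
        "NCIT:C16576": [{"id": "ind-2"}, {"id": "ind-3"}],
    }
    return sample[filters[0]["id"]]


records = or_query(["NCIT:C3270", "NCIT:C16576"], fake_fetch)
print([r["id"] for r in records])
# ['ind-1', 'ind-2', 'ind-3']
```

This only works with `"requestedGranularity": "record"`, since counts from separate requests cannot be deduplicated, and it costs one request per term.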