EGA-archive / beacon2-ri-api

Beacon v2 Reference Implementation (API)
Apache License 2.0

steps to set up ri-api with own data #300

Closed albodrug closed 5 months ago

albodrug commented 5 months ago

Hello,

I followed the beaconv2-ri-tools documentation to generate the 7 JSON files that make up the BFF format. Loading the data into beaconv2-ri-api worked fine, and I can see the data in the Mongo Express UI. However, all my API requests return no results. I have modified datasets.yml and cohorts.yml, so my new sample IDs appear to be looked up by the g_variants.py script. Could you give me hints on what I am missing to make my data searchable through the ri-api?

Here is the response to a POST request:

$ curl   -H 'Content-Type: application/json'   -X POST   -d '{
    "meta": {
        "apiVersion": "2.0"
    },
    "query": {
        "requestParameters": {
            "alternateBases": "G",
            "referenceBases": "A",
            "start": [ 0 ],
            "end": [ 1600000000 ],
            "variantType": "SNP"
        },
        "filters": [],
        "includeResultsetResponses": "HIT",
        "pagination": {
            "skip": 0,
            "limit": 1
        },
        "testMode": false,
        "requestedGranularity": "record"
    }
}
'   http://localhost:5050/api/g_variants ; echo
{"meta":{"beaconId":"org.ega-archive.ga4gh-approval-beacon-test","apiVersion":"v2.0.0","returnedGranularity":"record","receivedRequestSummary":{"apiVersion":"2.0","requestedSchemas":[],"filters":[],"requestParameters":{"alternateBases":"G","referenceBases":"A","start":[0],"end":[1600000000],"variantType":"SNP"},"includeResultsetResponses":"HIT","pagination":{"skip":0,"limit":1},"requestedGranularity":"record","testMode":false},"returnedSchemas":[{"entityType":"genomicVariation","schema":"beacon-g_variant-v2.0.0"}]},"responseSummary":{"exists":false},"response":{"resultSets":[]},"beaconHandovers":[{"handoverType":{"id":"CUSTOM:000001","label":"Project description"},"note":"Project description","url":"https://www.nist.gov/programs-projects/genome-bottle"}]}

The docker logs:

beacon             | [beacon.request.handlers][DEBUG ] (L54) meta=RequestMeta(requested_schemas=[], api_version='2.0') query=RequestQuery(filters=[], include_resultset_responses=<IncludeResultsetResponses.HIT: 'HIT'>, pagination=Pagination(skip=0, limit=1), request_parameters={'alternateBases': 'G', 'referenceBases': 'A', 'start': [0], 'end': [1600000000], 'variantType': 'SNP'}, test_mode=False, requested_granularity=<Granularity.RECORD: 'record'>)
beacon-permissions | public
beacon-permissions | visa_datasets: []
beacon-permissions | ['CINECA_dataset', 'rd-connect_dataset', 'CINECA_synthetic_cohort_EUROPE_UK1', 'AV_Dataset']
beacon             | [beacon.request.handlers][ INFO ] (L79) resolved datasets:  ['CINECA_dataset', 'rd-connect_dataset', 'CINECA_synthetic_cohort_EUROPE_UK1', 'AV_Dataset']
beacon             | [beacon.db.utils][DEBUG ] (L77) FINAL QUERY: {}
beacon             | [beacon.db.g_variants][ INFO ] (L129) GET_VARIANTS YOUYOU
beacon             | [beacon.db.utils][DEBUG ] (L77) FINAL QUERY: {'$and': [{'variation.alternateBases': {'$eq': 'G'}}, {'variation.referenceBases': {'$eq': 'A'}}, {'variation.location.interval.start.value': {'$gte': 0}}, {'variation.location.interval.end.value': {'$lte': 1600000000}}, {'variation.variantType': {'$eq': 'SNP'}}], '$or': [{'caseLevelData.biosampleId': '“MySample1”'}, {'caseLevelData.biosampleId': '”MySample2”'}, {'caseLevelData.biosampleId': '”MySample3”'}, {'caseLevelData.biosampleId': '”MySample4”'}, {'caseLevelData.biosampleId': '”MySample5”'}]}
beacon             | [beacon.response.build_response][DEBUG ] (L67) 0
beacon             | [beacon.utils.stream][DEBUG ] (L25) HTTP response stream
beacon             | [beacon.utils.stream][DEBUG ] (L30) Partial content: False
beacon             | [aiohttp.access][ INFO ] (L206) 192.168.112.1 [25/Mar/2024:15:59:28 +0000] "POST /api/g_variants HTTP/1.1" 200 985 "-" "curl/7.81.0"

Thanks for any tips! Alex

costero-e commented 5 months ago

Hi @albodrug, thank you for trying the RI. First of all, I see you are not running the latest version of the RI. Although that shouldn't be the cause of your problem, you can update it if you want the latest working version. Secondly, I see something strange in the final query in your logs, namely the biosampleId values:

{'caseLevelData.biosampleId': '“MySample1”'}, {'caseLevelData.biosampleId': '”MySample2”'}, {'caseLevelData.biosampleId': '”MySample3”'}, {'caseLevelData.biosampleId': '”MySample4”'}, {'caseLevelData.biosampleId': '”MySample5”'}

As you say, you have to add the biosampleIds and individualIds to cohorts.yml and datasets.yml. Right now beacon is reading your IDs with the quotes included, like ”MySample4”. Are you also storing them like this in your data? The strings have to match exactly!

Also, if you can make beacon read the biosampleId as a plain MySample4, it would be easier to work with in your beacon.
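To illustrate the kind of mismatch described above, here is a minimal Python sketch that flags IDs carrying stray straight or typographic quotes and shows their cleaned form. It is not part of the RI; the helper names `clean_id` and `find_suspicious_ids` and the example list are hypothetical.

```python
# Quote characters that can end up inside an ID when it is copied from a
# word processor or wrapped in quotes twice. This set is an assumption.
QUOTE_CHARS = "\u201c\u201d\u2018\u2019\"'"


def clean_id(raw: str) -> str:
    """Strip surrounding whitespace and straight/curly quotes from an ID."""
    return raw.strip().strip(QUOTE_CHARS).strip()


def find_suspicious_ids(ids):
    """Return the IDs whose cleaned form differs from the raw form."""
    return [i for i in ids if clean_id(i) != i]


# Hypothetical IDs as they might be read from cohorts.yml or datasets.yml.
ids_from_yaml = ["\u201cMySample1\u201d", "\u201dMySample2\u201d", "MySample3"]

print("Suspicious:", find_suspicious_ids(ids_from_yaml))
print("Cleaned:", [clean_id(i) for i in ids_from_yaml])
```

An ID is only matched when the string in the .yml files is identical to the one stored in the database, so any ID reported as suspicious here is worth checking by hand.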

Tell me if this was the cause of your beacon not displaying any results.

Thanks, Oriol

albodrug commented 5 months ago

Hi Oriol,

Many thanks! I have updated the RI, and the quotes were indeed the issue.

My raw files and biosamples.json do not carry the quotes. The issue was resolved when I modified the .yml files, deleting the quotes from the biosampleId but not from the individualId. My individualId and biosampleId are different (unlike in the CINECA data); do you think that may be related? Also, in both cohorts.yml and datasets.yml I have added both the individualId and the biosampleId.

costero-e commented 5 months ago

Great, I'm glad this solved your problem. Indeed, it is related. For some collections, beacon RI reads the biosampleId and for others the individualId. If you don't add them all properly to cohorts.yml and datasets.yml, the RI will not be able to relate the samples correctly to the dataset they belong to.
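As a rough way to check this before querying, the sketch below compares the IDs listed for a dataset against the IDs actually loaded, separately for biosampleIds and individualIds. It is only an illustration under stated assumptions: the helper `missing_ids` and all example IDs are hypothetical, and in practice the lists would be read from your own datasets.yml, cohorts.yml, biosamples.json and individuals.json.

```python
def missing_ids(listed_ids, loaded_ids):
    """Return, sorted, the listed IDs with no exact match among the loaded IDs."""
    return sorted(set(listed_ids) - set(loaded_ids))


# Hypothetical IDs as they might be listed for one dataset in datasets.yml.
listed_biosamples = ["MySample1", "\u201dMySample2\u201d"]
listed_individuals = ["MyIndividual1", "MyIndividual2"]

# Hypothetical IDs as they might appear in biosamples.json and individuals.json.
loaded_biosamples = ["MySample1", "MySample2"]
loaded_individuals = ["MyIndividual1", "MyIndividual2"]

print("biosampleIds without a match:",
      missing_ids(listed_biosamples, loaded_biosamples))
print("individualIds without a match:",
      missing_ids(listed_individuals, loaded_individuals))
```

Both kinds of ID are checked because, as noted above, some collections are looked up by biosampleId and others by individualId.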