Closed AlexCork1 closed 5 months ago
Hi @AlexCork1, thank you for your report. First, could you update your beacon to the latest version? This issue may be related to a bug in an older version of the API. Second, the CINECA dataset should return 1271 results, so that is what should appear in your numTotalResults if you only have one dataset. Remember to run the reindex.py script after you load all the data (the error may come from there). Third, your query should look like this:
curl \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"meta": {
"apiVersion": "2.0"
},
"query":{ "requestParameters": {
},
"filters": [
{"id":"NCIT:C16576", "scope":"individual"} ],
"includeResultsetResponses": "HIT",
"pagination": {
"skip": 0,
"limit": 10
},
"testMode": false,
"requestedGranularity": "record"
}
}' \
http://localhost:5050/api/individuals
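For anyone who prefers to script the check, the same request can be sketched in Python with only the standard library. The helper names (build_payload, build_request, post_query) are made up for this example, and the URL is assumed to be the same one used in the curl command above.

```python
import json
import urllib.request

# Assumption: the beacon listens on the same URL as in the curl command above.
BEACON_URL = "http://localhost:5050/api/individuals"


def build_payload(filter_id, scope="individual", skip=0, limit=10):
    """Return the same JSON body as the curl example, as a Python dict."""
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "requestParameters": {},
            "filters": [{"id": filter_id, "scope": scope}],
            "includeResultsetResponses": "HIT",
            "pagination": {"skip": skip, "limit": limit},
            "testMode": False,
            "requestedGranularity": "record",
        },
    }


def build_request(payload, url=BEACON_URL):
    """Wrap the payload in a POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_query(payload, url=BEACON_URL):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(payload, url)) as resp:
        return json.load(resp)


# Usage against a running beacon:
#   body = post_query(build_payload("NCIT:C16576"))
#   print(body["responseSummary"])
```

Keeping build_request separate from post_query makes it possible to inspect the exact body that would be sent before contacting the server.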
Please update the beacon container to the latest version of the GitHub master branch, try again, and tell me whether this solves the issue for you.
Thanks,
Oriol
Hello,
I have the same issue when sending calls to biosamples or individuals. The API always returns 15 in numTotalResults.
I have a test dataset of 5 biosamples and 5 individuals only, so I knew it was not possible to get 15...
I looked through the code, and it seems that the get_count() function in beacon/db/utils.py has a fallback that returns 15 when it can't count (lines 72 to 74 in utils.py).
I tried to debug it but I can't figure out why exactly the count does not work. I would also appreciate any input!
I saw no issues with calls to g_variants : )
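To make the concern concrete, here is a hypothetical sketch of the pattern I mean. It is not the code from beacon/db/utils.py; it only assumes the fallback is an except clause around a count call, and every name in it is made up for illustration. It shows why a fixed fallback makes a failed count look like a real one.

```python
FALLBACK_COUNT = 15  # the value observed in numTotalResults


class BrokenCounter:
    """Hypothetical stand-in for a collection whose count always fails."""

    def count_documents(self, query):
        raise RuntimeError("count not available")


def count_with_fallback(collection, query):
    """Pattern under suspicion: any failure is hidden behind a fixed number."""
    try:
        return collection.count_documents(query)
    except Exception:
        return FALLBACK_COUNT


def count_or_none(collection, query):
    """Alternative: report 'unknown' so callers can tell a failure apart."""
    try:
        return collection.count_documents(query)
    except Exception:
        return None
```

With the first helper a caller cannot distinguish "15 matching records" from "the count failed", which matches what I see with only 5 individuals loaded.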
beacon | [beacon.request.handlers][DEBUG ] (L53) 10
beacon | [beacon.request.handlers][DEBUG ] (L54) meta=RequestMeta(requested_schemas=[], api_version='v2.0.0') query=RequestQuery(filters=[], include_resultset_responses=<IncludeResultsetResponses.HIT: 'HIT'>, pagination=Pagination(skip=0, limit=10), request_parameters={'filters': 'NCIT:C16576'}, test_mode=False, requested_granularity=<Granularity.RECORD: 'record'>, scope=None)
beacon | [beacon.request.handlers][DEBUG ] (L59) None
beacon | [beacon.request.handlers][DEBUG ] (L69) public
beacon-permissions | public
beacon-permissions | visa_datasets: []
beacon-permissions | ['ICAN_DATASET_3K']
beacon | [beacon.request.handlers][DEBUG ] (L75) ['ICAN_DATASET_3K']
beacon | [beacon.request.handlers][DEBUG ] (L76) public
beacon | [beacon.request.handlers][DEBUG ] (L78) all datasets: [['ICAN_DATASET_3K'], []]
beacon | [beacon.request.handlers][ INFO ] (L79) resolved datasets: ['ICAN_DATASET_3K']
beacon | [beacon.request.handlers][DEBUG ] (L80) True
beacon | [beacon.request.handlers][DEBUG ] (L81) []
beacon | [beacon.db.utils][DEBUG ] (L42) Returning estimated count
beacon | [beacon.db.utils][DEBUG ] (L83) FINAL QUERY: {}
beacon | [beacon.db.utils][DEBUG ] (L84) 0
beacon | [beacon.request.handlers][DEBUG ] (L107) ['ICAN_DATASET_3K']
beacon | [beacon.request.handlers][DEBUG ] (L110) []
beacon | [beacon.request.handlers][DEBUG ] (L111) ['ICAN_DATASET_3K']
beacon | [beacon.request.handlers][DEBUG ] (L146) ['ICAN_DATASET_3K']
beacon | [beacon.request.handlers][DEBUG ] (L149) ICAN_DATASET_3K
beacon | [beacon.db.individuals][ INFO ] (L208) {'_id': ObjectId('6616b767421532a7970d0b39'), 'biosampleId': 'B00GWDY', 'geographicOrigin': {'id': 'NCIT:C16592', 'label': 'France'}, 'id': 'i45716', 'sex': {'id': 'NCIT:C16576', 'label': 'female'}}
beacon | [beacon.db.individuals][ INFO ] (L208) {'_id': ObjectId('6616b767421532a7970d0b3a'), 'biosampleId': 'B00GWE2', 'geographicOrigin': {'id': 'NCIT:C16592', 'label': 'France'}, 'id': 'i46727', 'sex': {'id': 'NCIT:C20197', 'label': 'male'}}
beacon | [beacon.db.individuals][ INFO ] (L208) {'_id': ObjectId('6616b767421532a7970d0b3b'), 'biosampleId': 'B00GWE0', 'geographicOrigin': {'id': 'NCIT:C16592', 'label': 'France'}, 'id': 'i46385', 'sex': {'id': 'NCIT:C16576', 'label': 'female'}}
beacon | [beacon.db.individuals][ INFO ] (L208) {'_id': ObjectId('6616b767421532a7970d0b3c'), 'biosampleId': 'B00GWE1', 'geographicOrigin': {'id': 'NCIT:C16592', 'label': 'France'}, 'id': 'i46527', 'sex': {'id': 'NCIT:C20197', 'label': 'male'}}
beacon | [beacon.db.individuals][ INFO ] (L208) {'_id': ObjectId('6616b767421532a7970d0b3d'), 'biosampleId': 'B00GWDZ', 'geographicOrigin': {'id': 'NCIT:C16592', 'label': 'France'}, 'id': 'i42629', 'sex': {'id': 'NCIT:C16576', 'label': 'female'}}
beacon | [beacon.db.individuals][DEBUG ] (L211) {'$and': []}
beacon | [beacon.db.individuals][DEBUG ] (L212) True
beacon | [beacon.db.filters][DEBUG ] (L19) {'$and': []}
beacon | [beacon.db.filters][DEBUG ] (L22) {}
beacon | [beacon.db.filters][DEBUG ] (L256) {'$and': [{'id': 'NCIT:C16576'}, {'scope': None}]}
beacon | [beacon.db.utils][DEBUG ] (L83) FINAL QUERY: {'$and': [{'id': 'NCIT:C16576'}, {'scope': None}]}
beacon | [beacon.db.utils][DEBUG ] (L84) 0
beacon | [beacon.db.utils][DEBUG ] (L83) FINAL QUERY: {'$and': [{'id': {'$regex': ''}}, {'scope': None}]}
beacon | [beacon.db.utils][DEBUG ] (L84) 0
beacon | [beacon.db.filters][DEBUG ] (L299) {'$or': [{'.id': 'NCIT:C16576'}]}
beacon | [beacon.db.filters][DEBUG ] (L256) {'$and': [{'id': 'NCIT:C16576'}, {'scope': None}]}
beacon | [beacon.db.utils][DEBUG ] (L83) FINAL QUERY: {'$and': [{'id': 'NCIT:C16576'}, {'scope': None}]}
beacon | [beacon.db.utils][DEBUG ] (L84) 0
beacon | [beacon.db.utils][DEBUG ] (L83) FINAL QUERY: {'$and': [{'id': {'$regex': ''}}, {'scope': None}]}
beacon | [beacon.db.utils][DEBUG ] (L84) 0
beacon | [beacon.db.filters][DEBUG ] (L299) {'$or': [{'.id': 'NCIT:C16576'}]}
beacon | [beacon.db.filters][DEBUG ] (L149) {'$and': [{'$and': []}, {'$or': [{'.id': 'NCIT:C16576'}]}, {'$or': [{'.id': 'NCIT:C16576'}]}]}
beacon | [beacon.db.individuals][DEBUG ] (L33) Include Resultset Responses = HIT
beacon | [beacon.db.utils][DEBUG ] (L197) {'$and': [{'$and': []}, {'$or': [{'.id': 'NCIT:C16576'}]}, {'$or': [{'.id': 'NCIT:C16576'}]}]}
beacon | [beacon.db.utils][DEBUG ] (L198) 0
beacon | [beacon.db.utils][DEBUG ] (L216) {'$and': [{'$and': []}, {'$or': [{'.id': 'NCIT:C16576'}]}, {'$or': [{'.id': 'NCIT:C16576'}]}], '$or': [{'id': 'B00GWDY'}, {'id': 'B00GWDZ'}, {'id': 'B00GWE0'}, {'id': 'B00GWE1'}, {'id': 'B00GWE2'}, {'id': 'i45716'}, {'id': 'i42629'}, {'id': 'i46385'}, {'id': 'i46527'}, {'id': 'i46727'}]}
beacon | [beacon.db.utils][ INFO ] (L46) <pymongo.cursor.Cursor object at 0x710977f524d0>
beacon | [beacon.db.utils][ INFO ] (L47) {'$and': [{'$and': []}, {'$or': [{'.id': 'NCIT:C16576'}]}, {'$or': [{'.id': 'NCIT:C16576'}]}], '$or': [{'id': 'B00GWDY'}, {'id': 'B00GWDZ'}, {'id': 'B00GWE0'}, {'id': 'B00GWE1'}, {'id': 'B00GWE2'}, {'id': 'i45716'}, {'id': 'i42629'}, {'id': 'i46385'}, {'id': 'i46527'}, {'id': 'i46727'}]}
beacon | [beacon.db.utils][ INFO ] (L48) Collection(Database(MongoClient(host=['mongo:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin'), 'beacon'), 'individuals')
beacon | [beacon.db.utils][DEBUG ] (L218) 15
beacon | [beacon.db.utils][DEBUG ] (L219) 10
beacon | [beacon.db.utils][DEBUG ] (L83) FINAL QUERY: {'$and': [{'$and': []}, {'$or': [{'.id': 'NCIT:C16576'}]}, {'$or': [{'.id': 'NCIT:C16576'}]}], '$or': [{'id': 'B00GWDY'}, {'id': 'B00GWDZ'}, {'id': 'B00GWE0'}, {'id': 'B00GWE1'}, {'id': 'B00GWE2'}, {'id': 'i45716'}, {'id': 'i42629'}, {'id': 'i46385'}, {'id': 'i46527'}, {'id': 'i46727'}]}
beacon | [beacon.db.utils][DEBUG ] (L84) 0
beacon | [beacon.request.handlers][DEBUG ] (L165) 15
beacon | [beacon.request.handlers][DEBUG ] (L169) record
beacon | [beacon.response.build_response][DEBUG ] (L68) 15
beacon | [beacon.db.utils][DEBUG ] (L83) FINAL QUERY: {'id': 'ICAN_DATASET_3K'}
beacon | [beacon.db.utils][DEBUG ] (L84) 0
beacon | [beacon.db.utils][DEBUG ] (L83) FINAL QUERY: {}
beacon | [beacon.db.utils][DEBUG ] (L84) 0
beacon | [beacon.utils.stream][DEBUG ] (L25) HTTP response stream
beacon | [beacon.utils.stream][DEBUG ] (L30) Partial content: False
beacon | [aiohttp.access][ INFO ] (L206) 192.168.16.1 [11/Apr/2024:08:35:38 +0000] "GET /api/individuals?filters=NCIT:C16576 HTTP/1.1" 200 1219 "-" "HTTPie/2.4.0"
Thanks! Alex
Hi @albodrug, when beacon can't insert a count into the mongo counts collection, it currently returns 15. I have to change that. However, beacon should return a correct count if all the deployment steps were executed correctly. Please check whether a counts collection has been created in your mongo. You can use mongoexpress if you wish, which is served at http://localhost:8081. If the counts collection exists, please make sure you executed the script:
docker exec beacon python beacon/reindex.py
Run it again and retry the query exactly as I pasted it in my reply to AlexCork1, please. Tell me if this solves your issue. Thank you, Oriol
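As an alternative to looking in mongoexpress, the presence of the counts collection can be checked from Python. This is only a sketch: the function name is made up, and it expects an object that behaves like a pymongo Database (list_collection_names() and count_documents() are existing pymongo calls). The connection string in the comment is an assumption and must be adapted to the deployment.

```python
def counts_collection_status(db, name="counts"):
    """Report whether the counts collection exists and how many documents it holds.

    `db` is expected to behave like a pymongo Database object.
    """
    if name not in db.list_collection_names():
        return {"exists": False, "documents": 0}
    return {"exists": True, "documents": db[name].count_documents({})}


# Usage against a live deployment (host and credentials are assumptions):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://<user>:<password>@localhost:27017/?authSource=admin")
#   print(counts_collection_status(client["beacon"]))
```

An existing but empty collection after running reindex.py would show up here as exists True with 0 documents.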
Hi @costero-e
When using the curl command you pasted, I get no results. Re-running reindex.py empties my counts collection in mongo and does not solve the issue.
(labaz) bodrug-a@pp-irs1-ylt:~$ sudo docker exec beacon python beacon/reindex.py
(labaz) bodrug-a@pp-irs1-ylt:~$ curl -H 'Content-Type: application/json' -X POST -d '{
"meta": {
"apiVersion": "2.0"
},
"query":{ "requestParameters": {
},
"filters": [
{"id":"NCIT:C16576", "scope":"individual"} ],
"includeResultsetResponses": "HIT",
"pagination": {
"skip": 0,
"limit": 10
},
"testMode": false,
"requestedGranularity": "record"
}
}' http://localhost:5050/api/individuals | python -m json.tool
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1064 0 697 100 367 63363 33363 --:--:-- --:--:-- --:--:-- 96727
{
"meta": {
"beaconId": "org.ega-archive.ga4gh-approval-beacon-test",
"apiVersion": "v2.0.0",
"returnedGranularity": "record",
"receivedRequestSummary": {
"apiVersion": "2.0",
"requestedSchemas": [],
"filters": [
"NCIT:C16576",
"individual"
],
"requestParameters": {},
"includeResultsetResponses": "HIT",
"pagination": {
"skip": 0,
"limit": 10
},
"requestedGranularity": "record",
"testMode": false
},
"returnedSchemas": [
{
"entityType": "individual",
"schema": "beacon-individual-v2.0.0"
}
]
},
"responseSummary": {
"exists": false
},
"response": {
"resultSets": []
},
"beaconHandovers": [
[
{
"handoverType": {
"id": "CUSTOM:000001",
"label": "Project description"
},
"note": "Project description",
"url": "https://www.nist.gov/programs-projects/genome-bottle"
}
]
]
}
The GET request returns 15.
(labaz) bodrug-a@pp-irs1-ylt:~$ http GET http://localhost:5050/api/individuals?filters=NCIT:C16576
HTTP/1.1 200 OK
Content-Type: application/json;charset=utf-8
Date: Thu, 11 Apr 2024 10:04:36 GMT
Server: GA4GH Approval Beacon Test v2.0 (based on Python/3.10 aiohttp/3.8.1)
Transfer-Encoding: chunked
{
"beaconHandovers": [
[
{
"handoverType": {
"id": "CUSTOM:000001",
"label": "Project description"
},
"note": "Project description",
"url": "https://www.nist.gov/programs-projects/genome-bottle"
}
]
],
"meta": {
"apiVersion": "v2.0.0",
"beaconId": "org.ega-archive.ga4gh-approval-beacon-test",
"receivedRequestSummary": {
"apiVersion": "v2.0.0",
"filters": [
"NCIT:C16576"
],
"includeResultsetResponses": "HIT",
"pagination": {
"limit": 10,
"skip": 0
},
"requestParameters": {
"filters": "NCIT:C16576"
},
"requestedGranularity": "record",
"requestedSchemas": [],
"testMode": false
},
"returnedGranularity": "record",
"returnedSchemas": [
{
"entityType": "individual",
"schema": "beacon-individual-v2.0.0"
}
]
},
"response": {
"resultSets": [
{
"exists": true,
"id": "ICAN_DATASET_3K",
"results": [],
"resultsCount": 15,
"resultsHandover": [
{
"handoverType": {
"id": "CUSTOM:000001",
"label": "Project description"
},
"note": "Project description",
"url": "https://www.nist.gov/programs-projects/genome-bottle"
}
],
"setType": "dataset"
}
]
},
"responseSummary": {
"exists": true,
"numTotalResults": 15
}
}
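To compare the two responses, here is a small sketch (not part of beacon, the function name is made up) that pulls out the reported counts and flags any result set that claims a positive resultsCount while returning no records. It assumes the response layout shown in the two outputs above.

```python
def summarize_response(body):
    """Collect the counts reported by a Beacon response and flag mismatches."""
    summary = body.get("responseSummary", {})
    report = {
        "exists": summary.get("exists"),
        "numTotalResults": summary.get("numTotalResults"),
        "suspicious": [],
    }
    for result_set in body.get("response", {}).get("resultSets", []):
        count = result_set.get("resultsCount", 0)
        returned = len(result_set.get("results", []))
        # A positive count with zero returned records matches the symptom above.
        if count > 0 and returned == 0:
            report["suspicious"].append(result_set.get("id"))
    return report
```

The check is only meaningful for requests with skip set to 0 and a non-zero limit, as in the queries above; a page requested past the end of the results would also come back empty.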
Thanks, Alex
Hi @albodrug, thanks for your reply. First, I see that GET requests with filters are not working properly in the latest version I made; I will fix that, sorry. POST requests, on the other hand, do work properly, but the POST request you made returned no results. That may happen because there is no data matching the filtering term you are applying, because the datasets are not listed in public_datasets.yml, or because the relationship between the ids and their datasets is missing. To show that it works, here is an example I just ran, with the query and the response I get for the CINECA synthetic dataset:
curl \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"meta": {
"apiVersion": "2.0"
},
"query":{ "requestParameters": {
},
"filters": [
{"id":"NCIT:C16576", "scope":"individual"} ],
"includeResultsetResponses": "HIT",
"pagination": {
"skip": 0,
"limit": 1
},
"testMode": false,
"requestedGranularity": "record"
}
}' \
http://localhost:5050/api/individuals
{"meta":{"beaconId":"org.ega-archive.ga4gh-approval-beacon-test","apiVersion":"v2.0.0","returnedGranularity":"record","receivedRequestSummary":{"apiVersion":"2.0","requestedSchemas":[],"filters":["NCIT:C16576","individual"],"requestParameters":{},"includeResultsetResponses":"HIT","pagination":{"skip":0,"limit":1},"requestedGranularity":"record","testMode":false},"returnedSchemas":[{"entityType":"individual","schema":"beacon-individual-v2.0.0"}]},"responseSummary":{"exists":true,"numTotalResults":1271},"response":{"resultSets":[{"id":"CINECA_synthetic_cohort_EUROPE_UK1","setType":"dataset","exists":true,"resultsCount":1271,"results":[{"_id":"6616894514c916aeda0fa156","ethnicity":{"id":"NCIT:C67109","label":"White and Asian"},"id":"HG00100","interventionsOrProcedures":[{"procedureCode":{"id":"OPCS4:T77.2","label":"OPCS(v4-0.0):Wide excision of muscle"}}],"measures":[{"assayCode":{"id":"LOINC:35925-4","label":"BMI"},"date":"2021-09-24","measurementValue":{"unit":{"id":"NCIT:C49671","label":"Kilogram per Square Meter"},"value":28.27885509}},{"assayCode":{"id":"LOINC:3141-9","label":"Weight"},"date":"2021-09-24","measurementValue":{"unit":{"id":"NCIT:C28252","label":"Kilogram"},"value":74.4885}},{"assayCode":{"id":"LOINC:8308-9","label":"Height-standing"},"date":"2021-09-24","measurementValue":{"unit":{"id":"NCIT:C49668","label":"Centimeter"},"value":162.2982}}],"sex":{"id":"NCIT:C16576","label":"female"}}],"resultsHandover":{"handoverType":{"id":"CUSTOM:000001","label":"Project description"},"note":"Project description","url":"https://www.nist.gov/programs-projects/genome-bottle"}}]},"beaconHandovers":[{"handoverType":{"id":"NCIT:C189151","label":"Study Data Repository"},"note":"Colorectal Adenocarcinoma TCGA PanCancer data. The original data is <a href=\"https://gdc.cancer.gov/about-data/publications/pancanatlas\">here</a>. 
The publications are <a href=\"https://www.cell.com/pb-assets/consortium/pancanceratlas/pancani3/index.html\">here</a>.","url":"https://github.com/cBioPortal/datahub/tree/master/public/coadread_tcga_pan_can_atlas_2018"},{"handoverType":{"id":"CUSTOM:000001","label":"Project description"},"note":"Project description","url":"https://www.nist.gov/programs-projects/genome-bottle"}]}
Another possible reason is that the filtering terms script has not been executed. Please make sure everything is set up as described in the deployment instructions and try again. In any case, I will now fix the GET requests with filters so you can try a GET. I will tell you when it is fixed.
Thanks, Oriol
Hi @albodrug, GET requests with filters (only for individuals for now) should be fixed now. If you can try it and tell me the outcome, I would appreciate it. Thank you, Oriol
The GET requests work on the CINECA data after a git pull.
(labaz) bodrug-a@pp-irs1-ylt:~$ http GET http://localhost:5050/api/individuals?filters=NCIT:C16576 | python -m json.tool | grep numTotalResults
"numTotalResults": 1271
I still have issues with my own data, even though I do run the filtering and index scripts after loading the data. I have a bash script to load the data that finishes with:
sudo docker exec beacon python beacon/reindex.py
sudo docker exec beacon python beacon/db/extract_filtering_terms.py
I will check the configs and ymls more thoroughly as suggested.
Thanks a lot for your help and patience.
Alex
Hi @albodrug, no problem. Thank you for reporting issues and testing beacon RI. The script seems to be doing what is needed, so I think the issue may come from the .yml files. Please list all the ids (biosamples and individuals) in the dataset entry of the datasets.yml file, with the exact names (for the dataset and the ids), and be aware of case sensitivity. If you want to paste here what you have in your .yml files, maybe I can help. After modifying the .yml files, try to build the beacon container again (to rule out the container not being refreshed). Also, bear in mind that you need a datasets.json containing a document whose id is exactly the name you wrote for the dataset in the .yml files. I'm here to help with beacon RI, so no worries, keep asking about whatever issue you have.
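The name matching described above can be sketched as a small check. This is only an illustration under assumptions: the function name is made up, the .yml content is assumed to be already parsed into a dict mapping each dataset name to its list of ids, and datasets.json is assumed to be a list of documents with an "id" key. The real files may be laid out differently.

```python
def check_dataset_names(yml_datasets, json_documents):
    """Compare dataset names from the .yml file with ids in datasets.json.

    yml_datasets: dict mapping dataset name -> list of ids (assumed layout).
    json_documents: list of dicts, each with an "id" key (assumed layout).
    Returns a list of human-readable problems; empty means nothing was found.
    """
    problems = []
    json_ids = [doc.get("id") for doc in json_documents]
    lowered = {str(i).lower(): i for i in json_ids if i is not None}
    for name, ids in yml_datasets.items():
        if name not in json_ids:
            near = lowered.get(name.lower())
            if near is not None:
                problems.append(f"{name}: case differs from datasets.json id {near}")
            else:
                problems.append(f"{name}: no document with this id in datasets.json")
        if not ids:
            problems.append(f"{name}: no individual or biosample ids listed")
    return problems
```

The comparison is deliberately case-sensitive first and only falls back to a case-insensitive lookup to produce a more helpful message.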
Best, Oriol
@costero-e, my issue was due to a badly formatted datasets.json file. Thanks for all the tips.
Looking forward to the fix on the biosamples count as well.
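For anyone hitting the same datasets.json problem, a quick way to catch malformed JSON before loading it is sketched below with Python's standard json module. The function name is made up, and the file path in the usage comment is an assumption.

```python
import json


def find_json_error(text):
    """Return None if text parses as JSON, else a short description of the error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Usage (path is an assumption, point it at your own file):
#   with open("datasets.json", encoding="utf-8") as handle:
#       print(find_json_error(handle.read()) or "datasets.json parses correctly")
```

This only checks that the file is well-formed JSON; it does not verify that the documents have the fields beacon expects.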
Bye, Alex
Biosamples filters for get requests are working. The issue was only for individuals and g_variants.
Best, Oriol
Just a comment from my side: POST requests worked like a charm and I am getting 1271 as numTotalResults.
Unfortunately, the GET request on my side is still not working correctly (I cloned the repository yesterday evening). The response I get is "numTotalResults": 15, "results":[].
Here is the shortened response:
{ "meta": {... "receivedRequestSummary": { "apiVersion": "v2.0.0", "requestedSchemas": [], "filters": [ "NCIT:C16576" ], "requestParameters": { "filters": "NCIT:C16576" }, "includeResultsetResponses": "HIT", "pagination": { "skip": 0, "limit": 10 }, "requestedGranularity": "record", "testMode": false }, "returnedSchemas": [ { "entityType": "individual", "schema": "beacon-individual-v2.0.0" } ] }, "responseSummary": { "exists": true, "numTotalResults": 15 }, "response": { "resultSets": [ { "id": "CINECA_synthetic_cohort_EUROPE_UK1", "setType": "dataset", "exists": true, "resultsCount": 15, "results": [], "resultsHandover": { "handoverType": { "id": "CUSTOM:000001", "label": "Project description" }, "note": "Project description", "url": "https://www.nist.gov/programs-projects/genome-bottle" } } ] },... }
Hi @AlexCork1! This is the issue that existed before the patch I committed yesterday. Please make sure to git fetch, then git pull, and then build the beacon container again:
docker-compose up -d --build beacon
Thanks, Oriol
Thanks! It works now :)
I think there are bugs in querying individuals by sex. For example, if I enter "NCIT:C16576", the result set contains males as well. Also, the female count is too low.
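A quick way to quantify this on a returned result set is sketched below. The helper names are made up, and the code assumes each record carries a sex object with an id, as in the individuals documents shown in the logs earlier in this thread.

```python
from collections import Counter


def tally_sex(results):
    """Count results by sex id; records without a sex field go under None."""
    return Counter((rec.get("sex") or {}).get("id") for rec in results)


def unexpected_records(results, expected_id):
    """Return ids of records whose sex id differs from the requested filter."""
    return [
        rec.get("id")
        for rec in results
        if (rec.get("sex") or {}).get("id") != expected_id
    ]
```

Running unexpected_records on the "results" list of a response filtered by "NCIT:C16576" should return an empty list if the filter is applied correctly.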
Steps to reproduce: