mnarayan1 closed this 11 months ago.
@mnarayan1 at a quick glance, your TRAPI query and your SmartAPI annotation look good to me. When you say "My local installation of BTE is working fine", I assume you've gotten local overrides working on your local instance? And do you see zero results for other gene identifiers (e.g., HGNC, WormBase, Xenbase, etc.)?
@andrewsu The other gene identifiers are not working either. I have local overrides working on my local instance, and BTE was able to successfully load AGR into smartapi_specs.
@mnarayan1
Sorry for such a belated response. Are you still available to work on this issue? If not, it's not a problem - I'll merge the PR which will preserve the record of work you've done, then add commits...
I've found the reasons why the x-bte annotation wasn't working, and I have a list of proposed fixes (the minimum needed to get the annotation working):
This is necessary for the current x-bte annotation because the different data subsets represent different relationships, and we can assign a different biolink predicate to each. Also, the different ID namespaces need to be handled differently (see the next points).

Notes:
* This could mean a combinatorial explosion of operations >.<. We can cut it down by only writing operations that cover more than 5 records/documents.
* There are 4 data subsets we could annotate (excluding negations): `agr.biomarker_via_orthology`, `agr.implicated_via_orthology`, `agr.is_implicated_in`, `agr.is_marker_for`.
* Multiple gene ID namespaces are involved (MGI, RGD, SGD, etc.). Madhumita has already listed them in the yaml comments.
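To get a feel for the operation count, here is a minimal Python sketch (not BTE code) that enumerates one candidate operation per data subset × ID namespace and keeps only those above the proposed 5-record threshold. The three-namespace list and the record counts are placeholders; real counts would have to come from the AGR API itself.

```python
from itertools import product

# The four data subsets listed above (negated subsets excluded).
SUBSETS = [
    "agr.biomarker_via_orthology",
    "agr.implicated_via_orthology",
    "agr.is_implicated_in",
    "agr.is_marker_for",
]

# Illustrative subset of gene ID namespaces; the full list is in the yaml comments.
NAMESPACES = ["MGI", "RGD", "SGD"]

# Only write an operation if it covers more than this many records/documents.
MIN_RECORDS = 5


def plan_operations(record_counts):
    """Return the (subset, namespace) pairs worth writing an operation for.

    record_counts maps (subset, namespace) -> number of documents.
    Pairs missing from the mapping are treated as having zero records.
    """
    return [
        (subset, ns)
        for subset, ns in product(SUBSETS, NAMESPACES)
        if record_counts.get((subset, ns), 0) > MIN_RECORDS
    ]


# Made-up counts, for illustration only.
example_counts = {
    ("agr.is_implicated_in", "MGI"): 120,
    ("agr.is_implicated_in", "SGD"): 3,
    ("agr.is_marker_for", "RGD"): 40,
}
print(len(SUBSETS) * len(NAMESPACES))   # 12 candidate operations
print(plan_operations(example_counts))
# [('agr.is_implicated_in', 'MGI'), ('agr.is_marker_for', 'RGD')]
```

Even with only three namespaces there are 12 candidates, so the threshold does most of the pruning once the full namespace list is in.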
BTE doesn't always automatically add prefixes to IDs when generating the queries. For this API, it looks like all the IDs (field `_id`) have prefixes that need adding (gene namespaces and DOID). Example:

```yaml
requestBody:
  body:
    ## API data has prefix
    ## joinSafe is only needed if the delimiter isn't a comma
    q: "{{ queryInputs | replPrefix('MGI') }}"
    scopes: _id
```
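The intended effect of that template can be illustrated with a small Python sketch. `add_prefix` and `build_query_body` are hypothetical helpers, not part of BTE, and this is not how `replPrefix` is implemented; the sketch only assumes that the AGR `_id` values look like `MGI:<id>` or `DOID:<id>` and that BTE may hand over some IDs without their prefix.

```python
def add_prefix(ids, prefix):
    """Prepend 'PREFIX:' to bare IDs; IDs that already carry it are left alone.

    Hypothetical helper that mimics the intended effect of the template filter,
    not BTE's own replPrefix implementation.
    """
    tag = f"{prefix}:"
    return [i if i.startswith(tag) else tag + i for i in ids]


def build_query_body(ids, prefix, delimiter=","):
    """Build a BioThings-style batch query body that matches on the _id field."""
    return {"q": delimiter.join(add_prefix(ids, prefix)), "scopes": "_id"}


print(build_query_body(["12345", "MGI:67890"], "MGI"))
# {'q': 'MGI:12345,MGI:67890', 'scopes': '_id'}
```

A comma is the default delimiter here, matching the yaml comment that `joinSafe` only matters for other delimiters.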
* Fields (besides `_id`) are missing the root field: they should start with `agr.`.
* We could add `agr.symbol` to each operation. This may be useful since Translator's [NodeNorm](https://smart-api.info/ui/400f7c11028ff36f460af4ea85dc72f5) may not support every namespace (we could check [here](https://github.com/biothings/biothings_explorer/issues/735#issuecomment-1751919208) or put the IDs into the endpoint).
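Why the missing root field matters can be seen with a minimal dotted-path lookup in Python. The document below is an assumed shape (everything nested under `agr`, with a made-up symbol), not an actual AGR API response, and `get_path` is a hypothetical helper rather than BTE's own field-resolution code.

```python
def get_path(doc, path):
    """Collect every value reachable by a dotted field path such as 'agr.symbol'.

    Lists along the way are searched element by element; a missing key yields [].
    """
    values = [doc]
    for key in path.split("."):
        found = []
        for value in values:
            for item in value if isinstance(value, list) else [value]:
                if isinstance(item, dict) and key in item:
                    found.append(item[key])
        values = found
    # Flatten a list-valued leaf so callers always get a flat list of values.
    result = []
    for value in values:
        result.extend(value if isinstance(value, list) else [value])
    return result


# Assumed document shape, for illustration only.
doc = {
    "_id": "FB:FBgn0038376",
    "agr": {
        "symbol": "example-symbol",
        "is_implicated_in": [{"doid": "DOID:9970"}],
    },
}
print(get_path(doc, "is_implicated_in.doid"))      # [] (root field missing)
print(get_path(doc, "agr.is_implicated_in.doid"))  # ['DOID:9970']
```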
Right now, it doesn't work because (1) many references point to `x-bte-response-mapping/gene`, which doesn't exist (the only two objects in response-mapping are `drug` and `disease`), and (2) the `drug` object includes multiple output fields, which x-bte annotation/BTE currently doesn't support. To fix:
* 1 response-mapping object per output field (so `agr.biomarker_via_orthology.doid` and `agr.implicated_via_orthology.doid` would be in separate objects)
* and 1 response-mapping object per ID namespace (so `RGD: _id` and `MGI: _id` would be in separate objects)
* make sure the response_mapping ref for each operation points to an existing object in the `x-bte-response-mapping` section
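These fixes can be sketched in Python as a generator of single-field, single-namespace mapping objects plus a check for refs that point nowhere. Object names like `gene_MGI` are hypothetical, the `#/components/x-bte-response-mapping/` ref prefix is an assumption about how the yaml is laid out, and the `DOID` key assumes every output field listed holds disease IDs.

```python
# Assumed $ref prefix; adjust if the yaml keeps x-bte-response-mapping elsewhere.
REF_PREFIX = "#/components/x-bte-response-mapping/"


def build_response_mapping(output_fields, namespaces):
    """Create one mapping object per output field and one per gene ID namespace."""
    mapping = {}
    for field in output_fields:
        # Hypothetical naming scheme, e.g. 'agr_is_implicated_in_doid'.
        mapping[field.replace(".", "_")] = {"DOID": field}
    for ns in namespaces:
        mapping[f"gene_{ns}"] = {ns: "_id"}
    return mapping


def dangling_refs(operation_refs, mapping):
    """Return the operation refs that don't point at an existing mapping object."""
    missing = []
    for ref in operation_refs:
        name = ref[len(REF_PREFIX):] if ref.startswith(REF_PREFIX) else ref
        if name not in mapping:
            missing.append(ref)
    return missing


mapping = build_response_mapping(
    ["agr.biomarker_via_orthology.doid", "agr.implicated_via_orthology.doid"],
    ["RGD", "MGI"],
)
print(sorted(mapping))
print(dangling_refs([REF_PREFIX + "gene", REF_PREFIX + "gene_MGI"], mapping))
# ['#/components/x-bte-response-mapping/gene']
```

Each generated object maps exactly one prefix to one field, which is the constraint described above; the dangling-ref check flags the same kind of `.../gene` reference that currently breaks the annotation.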
And a note (mostly to my future self): here's the other stuff I noticed. It's not essential now, but it will be for getting the AGR SmartAPI yaml fully ready.
- `version`: I'm not sure if this is valid. The metadata endpoint seems to show that the data download is from 2021?
- `info.x-translator.infores`: this needs to be a new, separate infores for this API, registered in the infores registry.
- `info.x-translator.biolink-version`: this can be updated to 3.5.3.
- `servers.url`: the production server URL should (?) be changed to http (right now it's https, which makes it the same as the encrypted one).
- For operations, we could likely add a qualifier for species o_0, since each namespace is species-specific! That's cool!
- For the operation's `source`: does `infores:agrkb` exist in the registry? Or is it AGR?
After discussion with Andrew, we've decided to merge this PR and I'll proceed with updating the yaml to complete https://github.com/biothings/biothings_explorer/issues/260
AGR API yaml file for gene-disease relationships. Addresses this issue.
Notes:
Problems: Using this API record, I'm assuming that querying the gene `FB:FBgn0038376` should return the disease `DOID:9970` (dyschromatosis universalis hereditaria). This is the query I ran:

However, BTE is retrieving 0 successful results. My local installation of BTE is working fine, so I'm assuming that something is wrong with the annotations themselves. How can I fix this?