Closed JohnGiorgi closed 5 years ago
TODO: `namespace` field.

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
---|---|---|---
saber/utils/app_utils.py | 0 | 1 | 0.0%
saber/saber.py | 7 | 12 | 58.33%
Total | 90 | 96 | 93.75%

Files with Coverage Reduction | New Missed Lines | %
---|---|---
saber/saber.py | 3 | 71.22%
Total | 3 |

Totals |
---|---
Change from base Build 299 | 0.0%
Covered Lines | 1799
Relevant Lines | 2225

This pull request implements grounding/entity linking for the major entity classes (Chemicals/Drugs, Disease/Disorder, Species/Living beings, and Proteins/Genes) using the EXTRACT 2.0 API. It replaces our previous grounding system, which only worked for protein/gene entities.
I tried to model the output format used by REACH as closely as possible. Grounding adds a new field, `xrefs`, to each item in `ents` in the output JSON returned by `Saber.annotate()`. For example:

Without grounding:

With grounding:
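A minimal illustrative sketch of the two shapes follows; it is not actual Saber output. Only `ents`, `xrefs`, `namespace`, `id` and `organism-id` come from this PR. The other entity keys (`text`, `label`, `start`, `end`), the example sentence, the placeholder values and the choice of a string for `organism-id` are all assumptions.

```python
import copy
import json

# Hypothetical ungrounded annotation. Keys other than `ents` and all values
# are placeholders chosen for illustration.
without_grounding = {
    "text": "BRCA1 was sequenced.",
    "ents": [
        {"text": "BRCA1", "label": "PRGE", "start": 0, "end": 5},
    ],
}

# Grounding adds an `xrefs` list to each entity. We deep-copy so the
# ungrounded example stays unchanged for comparison.
with_grounding = copy.deepcopy(without_grounding)
with_grounding["ents"][0]["xrefs"] = [
    {
        "namespace": "<external resource>",      # placeholder resource name
        "id": "<identifier in that resource>",   # placeholder identifier
        "organism-id": "9606",                   # PRGE only; type assumed
    }
]

print(json.dumps(with_grounding, indent=2))
```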
Where `namespace` is the external resource the entity is grounded to and `id` is the unique identifier in that external resource. `organism-id` is only present on `PRGE` entities; currently it defaults to `9606` (the NCBI Taxonomy ID for *Homo sapiens*). The EXTRACT API docs state that a feature is under development that will allow automatic detection of the `organism-id` for each protein/gene mention. When/if that is implemented, I will work it into Saber.

Usage
Usage is the same as before. Grounding is off by default, as it makes annotation slightly slower.
See the docs for more info.
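Downstream code that consumes grounded output might flatten it along these lines. This is a minimal sketch assuming the `ents`/`xrefs` layout described in this PR; `collect_xrefs` is a hypothetical helper (not part of Saber), the entity `text` key is assumed, and the keyword used to switch grounding on in `Saber.annotate()` is an assumption to check against the docs.

```python
from typing import Any, Dict, List, Tuple


def collect_xrefs(annotation: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: flatten grounded entities into (text, namespace, id).

    Assumes a top-level `ents` list whose items may carry an `xrefs` list of
    dicts with `namespace` and `id` keys. The entity `text` key is assumed.
    """
    triples = []
    for ent in annotation.get("ents", []):
        # Entities annotated without grounding have no `xrefs`, so they are skipped.
        for xref in ent.get("xrefs", []):
            triples.append((ent.get("text", ""), xref["namespace"], xref["id"]))
    return triples


# Hypothetical call; the grounding keyword is an assumption:
#   annotation = saber.annotate(text, ground=True)
example = {
    "ents": [
        {"text": "BRCA1", "xrefs": [{"namespace": "db", "id": "X1"}]},
        {"text": "aspirin"},  # ungrounded entity, no `xrefs`
    ]
}
print(collect_xrefs(example))  # [('BRCA1', 'db', 'X1')]
```

Using `.get` with defaults keeps the helper safe to call on output produced with grounding left off.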
Issues Closed
Closes #23.