jvwong opened this issue 6 years ago.
Grounding via ChEBI is available in Reach. To meet previous contractual requirements we had to stop using it, but it is available if you compile your own version of the Bioresources project. Note: to do so, you will also need to compile the Processors library first, as the code used to modify Bioresources is contained in Processors (you do not need to use this version of Processors at runtime; you only need it to recompile Bioresources).
To compile your own version of Bioresources:
1) Edit the ner_kb.config file:
a) Find the chebi entry and swap it with the PubChem entry, then uncomment chebi and comment out PubChem.
b) Similarly, move and uncomment hmdb if you would like to supplement the chebi lookup.
2) Next, run the script to regenerate the lexicon files:
a) Ensure that you have a compiled version of Processors available at the same level as the Bioresources project (i.e. "sister" directories), as the script is hardwired to use this structure (our apologies for that, but the script was really created for our private, internal, infrequent use).
b) From Bioresources, run ner_kb.sh. The script takes a few minutes to regenerate the files but should not generate any errors.
3) Build the Bioresources JAR file and publish it, in this example to your local repository:
a) sbt clean publishLocal
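Steps 2 and 3 can be scripted so the "sister directory" requirement is checked before anything runs. This is a minimal sketch, not part of Bioresources: the directory name "processors", invoking the script as "bash ner_kb.sh", and the helper names are all assumptions; only ner_kb.sh and sbt clean publishLocal come from the steps above.

```python
import subprocess
from pathlib import Path

# Hypothetical directory name for the Processors checkout; adjust to match yours.
PROCESSORS_DIR_NAME = "processors"


def find_sister_processors(bioresources_dir, name=PROCESSORS_DIR_NAME):
    """Return the sister Processors directory that ner_kb.sh expects to find."""
    bio = Path(bioresources_dir).resolve()
    if not (bio / "ner_kb.sh").is_file():
        raise FileNotFoundError(f"no ner_kb.sh in {bio}; is this the Bioresources checkout?")
    # The script is hardwired to a side-by-side layout: <parent>/bioresources and <parent>/processors.
    sister = bio.parent / name
    if not sister.is_dir():
        raise FileNotFoundError(f"expected a compiled Processors checkout at {sister}")
    return sister


def rebuild_steps(bioresources_dir):
    """Steps 2b and 3a as (working directory, argv) pairs."""
    bio = Path(bioresources_dir).resolve()
    return [
        (bio, ["bash", "ner_kb.sh"]),  # regenerate the lexicon files (takes a few minutes)
        (bio, ["sbt", "clean", "publishLocal"]),  # build and publish the Bioresources JAR
    ]


def rebuild(bioresources_dir, dry_run=False):
    """Check the directory layout, then run (or just print) each step in order."""
    find_sister_processors(bioresources_dir)
    steps = rebuild_steps(bioresources_dir)
    for cwd, argv in steps:
        if dry_run:
            print(f"(cd {cwd} && {' '.join(argv)})")
        else:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)
    return steps
```

Calling rebuild("path/to/bioresources", dry_run=True) prints the commands without running them; drop dry_run once the layout check passes.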
I am attaching an example ner_kb.config file, modified to use ChEBI (note that I've also enabled HMDB as a supplementary KB; if this is not desired, just comment the hmdb line out).
Because of annoying GitHub limitations there is an extra .txt extension on this file name:
ner_kb.config.txt
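For anyone who would rather script the step 1 edits than apply them by hand, here is a rough sketch. The actual layout of ner_kb.config is not shown in this thread, so it assumes one KB entry per line, "#" as the comment marker, and that each entry's line contains the KB name (chebi, PubChem, hmdb); the helper names are hypothetical. Compare the result against the attached example before using it.

```python
def _find(lines, kb):
    """Index of the first line mentioning `kb` (case-insensitive; naive matching rule)."""
    for i, line in enumerate(lines):
        if kb.lower() in line.lower():
            return i
    raise ValueError(f"no {kb} entry found")


def uncomment(line):
    """Strip leading '#' markers and the whitespace after them."""
    return line.lstrip("#").lstrip()


def comment_out(line):
    """Prefix the line with '# ' unless it is already commented."""
    return line if line.lstrip().startswith("#") else "# " + line


def enable_chebi(lines, use_hmdb=False):
    """Return a copy of the config lines with chebi enabled in PubChem's place."""
    out = list(lines)
    i_chebi, i_pubchem = _find(out, "chebi"), _find(out, "pubchem")
    # Step 1a: swap the two entries, uncommenting chebi and commenting out PubChem.
    out[i_chebi], out[i_pubchem] = comment_out(out[i_pubchem]), uncomment(out[i_chebi])
    if use_hmdb:
        # Step 1b: move hmdb directly after chebi and uncomment it.
        hmdb = uncomment(out.pop(_find(out, "hmdb")))
        out.insert(_find(out, "chebi") + 1, hmdb)
    return out
```

To apply it, read the file with Path("ner_kb.config").read_text().splitlines(), pass the lines through enable_chebi (with use_hmdb=True to match the attached example), and write them back joined with newlines.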
It's great to know that you support ChEBI, but regardless of the underlying system, do you support returning InChIKeys?
Sorry, no.
Background: We've recently been in contact regarding getting a local instance of REACH up and running to process full-text articles for our project Factoid (see #551). We are using REACH grounding information for simple chemicals. Currently, information from PubChem (#167) is returned, but we are interested in retrieving records from other small-molecule databases, namely ChEBI.
Issue: Is REACH able to expose the hashed International Chemical Identifier (InChIKey) for simple chemicals, so that each grounded entity can be looked up unambiguously and directly elsewhere?
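Since REACH does not return InChIKeys, any InChIKey handling would happen on the client side, once a key has been obtained from some other source. The sketch below shows why the key suits "look it up elsewhere": it has a fixed, checkable layout, and a standard key can be passed directly to a resolver such as PubChem's PUG REST service. The helper names are hypothetical; aspirin's key (BSYNRYMUTXBXSQ-UHFFFAOYSA-N) is used as the example.

```python
import re

# Standard InChIKey layout (27 characters, three hyphen-separated blocks):
#   14 letters : hash of the connectivity (skeleton) layer
#   8 letters  : hash of the remaining layers (stereochemistry, isotopes, ...)
#   1 letter   : 'S' for a standard InChIKey, 'N' for non-standard
#   1 letter   : InChI version ('A' for version 1)
#   1 letter   : protonation indicator ('N' for neutral)
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{8}[SN][A-Z]-[A-Z]")


def is_inchikey(key: str) -> bool:
    """True if `key` has the shape of an InChIKey (does not check it resolves)."""
    return INCHIKEY_RE.fullmatch(key) is not None


def connectivity_block(key: str) -> str:
    """First block only: stereoisomers of the same skeleton share it."""
    if not is_inchikey(key):
        raise ValueError(f"not an InChIKey: {key!r}")
    return key[:14]


def pubchem_cids_url(key: str) -> str:
    """PubChem PUG REST URL resolving an InChIKey to PubChem compound IDs (CIDs)."""
    if not is_inchikey(key):
        raise ValueError(f"not an InChIKey: {key!r}")
    return f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/{key}/cids/TXT"
```

Fetching the URL from pubchem_cids_url with any HTTP client returns the matching CIDs as plain text; comparing connectivity blocks is a cheap way to spot records that differ only in stereochemistry.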