HermannKroll / NarrativeIntelligence

GNU General Public License v3.0

Narrative Service: Add vocabulary for cell lines #288

Open HermannKroll opened 4 months ago

HermannKroll commented 4 months ago

Find the vocabulary that the NLM uses for cell lines in PubTator. Use that vocabulary to translate CellLines in our services and to make CellLines searchable.
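A minimal sketch of the two lookups this needs: translating an ID to its preferred name, and resolving a search term to IDs. It assumes each vocabulary entry is an `(id, heading, synonyms)` tuple; the class `CellLineVocabulary` and its methods are hypothetical and not part of the existing services.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical entry layout: (entity id, preferred name, list of synonyms)
Entry = Tuple[str, str, List[str]]


class CellLineVocabulary:
    """Hypothetical in-memory lookup for translating and searching cell lines."""

    def __init__(self, entries: Iterable[Entry]) -> None:
        self.heading_by_id: Dict[str, str] = {}
        self.ids_by_term: Dict[str, List[str]] = {}
        for entity_id, heading, synonyms in entries:
            self.heading_by_id[entity_id] = heading
            # Index the preferred name and every synonym, case-insensitively.
            for term in [heading, *synonyms]:
                ids = self.ids_by_term.setdefault(term.casefold(), [])
                if entity_id not in ids:  # avoid duplicates, e.g. "HeLa" vs "HELA"
                    ids.append(entity_id)

    def translate(self, entity_id: str) -> Optional[str]:
        """Return the preferred name for an ID, or None if it is unknown."""
        return self.heading_by_id.get(entity_id)

    def search(self, term: str) -> List[str]:
        """Return all IDs whose name or synonym matches the term (case-insensitive)."""
        return list(self.ids_by_term.get(term.strip().casefold(), []))
```

`search` returns a list rather than a single ID because cell line names are not guaranteed to be unique, so one term may resolve to several entities.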

We may also need a way to annotate CellLines on our own. PubTator uses TaggerOne (a newer version may be available).

ir0ntr0nik commented 2 months ago

PubTator3 uses Cellosaurus as the terminology for cell line annotations. The vocabulary is available at https://ftp.expasy.org/databases/cellosaurus/cellosaurus.xml.

  1. The size of the XML file is ~500 MB
  2. The size of the preprocessed vocabulary is ~6.5 MB
  3. The vocabulary contains roughly 150k entities

The implementation is ready.
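The preprocessing step from the ~500 MB XML file to the compact vocabulary could be sketched as a streaming parse, so the whole file never has to sit in memory at once. This sketch assumes the Cellosaurus XML layout in which each `cell-line` element holds an `accession-list` with a primary `accession` and a `name-list` whose `name` elements are typed `identifier` or `synonym`. The functions `iter_cell_lines` and `write_vocabulary_tsv` and the TSV columns are hypothetical and do not describe the actual implementation.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterable, Iterator, List, TextIO, Tuple, Union

Entry = Tuple[str, str, List[str]]  # (accession, preferred name, synonyms)


def iter_cell_lines(source: Union[str, IO[bytes]]) -> Iterator[Entry]:
    """Stream (accession, name, synonyms) tuples from a Cellosaurus XML dump.

    Assumes the element layout described above; adjust paths if it differs.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "cell-line":
            continue
        primary = elem.find("accession-list/accession[@type='primary']")
        names = elem.findall("name-list/name")
        identifier = next((n.text for n in names if n.get("type") == "identifier"), None)
        synonyms = [n.text for n in names if n.get("type") == "synonym" and n.text]
        if primary is not None and primary.text and identifier:
            yield primary.text, identifier, synonyms
        elem.clear()  # free the subtree of the processed entry


def write_vocabulary_tsv(entries: Iterable[Entry], out: TextIO) -> int:
    """Write entries as a hypothetical id/heading/synonyms TSV; return the row count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "heading", "synonyms"])
    count = 0
    for accession, heading, synonyms in entries:
        writer.writerow([accession, heading, ";".join(synonyms)])
        count += 1
    return count


# Illustrative, trimmed sample in the assumed layout (not real file content).
SAMPLE = b"""<Cellosaurus><cell-line-list>
<cell-line category="Cancer cell line">
<accession-list><accession type="primary">CVCL_0030</accession></accession-list>
<name-list><name type="identifier">HeLa</name><name type="synonym">HELA</name><name type="synonym">Hela</name></name-list>
</cell-line>
</cell-line-list></Cellosaurus>"""

if __name__ == "__main__":
    buffer = io.StringIO()
    write_vocabulary_tsv(iter_cell_lines(io.BytesIO(SAMPLE)), buffer)
    print(buffer.getvalue())
```

On the full dump, `iter_cell_lines("cellosaurus.xml")` would be passed a local path instead of the in-memory sample; only the accession, preferred name, and synonyms are kept, which is why the resulting vocabulary is much smaller than the source XML.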