EarthNLP / ClimateScholar

ClimateScholar is a scientific discovery search engine & knowledge graph to help researchers in combatting the climate crisis.
The Unlicense

NER #13

Open Hevia opened 1 year ago

Hevia commented 1 year ago

https://github.com/facebookresearch/BLINK

Hevia commented 1 year ago

We can enrich the KG with additional knowledge from Wikipedia & Wikispecies. It would be helpful to find methods for automatically identifying mentions of these entities in a text.

If we stay on the current route of multiple entity extractors (see the RE issue #12), we also need a ground-truth entity labeling method, and we need to ensure that all entities resolve to whichever Wikipedia labeler we end up using.
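
As a rough illustration of the "everything resolves to one labeler" requirement, here is a minimal sketch. It is not the project's implementation: `Mention`, `resolve_mentions`, `toy_link`, and the `TOY_KB` table are hypothetical names, and the lookup table only stands in for whichever Wikipedia linker is eventually chosen.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Mention:
    text: str    # surface form as it appears in the document
    start: int   # character offset where the mention starts
    end: int     # character offset where the mention ends
    source: str  # name of the extractor that produced the mention


def resolve_mentions(
    mentions: Iterable[Mention],
    link: Callable[[str], Optional[str]],
) -> Dict[str, List[Mention]]:
    """Group mentions by the canonical ID returned by `link`.

    Mentions that the linker cannot resolve are dropped, so every
    entity that survives points at exactly one canonical label.
    """
    resolved: Dict[str, List[Mention]] = {}
    for mention in mentions:
        canonical = link(mention.text)
        if canonical is None:
            continue
        resolved.setdefault(canonical, []).append(mention)
    return resolved


# Hypothetical stand-in for a real Wikipedia linker: a tiny lookup table.
TOY_KB = {
    "co2": "Carbon_dioxide",
    "carbon dioxide": "Carbon_dioxide",
    "ursus maritimus": "Polar_bear",
}


def toy_link(surface: str) -> Optional[str]:
    return TOY_KB.get(surface.strip().lower())


mentions = [
    Mention("CO2", 0, 3, "extractor_a"),
    Mention("carbon dioxide", 20, 34, "extractor_b"),
    Mention("Ursus maritimus", 50, 65, "extractor_a"),
    Mention("unlinkable term", 70, 85, "extractor_b"),
]
by_entity = resolve_mentions(mentions, toy_link)
```

Here the two extractors disagree on the surface form ("CO2" vs. "carbon dioxide") but both end up under the same canonical key, while the unresolvable mention is discarded.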

Hevia commented 1 year ago

Tutorial: https://twitter.com/spacy_io/status/1603387549589901314

Hevia commented 1 year ago

Models to eval: https://huggingface.co/RJuro/SciNERTopic

Hevia commented 1 year ago

https://twitter.com/honnibal/status/929684075358670848?s=20&t=5FIVa11jl1Rz5H46nLqpsA and https://twitter.com/honnibal/status/1604933905877966875?s=20&t=5FIVa11jl1Rz5H46nLqpsA

Hevia commented 1 year ago

More ground truth sources:

davidberenstein1957 commented 1 year ago

I can recommend entity-fishing or DBpedia Spotlight.

davidberenstein1957 commented 1 year ago

dbpedia spotlight

The notes below describe how to set up DBpedia Spotlight in a VM. They follow https://github.com/dbpedia-spotlight/spotlight-docker

VM requirements

SSH

After deployment, SSH into the VM using the SSH button for the VM in the GCP console.

Install docker

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-debian-10

$ sudo apt update
$ sudo apt install apt-transport-https ca-certificates curl gnupg2 software-properties-common
$ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
$ sudo apt update
$ apt-cache policy docker-ce
$ sudo apt install docker-ce
$ sudo systemctl status docker

Run image

https://github.com/dbpedia-spotlight/spotlight-docker

$ docker run -tid --restart unless-stopped --name dbpedia-spotlight.en --mount source=spotlight-model,target=/opt/spotlight -p 2222:80 dbpedia/dbpedia-spotlight spotlight.sh en
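
Once the container is up, the service can be queried over HTTP. The sketch below assumes the container started by the command above is reachable on localhost:2222 and uses Spotlight's `/rest/annotate` endpoint with the `text` and `confidence` parameters. The helper names are illustrative, and the JSON field names (`Resources`, `@surfaceForm`, `@URI`) should be checked against an actual response.

```python
import json
import urllib.parse
import urllib.request

# Assumption: host port 2222 is the one mapped in the `docker run` command.
SPOTLIGHT_URL = "http://localhost:2222/rest/annotate"


def build_request(text, confidence=0.5, url=SPOTLIGHT_URL):
    """Build a form-encoded POST request that asks for a JSON response."""
    data = urllib.parse.urlencode(
        {"text": text, "confidence": confidence}
    ).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Accept": "application/json"}
    )


def parse_resources(payload):
    """Return (surface form, DBpedia URI) pairs from a decoded response.

    The `Resources` key is read with a default because it may be
    missing when nothing was annotated.
    """
    return [
        (resource["@surfaceForm"], resource["@URI"])
        for resource in payload.get("Resources", [])
    ]


def annotate(text, confidence=0.5):
    """Send `text` to the local Spotlight service and return its annotations."""
    request = build_request(text, confidence)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_resources(json.load(response))
```

With the service running, `annotate("Rising carbon dioxide levels threaten polar bears.")` would return a list of (mention, URI) pairs. A higher `confidence` value returns fewer, more certain links.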

entity-fishing

The notes below describe how to build entity-fishing in a VM. They follow https://nerd.readthedocs.io/en/latest/build.html

VM requirements

SSH

After deployment, SSH into the VM using the SSH button for the VM in the GCP console.

Build grobid and grobid-ner

Install Git (https://www.digitalocean.com/community/tutorials/how-to-install-git-on-debian-10) and a JDK (https://linuxize.com/post/install-java-on-debian-10/). Both packages are needed to build grobid.

$ git clone https://github.com/kermitt2/grobid.git --branch 0.7.1
$ cd grobid
$ ./gradlew clean install

$ git clone https://github.com/kermitt2/grobid-ner.git
$ cd grobid-ner
$ ./gradlew copyModels
$ ./gradlew clean install
$ cd ..
$ cd ..

Export the Wikipedia dump and run the app

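Install unzip, then download and build entity-fishing. References: unzipping files (https://linuxize.com/post/how-to-unzip-files-in-linux/), downloading with curl (www.compciv.org/recipes/cli/downloading-with-curl/), and keeping processes running after the SSH session disconnects (https://unix.stackexchange.com/questions/479/keep-processes-running-after-ssh-session-disconnects).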

$ git clone https://github.com/kermitt2/entity-fishing.git
$ cd entity-fishing
$ curl https://science-miner.s3.amazonaws.com/entity-fishing/0.0.5/db-kb.zip --output db-kb.zip
$ curl https://science-miner.s3.amazonaws.com/entity-fishing/0.0.5/db-en.zip --output db-en.zip
$ sudo apt install unzip
$ unzip db-kb.zip -d data/db/
$ unzip db-en.zip -d data/db/
$ ./gradlew clean build
$ nohup ./gradlew run &
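
To check that the service answers, a disambiguation request can be sent from Python. This is a sketch under assumptions: it takes entity-fishing to be listening on its default port 8090 with the REST API at `/service/disambiguate`, and to accept a JSON query in a multipart form field named `query`, as in the curl examples in the entity-fishing documentation. The helper names are illustrative, and the response fields (`entities`, `rawName`, `wikidataId`, `wikipediaExternalRef`) should be verified against a real response.

```python
import json
import urllib.request
import uuid

# Assumption: default port and service path; adjust to the local configuration.
ENTITY_FISHING_URL = "http://localhost:8090/service/disambiguate"


def build_query(text, lang="en"):
    """Build the JSON query for free-text disambiguation."""
    return {"text": text, "language": {"lang": lang}}


def encode_multipart(query):
    """Encode the query as a single multipart/form-data field named `query`."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="query"\r\n'
        "\r\n"
        f"{json.dumps(query)}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def extract_entities(payload):
    """Return (mention, Wikidata ID, Wikipedia page ID) triples.

    Fields are read with .get() because their presence is assumed,
    not guaranteed, for every entity.
    """
    return [
        (e.get("rawName"), e.get("wikidataId"), e.get("wikipediaExternalRef"))
        for e in payload.get("entities", [])
    ]


def disambiguate(text, lang="en", url=ENTITY_FISHING_URL):
    """POST `text` to the local entity-fishing service and return linked entities."""
    body, content_type = encode_multipart(build_query(text, lang))
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_entities(json.load(response))
```

Because the Wikidata and Wikipedia identifiers come back together, the output of `disambiguate(...)` could serve as the canonical label that other extractors are resolved against.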