freme-project / freme-ner

Apache License 2.0

index the GRID dataset #139

Closed m1ci closed 8 years ago

m1ci commented 8 years ago

Upload this dataset into FREME NER. Download link: https://www.dropbox.com/sh/7kg8527zev9gsh2/AACbVTDgPwT-YAZEuQlKi8MJa?dl=0

sandroacoelho commented 8 years ago

Hi @m1ci

I have completed this task. Could you please test it?

Best,


I ran the following commands, and I have a question: is there a way to load all labels from Virtuoso into Solr directly?

1) Created a folder and downloaded the original datasets

mkdir /tmp/grid

cd /tmp/grid

wget https://www.dropbox.com/sh/7kg8527zev9gsh2/AAAn-zH_dhmKduy_8HFnAPz9a/grid.nt?dl=0# -O grid.nt

wget https://www.dropbox.com/sh/7kg8527zev9gsh2/AABfjwxBcmg37VkXEYQ8T2lKa/grid.ttl?dl=0# -O grid.ttl

2) Then I uploaded the files to Virtuoso

/usr/local/virtuoso-opensource/bin/isql

ld_dir ('/tmp/grid', '*.*', 'http://www.freme-project.eu/datasets/grid');
rdf_loader_run();

3) Then downloaded the labels as TSV and converted them to TTL

curl -g -H 'Accept: text/tab-separated-values' 'http://rv2622.1blu.de:8890/sparql?default-graph-uri=http%3A%2F%2Fwww.freme-project.eu%2Fdatasets%2Fgrid&query=SELECT+%3Fs+%3Fp+%3Fo++WHERE+{+%3Fs+%3Fp+%3Fo+.FILTER+regex%28str%28%3Fs%29%2C+%22http%22%29+.}' > grid.tsv

cat grid.tsv |  awk -F'\t' '{gsub(/"/, "", $1); gsub(/"/, "", $2) ; print "<"$1"> <"$2"> "$3 "@xx ."}' > grid_freme.ttl

4) Finally, grid_freme.ttl was uploaded to Solr
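
The conversion in step 3 can be sketched as a small reusable function. This is only an illustration, not the exact command that was run: `tsv_to_ttl` is a hypothetical name, the input is assumed to be tab-separated with quoted values and a header row (`"s"`, `"p"`, `"o"`), and every object is treated as a literal tagged `@xx`, as in the original awk one-liner.

```shell
# Hypothetical helper: turn TSV rows (s, p, o) read from stdin into Turtle-style triples.
# Assumes the first row is a header and that every object is a quoted literal.
tsv_to_ttl() {
  awk -F'\t' '
    NR == 1 { next }        # skip the assumed header row
    {
      gsub(/"/, "", $1)     # strip the quotes around the subject IRI
      gsub(/"/, "", $2)     # strip the quotes around the predicate IRI
      print "<" $1 "> <" $2 "> " $3 "@xx ."
    }'
}

# Sample input with made-up values
printf '"s"\t"p"\t"o"\n"http://example.org/a"\t"http://www.w3.org/2000/01/rdf-schema#label"\t"Fordham University"\n' | tsv_to_ttl
# prints: <http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "Fordham University"@xx .
```

With the real data this would be called as `tsv_to_ttl < grid.tsv > grid_freme.ttl`. Skipping the header row is the only intended difference from the one-liner in step 3.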

m1ci commented 8 years ago

Thanks, it works: http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&input=I%20work%20for%20the%20Fordham%20University.&outformat=turtle&language=en&dataset=grid

"Fordham University" is spotted and linked.

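The test request above can also be rebuilt from the command line. The sketch below only assembles the URL; `freme_ner_url` is a hypothetical helper name, only spaces are percent-encoded (enough for this sentence, not for arbitrary input), and the endpoint may no longer be reachable.

```shell
# Hypothetical helper: build the FREME NER request URL for a plain-text input.
# $1 = input text, $2 = dataset name. Only spaces are percent-encoded.
freme_ner_url() {
  input=$(printf '%s' "$1" | sed 's/ /%20/g')
  printf '%s?informat=text&input=%s&outformat=turtle&language=en&dataset=%s\n' \
    'http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/' \
    "$input" "$2"
}

url=$(freme_ner_url 'I work for the Fordham University.' grid)
echo "$url"
# To send the request: curl "$url"
```

The printed URL matches the one in the comment above, so changing the sentence or the `dataset` value gives a quick way to spot-check other entities.
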
I have a question regarding your procedure for uploading the dataset into Solr. Why are you not loading grid.nt directly, instead of creating a TSV dump from Virtuoso and then submitting that to Solr?

Also, grid.nt and grid.ttl contain the same data in different formats, so you can use either file.
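
If only the label triples are needed, they could in principle be taken straight from grid.nt without the Virtuoso round trip, because N-Triples puts one triple on each line. The sketch below is an assumption-laden illustration: `filter_predicate` is a hypothetical helper, and the thread does not say which predicate GRID uses for labels, so `rdfs:label` is only a guess.

```shell
# Hypothetical helper: keep only the N-Triples lines whose predicate equals the given IRI.
# Reads triples from stdin; $1 = predicate IRI without angle brackets.
filter_predicate() {
  awk -v p="<$1>" '$2 == p'
}

# Sample input with made-up triples
printf '%s\n' \
  '<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "Fordham University" .' \
  '<http://example.org/a> <http://example.org/country> "United States" .' |
  filter_predicate 'http://www.w3.org/2000/01/rdf-schema#label'
# prints: <http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "Fordham University" .
```

On the real file this would be run as `filter_predicate '<label predicate IRI>' < grid.nt > grid_labels.nt`, with the predicate replaced by whatever GRID actually uses.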

sandroacoelho commented 8 years ago

I have a question regarding your procedure for uploading the dataset into Solr. Why are you not loading grid.nt directly, instead of creating a TSV dump from Virtuoso and then submitting that to Solr?

I confess that I did not check the documentation or the code to find out the easiest way to load the data.