Hi @jnehring ,
For the data, we have two options:

- Prepare a small script that reads our index and generates a dump to be imported, or
- Make our index available for download in binary format (Solr/Lucene format).

What about delivering the whole FREME solution through Docker Compose (Broker, Link, Publish, NER, Solr, etc.)?

I like the idea to provide FREME as a Docker setup.

Isn't it better to have a script per dataset which loads the data into Solr via FREME NER?
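As a rough illustration of the Docker Compose idea above, a bootstrap script could generate a compose file and start the stack; note that the image names and ports below are hypothetical placeholders, not published FREME images:

```bash
#!/usr/bin/env bash
# Sketch of a Docker Compose bootstrap for the FREME stack.
# The FREME image names below are hypothetical placeholders;
# replace them with the actual images once they are published.
set -euo pipefail

cat > docker-compose.yml <<'EOF'
version: "2"
services:
  solr:
    image: solr:5.5            # stock Solr image from Docker Hub
    ports:
      - "8983:8983"
  freme-ner:
    image: freme/freme-ner     # hypothetical image name
    depends_on:
      - solr
  broker:
    image: freme/broker        # hypothetical image name
    ports:
      - "8080:8080"
    depends_on:
      - freme-ner
EOF

docker-compose up -d
```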
Hi @m1ci, @jnehring
We can use a simple shell script to do this dirty job for us. It is also possible to split the dump by dataset, as Milan suggested.
Please take a look at this snippet
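The snippet itself is not reproduced in this thread. As a minimal sketch of the approach, the dump script could query Solr's select handler once per dataset; the core name `freme-ner` and the `dataset` field below are assumptions about the local schema, not verified against the actual instance:

```bash
#!/usr/bin/env bash
# Sketch: dump each dataset from the Solr index behind FREME NER.
# "freme-ner" (core name) and "dataset" (field name) are assumptions
# about the local schema, not taken from the actual snippet.
set -euo pipefail

SOLR_URL="http://localhost:8983/solr/freme-ner"

for ds in viaf geopolitical orcid; do   # example dataset names
  curl -s "${SOLR_URL}/select?q=*:*&fq=dataset:${ds}&rows=1000000&wt=json" \
    > "dump-${ds}.json"
done
```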
thanks @sandroacoelho for this. Sharing dumps is fine; however, the source dataset can evolve over time and we might want to integrate it from scratch. My idea was to have a script per dataset, such as https://github.com/freme-project/freme-ner/blob/master/index-loc-authors.py, which loads RDF into Solr via FREME NER. The execution of all these scripts could be triggered by a single bash script.
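A minimal sketch of such a trigger script, assuming per-dataset loader scripts alongside index-loc-authors.py (the other script names are made up for illustration):

```bash
#!/usr/bin/env bash
# Sketch: run all per-dataset loader scripts in sequence.
# index-loc-authors.py exists in freme-ner; the other names are
# hypothetical examples of what per-dataset scripts could be called.
set -euo pipefail

SCRIPTS=(
  index-loc-authors.py
  index-viaf.py        # hypothetical
  index-orcid.py       # hypothetical
)

for script in "${SCRIPTS[@]}"; do
  echo "Loading dataset via ${script} ..."
  python "${script}"
done
```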
In the last developers call we agreed on creating a shell script for each dataset.
Hi @jnehring, @m1ci
We have two shell scripts to do this job for us, as follows:
I can say that freme-ner-dump-to-dataset.sh is still in beta. I have been facing some problems dealing with single quotes, double quotes, and some other special characters in label fields, which I hope to solve ASAP.
Our first extractions are available here.
Best,
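One common way to handle the quoting problems mentioned above is to escape the label values before embedding them; a small sketch (the escaping rules shown are generic, not taken from freme-ner-dump-to-dataset.sh):

```bash
# Sketch: escape label values before embedding them elsewhere.
# These rules are generic; they are not taken from the actual script.

# Escape a value for embedding in a JSON string field:
# backslashes first, then double quotes.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Escape a value for embedding inside a single-quoted shell string:
# close the quote, emit an escaped quote, reopen the quote.
shell_escape() {
  printf '%s' "$1" | sed "s/'/'\\\\''/g"
}

json_escape 'label with "quotes" and \backslash'
```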
Thank you! I think the script reads the datasets that are currently in freme-ner and then creates dumps of these datasets. Am I right?
One more question: why did you exclude DBpedia?
I think the next steps are:

- Upload these datasets to our FREME server (I will do this as soon as all datasets are ready)
- Create a script that uploads all datasets together, maybe using this script (see the sketch below)
- Write an article about how to upload a dataset. This article will also cover #125 later.
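A rough sketch of what the bulk upload script could look like; the endpoint path, parameters, and auth header are assumptions modeled on the FREME e-Entity API, so verify them against the current API docs before use:

```bash
#!/usr/bin/env bash
# Sketch: upload all dataset dumps to a FREME NER instance.
# The endpoint path, parameters, and auth header are assumptions
# modeled on the FREME e-Entity API; verify against the API docs.
set -euo pipefail

API="http://api.freme-project.eu/current/e-entity/freme-ner/datasets"
TOKEN="..."   # auth token, if the instance requires one

for dump in dump-*.ttl; do
  name="$(basename "${dump}" .ttl)"
  curl -X POST "${API}?name=${name}&informat=turtle" \
    -H "X-Auth-Token: ${TOKEN}" \
    -H "Content-Type: text/turtle" \
    --data-binary "@${dump}"
done
```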
Hi @jnehring
> Thank you! I think the script reads the datasets that are currently in freme-ner and then creates dumps of these datasets. Am I right?
Yes.
> One more question: why did you exclude DBpedia?
We can download the DBpedia labels dataset from here, but if you think it is important to include this data in our dump, please let me know. I would just need to remove a parameter in the query.
> I think the next steps are:
>
> - Upload these datasets to our FREME server (I will do this as soon as all datasets are ready)
> - Create a script that uploads all datasets together, maybe using this script.
> - Write an article about how to upload a dataset. This article will also cover #125 later.
Great. Before you start, give me a chance to finish a Docker image for our Solr instance. Then we will have a tested solution for issue #91.
Best,
The datasets will be uploaded in #130.
One question: The datasets you created, are they any different from the datasets already available at http://api.freme-project.eu/doc/current/api-doc/list-datasets.html?
What do you mean by different? Btw, there is also:
> What do you mean by different?
I was afraid that the work done in this task was unnecessary. But now I see that there are some datasets that would not be available without this work.
In case we want to make the FREME NER datasets available to others, what do we need to do? By making them available I mean that other FREME users, like the ADAPT research centre, can easily upload our datasets into their own FREME NER installation.
I guess we need to provide the SKOS files for these datasets. It would also be nice to have an installer script.
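A sketch of what such an installer script could look like, assuming the SKOS files get published at some stable URL; the download URL, dataset names, and endpoint below are all placeholders:

```bash
#!/usr/bin/env bash
# Sketch: install published FREME NER datasets into a local
# FREME NER instance. The download URL, dataset names, and endpoint
# are placeholders; adjust them to the actual published locations.
set -euo pipefail

BASE_URL="http://example.org/freme-datasets"              # placeholder
API="http://localhost:8080/e-entity/freme-ner/datasets"   # placeholder

for ds in viaf geopolitical orcid; do                     # example names
  curl -sL "${BASE_URL}/${ds}.ttl" -o "${ds}.ttl"
  curl -X POST "${API}?name=${ds}&informat=turtle" \
    -H "Content-Type: text/turtle" \
    --data-binary "@${ds}.ttl"
done
```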