codeforkjeff / conciliator

OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more.
GNU General Public License v3.0

conciliator

conciliator is a growing collection of OpenRefine reconciliation services, as well as a Java framework for creating them. A reconciliation service tries to match variant text (usually names of things) to standard IDs for the entity represented by that text.

This project supersedes refine_viaf.


Public Server

If your needs are modest and you can't or don't want to run this software yourself, you can use the public server at http://refine.codefork.com/. Visit that address for more instructions.

General Features

Data Source Features

VIAF

ORCID

Open Library

Solr

Running Conciliator on Your Own Computer

Using Docker is the easiest and preferred way to build and run the application:

docker build -t conciliator .
./run_docker.sh

An alternative way to run conciliator using Docker is available here.

If you don't have Docker, you can run the application as follows:

  1. Install Java 11 if you don't already have it.

  2. Download the .jar file for the latest release. Alternatively, you can download the source code tarball or clone this repository, and build the .jar file using Maven.

  3. Run this command:

# replace VERSION with the release you downloaded
java -jar conciliator-VERSION.jar

That's it! You should see some messages as the application starts up. Now you're ready to configure OpenRefine to use the service. When you're done with it, hit Ctrl-C to quit the application.

If a file named conciliator.properties exists in the current directory, conciliator will use the options found in it. See the sample file in this repository.

By default, conciliator will run on port 8080, which is used in the example URLs below. To use a different port, set the server.port property as follows when running the program:

java -Dserver.port=7000 -jar conciliator-VERSION.jar
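When starting the service from a script, it can help to wait until it answers before pointing OpenRefine at it. The following is a minimal sketch, not part of conciliator: the names PORT, JAR, service_base_url, and wait_for_service are illustrative, and it assumes curl is installed.

```shell
#!/bin/sh
# Sketch only: PORT, JAR, and both function names are hypothetical helpers,
# not something shipped with conciliator.

PORT="${PORT:-7000}"
JAR="${JAR:-conciliator-VERSION.jar}"   # replace VERSION with your release

# service_base_url PORT: print the local base URL for the given port.
service_base_url() {
    printf 'http://localhost:%s\n' "$1"
}

# wait_for_service URL [TRIES]: poll about once a second until the URL
# returns any HTTP response, giving up after TRIES attempts (default 30).
wait_for_service() {
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        if curl -s -o /dev/null "$1"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage (needs Java, curl, and the downloaded .jar):
#   java -Dserver.port="$PORT" -jar "$JAR" &
#   wait_for_service "$(service_base_url "$PORT")/reconcile/viaf" && echo "ready"
```

The poll only checks that something responds on that port; it does not verify that the response comes from conciliator.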

Configuring OpenRefine

  1. In OpenRefine, choose a column of names you want to reconcile, and select "Reconcile" and "Start Reconciling..." in the column pull-down menu.

  2. Click "Add Standard Service..."

  3. Enter a URL based on the data source you wish to use.

    To reconcile against names from any VIAF source, type in:

    http://localhost:8080/reconcile/viaf

    To reconcile against a specific VIAF source, append its code to the end of the path. For example, to search only names from the Bibliothèque nationale de France, type in:

    http://localhost:8080/reconcile/viaf/BNF

    To retrieve the IDs used by source institutions, rather than VIAF IDs, use "proxy mode." For example, to search only names from the Library of Congress and retrieve their IDs, type in:

    http://localhost:8080/reconcile/viafproxy/LC

    To use ORCID:

    http://localhost:8080/reconcile/orcid

    To use ORCID with "smartnames" mode when reconciling names:

    http://localhost:8080/reconcile/orcid/smartnames

    To use Open Library:

    http://localhost:8080/reconcile/openlibrary

    On the reconciliation screen, in the "Also use relevant details from other columns" panel, you can check the "Include?" box for any columns you want added to the query, and give each one any name in the "As Property" box. If no results are found with these column values added to the query, the service tries again with only the originally selected column.

  4. Follow the instructions in the dialog box to start reconciling names.
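To check an endpoint from step 3 before adding it to OpenRefine, you can send it a single query from the command line. The sketch below composes the URLs listed above and posts a one-item `queries` batch in the form described by the Reconciliation Service API specification. The helper names (reconcile_url, reconcile_query_json, reconcile_query) and the BASE_URL variable are illustrative, and the exact response conciliator returns is not shown because it depends on the data source.

```shell
#!/bin/sh
# Sketch only: BASE_URL and the three function names are hypothetical.

BASE_URL="${BASE_URL:-http://localhost:8080}"

# reconcile_url SOURCE [SUFFIX]
# Examples: "viaf", "viaf BNF", "viafproxy LC", "orcid smartnames", "openlibrary"
reconcile_url() {
    if [ -n "${2:-}" ]; then
        printf '%s/reconcile/%s/%s\n' "$BASE_URL" "$1" "$2"
    else
        printf '%s/reconcile/%s\n' "$BASE_URL" "$1"
    fi
}

# reconcile_query_json NAME
# Builds a batch containing one query keyed "q0".
# Caveat: NAME is inserted as-is, so quotes or backslashes in it are not escaped.
reconcile_query_json() {
    printf '{"q0":{"query":"%s"}}\n' "$1"
}

# reconcile_query SOURCE SUFFIX NAME
# POSTs the batch as a form-encoded "queries" parameter.
# Needs curl and a running server; pass "" as SUFFIX when there is none.
reconcile_query() {
    curl -s --data-urlencode "queries=$(reconcile_query_json "$3")" \
        "$(reconcile_url "$1" "$2")"
}

# Usage:
#   reconcile_query viaf BNF "Hugo, Victor"
#   reconcile_query orcid "" "Jane Doe"
```

If the service is running, the reply should be a JSON object keyed by "q0" containing candidate matches.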

Creating Your Own Data Source

  1. Clone this repository to get the source code. The code you create in the next steps should live under a new com.codefork.refine.NEW_SOURCE package so that Spring's auto-scanning picks it up.

  2. Create a class for your data source that extends DataSource for very bare-bones functionality, or WebServiceDataSource if you are making requests to another web service. See the other data sources for some template code. Implement the abstract methods as required.

  3. Create a controller that autowires your new DataSource and hooks up a unique path, e.g. /reconcile/new_source. See VIAFController for an example.

  4. Write a test or two if you like.

  5. Set some default properties in Config if your data source has any settings you want to be configurable.

  6. Build a new .jar by running mvn clean package. Run the .jar file as in the instructions above, and you should be able to access the service for your new data source at:

    http://localhost:8080/reconcile/new_source
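The non-Java parts of the steps above (steps 1 and 6) can be sketched as a small script. This is an illustration under assumptions: it presumes the repository follows the standard Maven source layout (src/main/java), and NEW_SOURCE, package_dir, service_path, scaffold, and build are made-up names. Writing the DataSource and controller classes in steps 2 through 5 is still done by hand.

```shell
#!/bin/sh
# Sketch only: all variable and function names here are hypothetical, and the
# src/main/java location is an assumption based on Maven conventions.

NEW_SOURCE="${NEW_SOURCE:-new_source}"

# package_dir NAME: directory for the com.codefork.refine.NAME package.
package_dir() {
    printf 'src/main/java/com/codefork/refine/%s\n' "$1"
}

# service_path NAME: path the controller from step 3 would expose.
service_path() {
    printf '/reconcile/%s\n' "$1"
}

# Step 1: clone the repository and create the package directory.
# Needs git and network access.
scaffold() {
    git clone https://github.com/codeforkjeff/conciliator.git &&
    cd conciliator &&
    mkdir -p "$(package_dir "$NEW_SOURCE")" &&
    echo "Add your DataSource and controller under $(package_dir "$NEW_SOURCE")"
}

# Step 6: build the .jar. Needs Maven; run from the repository root.
build() {
    mvn clean package
}

# Usage:
#   NEW_SOURCE=my_source; scaffold    # then write the classes (steps 2-5)
#   build
```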

Advanced Usage

To build from the source code, install Maven and type:

mvn package

If you want to host this software on a server for long-term usage or if you want to enable logging for debugging purposes, take a look at run.sh for some helpful options.

You can change run-time options by editing the conciliator.properties file.

To see usage/error statistics for the service, go to http://localhost:8080/stats

TODO

Resources

Specification for the Reconciliation Service API:

https://reconciliation-api.github.io/specs/latest/

This code drew inspiration from these other projects:

Do you use this thing??

Apparently, you do. Here's a bibliography of things that reference conciliator:

https://github.com/codeforkjeff/conciliator/wiki

If you use conciliator, please take a few seconds to leave a comment on this page. Hearing from users really motivates me to continue improving this project.

License

This code is distributed under the GNU General Public License v3.0. See the file LICENSE for details.