code-openness / Data

The pre-processing and formatting of the data to set up the Wikidata instance

check OpenRefine and the possibility to use it for our data #31

Closed AbdBarho closed 5 years ago

AbdBarho commented 5 years ago

OpenRefine.org

kozae commented 5 years ago

OpenRefine has built-in features for detecting similar entities that are spelled in different ways, including several implementations of "edit distance". There are also some extensions for interacting with Wikidata, but I am not sure of their potential; colleagues who work more closely with Wikidata should look into that. Otherwise, OpenRefine also lets you write Python (Jython) code to filter a column, but that is essentially the same as what we were already able to do with Pandas.
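For comparison with the Pandas workflow, here is a minimal stdlib-only sketch of edit-distance clustering, roughly the idea behind OpenRefine's nearest-neighbor (Levenshtein) clustering. It is not OpenRefine's implementation: the function names, the case-insensitive comparison and the `radius=1` default are assumptions for illustration, and merging chains of near-matches with union-find is a simplification that may group values differently than OpenRefine does.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the inner dimension
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def cluster_by_edit_distance(values, radius=1):
    """Group distinct values whose case-insensitive edit distance is <= radius.

    Hypothetical helper: uses union-find, so chains a~b, b~c become one
    cluster. Compares every pair (O(n^2)), fine for a sketch, slow for
    large columns.
    """
    distinct = sorted(set(values))
    parent = {v: v for v in distinct}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in combinations(distinct, 2):
        if levenshtein(a.lower(), b.lower()) <= radius:
            parent[find(a)] = find(b)

    groups = {}
    for v in distinct:
        groups.setdefault(find(v), []).append(v)
    # Only clusters with more than one spelling are candidates for merging.
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    column = ["Berlin", "berlin", "Berln", "Hamburg", "Hamburg ", "Munich"]
    print(cluster_by_edit_distance(column))
    # → [['Berlin', 'Berln', 'berlin'], ['Hamburg', 'Hamburg ']]
```

As in OpenRefine, each cluster is only a suggestion: a small radius can still join genuinely different names that differ by one character, so the groups should be reviewed before any values are merged.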