OpenRefine / SparqlExtension

Extension which lets you create an OpenRefine project from a SPARQL query
BSD 3-Clause "New" or "Revised" License

Entity column should be reconciled #1

Open antoine2711 opened 2 years ago

antoine2711 commented 2 years ago

Description

If a column contains entities, they should be reconciled instead of showing the URL.

Example

The item_original column contains the URL, but the second column, item, shows what is expected. [screenshot]

antoine2711 commented 2 years ago

This dialog should be displayed ONCE per import and should be used for all entities: [screenshot of the dialog]

WaltonG commented 1 year ago

@antoine2711 @wetneb I would appreciate your views on the following logic, which I intend to use to close this issue.

wetneb commented 1 year ago

I would do it differently: instead of triggering reconciliation after project creation, I would create the project with reconciled cells already. This would give you a result similar to the one you get with the "Use values as identifiers" operation (which does not contact the reconciliation service at all). It would therefore be much faster.

I think it would also be worth relying on the existing list of reconciliation services known to OpenRefine, so that users can choose a service based on this list, instead of having to type a reconciliation endpoint URL.

Bonus point: ideally, you should be able to suggest the right service for the right column, just by checking if the URLs in the column match the service's view template. For instance, if a column contains URLs of the form http://www.wikidata.org/entity/Q345 and a reconciliation service has a view template of http://www.wikidata.org/entity/{{id}}, then it is likely that it makes sense to reconcile this column with this service, and you can infer the reconciliation identifiers directly. This is perhaps not so easy to implement, but it could be very useful. Also, it is likely that if you have a column with URIs for some entities, you also have a column elsewhere for their names (labels) so it would be amazing to let the user pick that as names to be used in the reconciliation cells. Potentially, to avoid building the UI to specify pairs of columns with id/name, you could add some expectation about the naming of the variables (which is the case in Wikidata: ?item / ?itemLabel are frequently used for that).

Definitely not easy, but as a user I can see a lot of potential in it!
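The two heuristics suggested above (matching column URLs against a service's view template, and pairing ?item with ?itemLabel) can be sketched as follows. This is written in Python for brevity, although OpenRefine extensions are Java, and it assumes a service is described only by a display name and a view template containing {{id}}. The helper names, the service name and the 90% match threshold are hypothetical, not part of OpenRefine's API.

```python
import re
from typing import Dict, Iterable, List, Optional


def template_to_regex(view_template: str) -> "re.Pattern[str]":
    """Compile a view template such as 'http://www.wikidata.org/entity/{{id}}'
    into a regex whose single group captures the identifier."""
    prefix, sep, suffix = view_template.partition("{{id}}")
    if not sep:
        raise ValueError("view template has no {{id}} placeholder")
    return re.compile(re.escape(prefix) + r"(.+?)" + re.escape(suffix))


def extract_id(view_template: str, url: str) -> Optional[str]:
    """Return the identifier embedded in url, or None if url does not fit the template."""
    match = template_to_regex(view_template).fullmatch(url)
    return match.group(1) if match else None


def suggest_service(column_values: Iterable[str], services: Dict[str, str],
                    min_share: float = 0.9) -> Optional[str]:
    """Pick the service whose view template fits the largest share of the
    non-empty values, provided that share reaches min_share (hypothetical threshold)."""
    values = [v for v in column_values if v]
    if not values:
        return None
    best_name, best_share = None, 0.0
    for name, template in services.items():
        pattern = template_to_regex(template)
        share = sum(1 for v in values if pattern.fullmatch(v)) / len(values)
        if share > best_share:
            best_name, best_share = name, share
    return best_name if best_share >= min_share else None


def pair_label_columns(variables: List[str]) -> Dict[str, str]:
    """Pair each SPARQL variable x with xLabel when both are present,
    following the ?item / ?itemLabel convention common in Wikidata queries."""
    present = set(variables)
    return {v: v + "Label" for v in variables if v + "Label" in present}


if __name__ == "__main__":
    services = {"Wikidata (example)": "http://www.wikidata.org/entity/{{id}}"}
    column = ["http://www.wikidata.org/entity/Q345", "http://www.wikidata.org/entity/Q42"]
    print(suggest_service(column, services))  # Wikidata (example)
    print([extract_id(services["Wikidata (example)"], url) for url in column])  # ['Q345', 'Q42']
    print(pair_label_columns(["item", "itemLabel", "date"]))  # {'item': 'itemLabel'}
```

Requiring a high share rather than a single match is one way to avoid suggesting a service for a column that only occasionally contains matching URLs; the right threshold would need testing against real queries.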

WaltonG commented 1 year ago

> I would do it differently: instead of triggering reconciliation after project creation, I would create the project with reconciled cells already

Nice, I now understand the logic.

> I think it would also be worth relying on the existing list of reconciliation services known to OpenRefine, so that users can choose a service based on this list, instead of having to type a reconciliation endpoint URL.

Sure

WaltonG commented 1 year ago

I have thought of two methods for producing reconciled cells:

What are your views on these methods?

wetneb commented 1 year ago

I would reconcile the cells in the backend. This would not call the "recon-use-values-as-identifiers" command or operation, but rather create the Cell objects with the appropriate Recon fields directly during project creation, at the place where you convert SPARQL results into the grid.
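The approach described above, building reconciled cells directly while converting SPARQL results into the grid, can be sketched in Python for brevity. The sketch reads the standard SPARQL 1.1 JSON results format (head.vars, results.bindings). CellSketch and ReconSketch are hypothetical stand-ins for the data a matched cell carries, not OpenRefine's Cell and Recon classes. Using history entry id 0 and showing the label as the cell value are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ReconSketch:
    # Hypothetical stand-in for the data a matched Recon holds (not OpenRefine's class).
    identifier_space: str
    match_id: str
    match_name: str
    judgment: str = "matched"
    history_entry_id: int = 0  # assumption: no history entry exists yet at import time


@dataclass
class CellSketch:
    # Hypothetical stand-in for a grid cell: a display value plus optional recon data.
    value: str
    recon: Optional[ReconSketch] = None


def bindings_to_grid(results: Dict[str, Any], id_var: str, label_var: str,
                     view_template: str, identifier_space: str
                     ) -> Tuple[List[str], List[List[Optional[CellSketch]]]]:
    """Turn SPARQL 1.1 JSON results into rows, marking id_var cells as matched
    directly (no reconciliation service is contacted)."""
    prefix, sep, suffix = view_template.partition("{{id}}")
    if not sep:
        raise ValueError("view template has no {{id}} placeholder")
    pattern = re.compile(re.escape(prefix) + r"(.+?)" + re.escape(suffix))
    variables = results["head"]["vars"]
    grid = []
    for binding in results["results"]["bindings"]:
        row: List[Optional[CellSketch]] = []
        for var in variables:
            term = binding.get(var)  # unbound variables are simply absent
            if term is None:
                row.append(None)
                continue
            cell = CellSketch(term["value"])
            match = pattern.fullmatch(term["value"]) if term["type"] == "uri" else None
            if var == id_var and match:
                entity_id = match.group(1)
                # Show the label when the query returned one, else the bare id.
                label = binding.get(label_var, {}).get("value", entity_id)
                cell = CellSketch(label, ReconSketch(identifier_space, entity_id, label))
            row.append(cell)
        grid.append(row)
    return variables, grid


results = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"},
         "itemLabel": {"type": "literal", "value": "Douglas Adams", "xml:lang": "en"}},
    ]},
}
columns, grid = bindings_to_grid(results, "item", "itemLabel",
                                 "http://www.wikidata.org/entity/{{id}}",
                                 "http://www.wikidata.org/entity/")
print(grid[0][0].value, grid[0][0].recon.match_id)  # Douglas Adams Q42
```

In the Java extension, the equivalent step would construct the real Cell and Recon objects at the same point in the import code; the sketch only shows the shape of the data flow.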

WaltonG commented 1 year ago

> create the Cell objects with the appropriate Recon fields

When creating a Recon object, the historyEntryID value is passed to the method as a parameter: public Recon createNewRecon(long historyEntryID). Since the project is still at the import stage and has no history, would it be OK to pass the default value 0 to the method?

wetneb commented 1 year ago

I think so! You could also check what the WikitextImporter is doing (it is also creating reconciled cells at project creation time).

antoine2711 commented 1 year ago

@WaltonG: here's a small example of a wikitext table with 2 columns and 2 rows: https://drive.google.com/file/d/1-btiFT2yjIZbS3A4AY0wCES3MgOdlzhr/view?usp=sharing

Regards, Antoine

WaltonG commented 1 year ago

@antoine2711 Thanks for the example; the wikitext importer does indeed create a reconciled project.