Open antoine2711 opened 2 years ago
This dialog should be displayed ONCE per import and should be used for all entities:
@antoine2711 @wetneb I would appreciate your views on the following logic, which I intend to use to close this issue.
I would do it differently: instead of triggering reconciliation after project creation, I would create the project with reconciled cells already. This would give you a result similar to the one you get with the "Use values as identifiers" operation (which does not contact the reconciliation service at all). It would therefore be much faster.
I think it would also be worth relying on the existing list of reconciliation services known to OpenRefine, so that users can choose a service based on this list, instead of having to type a reconciliation endpoint URL.
Bonus point: ideally, you should be able to suggest the right service for the right column, just by checking if the URLs in the column match the service's view template. For instance, if a column contains URLs of the form `http://www.wikidata.org/entity/Q345` and a reconciliation service has a view template of `http://www.wikidata.org/entity/{{id}}`, then it is likely that it makes sense to reconcile this column with this service, and you can infer the reconciliation identifiers directly. This is perhaps not so easy to implement, but it could be very useful.
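The view-template check described above could be sketched roughly as follows. This is only an illustration: the class name `ViewTemplateMatcher`, its methods, and the idea of using a match ratio to rank services are assumptions, not existing OpenRefine code.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenRefine.
final class ViewTemplateMatcher {
    private static final String PLACEHOLDER = "{{id}}";

    /** Returns the identifier embedded in url if it fits the view template, else empty. */
    static Optional<String> extractId(String viewTemplate, String url) {
        int idx = viewTemplate.indexOf(PLACEHOLDER);
        if (idx < 0 || url == null) {
            return Optional.empty();
        }
        // Everything around the placeholder must match literally.
        String prefix = viewTemplate.substring(0, idx);
        String suffix = viewTemplate.substring(idx + PLACEHOLDER.length());
        Pattern pattern = Pattern.compile(
                Pattern.quote(prefix) + "(.+)" + Pattern.quote(suffix));
        Matcher m = pattern.matcher(url.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Fraction of column values that fit the template; could be used to rank services. */
    static double matchRatio(String viewTemplate, List<String> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        long hits = values.stream()
                .filter(v -> extractId(viewTemplate, v).isPresent())
                .count();
        return (double) hits / values.size();
    }
}
```

With the template `http://www.wikidata.org/entity/{{id}}`, calling `extractId` on `http://www.wikidata.org/entity/Q345` yields `Q345`. A service could then be suggested for a column when `matchRatio` is high enough; the threshold is left open here.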
Also, it is likely that if you have a column with URIs for some entities, you also have a column elsewhere for their names (labels) so it would be amazing to let the user pick that as names to be used in the reconciliation cells.
Potentially, to avoid building the UI to specify pairs of columns with id/name, you could add some expectation about the naming of the variables (which is the case in Wikidata: `?item` / `?itemLabel` are frequently used for that).
Definitely not easy but as a user I can see a lot of potential for it!
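A minimal sketch of the naming convention idea, assuming the importer has the list of SPARQL result variables available as plain strings (without the leading `?`). The class and method names are hypothetical and the `Label` suffix rule is just the convention mentioned above, not something OpenRefine enforces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OpenRefine.
final class LabelColumnPairing {
    private static final String LABEL_SUFFIX = "Label";

    /** Maps each variable X to XLabel when both appear in the result set. */
    static Map<String, String> pairIdAndLabelColumns(List<String> variables) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String name : variables) {
            String labelName = name + LABEL_SUFFIX;
            if (variables.contains(labelName)) {
                pairs.put(name, labelName);
            }
        }
        return pairs;
    }
}
```

For the variables `item`, `itemLabel` and `population`, this returns a single pair mapping `item` to `itemLabel`. Columns without a matching label column would simply fall back to using the identifier as the name.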
> I would do it differently: instead of triggering reconciliation after project creation, I would create the project with reconciled cells already
Nice, I have just understood the logic.
> I think it would also be worth relying on the existing list of reconciliation services known to OpenRefine, so that users can choose a service based on this list, instead of having to type a reconciliation endpoint URL.
Sure
I have thought of two methods of producing reconciled cells:
1. On the parse preview page, I would reconcile the cells using OpenRefine's "Use values as identifiers" reconciliation method, then pass the reconciled values to the backend's create-project method via the options variable of the form data. This approach seems easy, but I would still have to come up with a DTO for the reconciled cells.
2. The other approach would be to reconcile the cells in the backend. To use this approach I would need some clarification on the logic behind the "recon-use-values-as-identifiers" command. OpenRefine commands are invoked via API calls from the frontend; would I be on the right track if I called the "recon-use-values-as-identifiers" and "get-rows" commands from the backend to reconcile the cells and fetch the reconciled cells, respectively?

What are your views on these methods?
I would reconcile the cells in the backend. This would not call the "recon-use-values-as-identifiers" command or operation, but rather create the `Cell` objects with the appropriate `Recon` fields directly during project creation, at the place where you convert SPARQL results into the grid.
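To make the shape of this suggestion concrete, here is a rough sketch using simplified stand-in classes. `SketchCell` and `SketchRecon` are NOT OpenRefine's `Cell` and `Recon`; their fields are assumptions chosen for illustration, and the real field names and constructors should be taken from the OpenRefine sources (for example from an importer that already creates reconciled cells).

```java
// Illustration only: stand-in types, not OpenRefine's model classes.
final class ReconciledImportSketch {

    /** Stand-in for reconciliation metadata attached to a cell. */
    static final class SketchRecon {
        final long historyEntryId;
        final String matchedId;
        final String matchedName;

        SketchRecon(long historyEntryId, String matchedId, String matchedName) {
            this.historyEntryId = historyEntryId;
            this.matchedId = matchedId;
            this.matchedName = matchedName;
        }
    }

    /** Stand-in for a grid cell: a display value plus optional recon data. */
    static final class SketchCell {
        final String value;
        final SketchRecon recon; // null when the value is not an entity URI

        SketchCell(String value, SketchRecon recon) {
            this.value = value;
            this.recon = recon;
        }
    }

    /**
     * Converts one SPARQL binding into a cell. If the URI starts with the
     * entity prefix, the cell is created already matched, with no call to
     * a reconciliation service.
     */
    static SketchCell toCell(String uri, String label, String entityPrefix) {
        if (uri == null || !uri.startsWith(entityPrefix)
                || uri.length() == entityPrefix.length()) {
            return new SketchCell(uri, null);
        }
        String id = uri.substring(entityPrefix.length());
        String name = (label != null) ? label : id;
        // 0 as history entry id: assumption, since no history exists at import time.
        return new SketchCell(name, new SketchRecon(0L, id, name));
    }
}
```

The point of the design is that matching happens while the grid is being built, so the imported project opens with reconciled cells and no extra operation in its history.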
> create the `Cell` objects with the appropriate `Recon` fields
When creating a `Recon` object, the value of `historyEntryID` is passed to the method as a parameter: `public Recon createNewRecon(long historyEntryID)`. Since the project is at the import stage with no history, would it be OK to pass the default value 0 to the method?
I think so! You could also check what the WikitextImporter is doing (it is also creating reconciled cells at project creation time).
@WaltonG : here's a little example of a 2 columns by 2 rows of a wikitext table… https://drive.google.com/file/d/1-btiFT2yjIZbS3A4AY0wCES3MgOdlzhr/view?usp=sharing
Regards, Antoine
@antoine2711 Thanks for the example, the wikitext importer actually creates a reconciled project
Description
If a column contains entities, they should be reconciled instead of showing the URL.
Example
The item_original column contains the URL, but the second column, item, shows what is expected.

![image](https://user-images.githubusercontent.com/12217525/173279754-ea5cf2bf-bf70-4305-9ad6-56773bd92591.png)