OpenRefine / SparqlExtension

Extension which lets you create an OpenRefine project from a SPARQL query
BSD 3-Clause "New" or "Revised" License

Create project from WCQS SPARQL query (special case: requires Wikimedia authentication) #7

Open trnstlntk opened 2 years ago

trnstlntk commented 2 years ago

This is a feature request for a specific subcase of OpenRefine/OpenRefine#1212. This will be helpful for users who want to edit structured data of existing Wikimedia Commons files with the help of OpenRefine.

Through the SDC project for OpenRefine, users will be able to edit and upload files with structured data on Wikimedia Commons. See more info about this project on meta.wikimedia.org.

In some cases, it may be very handy for users to start an OpenRefine project with a SPARQL query from the Wikimedia Commons Query Service (WCQS). However, this specific SPARQL endpoint requires Wikimedia OAuth authentication.
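To make the authentication hurdle concrete, here is a minimal sketch of how a client might attach a credential to a WCQS request. It only builds the request and sends nothing. Everything about the credential is an assumption: the thread only says that WCQS requires Wikimedia OAuth authentication, so the idea of passing a session cookie obtained after login, the cookie name and value, and the endpoint URL used here are placeholders, not a confirmed API.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed WCQS endpoint URL; verify it before relying on it.
WCQS_ENDPOINT = "https://commons-query.wikimedia.org/sparql"


def build_authenticated_request(query: str, cookie_name: str, cookie_value: str,
                                endpoint: str = WCQS_ENDPOINT) -> Request:
    """Build (but do not send) a GET request that carries a session cookie.

    cookie_name and cookie_value are hypothetical placeholders for whatever
    credential the OAuth login flow actually hands back.
    """
    url = endpoint + "?" + urlencode({"query": query})
    request = Request(url)
    # Ask for the standard SPARQL JSON results format.
    request.add_header("Accept", "application/sparql-results+json")
    # Hypothetical: pass the post-login session credential as a cookie.
    request.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return request


if __name__ == "__main__":
    req = build_authenticated_request(
        "SELECT * WHERE { ?s ?p ?o } LIMIT 1", "exampleSession", "PLACEHOLDER"
    )
    print(req.full_url)
```

An importer would additionally need a way for the user to log in and hand this credential to OpenRefine, which is the part that makes WCQS a special case.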

It would be great if the work done on OpenRefine/OpenRefine#1212 also includes this use case, or alternatively we add support for WCQS after that general task has been completed.

Proposed solution

I have no idea at all about the technical difficulties re: this request. Curious to hear considerations around this!

Additional context

trnstlntk commented 2 years ago

@antoine2711 tagging you here, since it touches upon the Outreachy project you will hopefully be mentoring soon :-)

lozanaross commented 2 years ago

One thing worth adding here is that in the SDC survey (https://commons.wikimedia.org/wiki/File:Analysis_of_the_first_OpenRefine_SDC_open_survey.pdf) users expressed interest in being able to query WCQS, as well as WDQS, directly via OpenRefine as one route towards project creation. The WDQS case is also relevant to SDC because many Commons files are also linked from Wikidata via image (P18), so in theory, if we can support at least WDQS queries (perhaps via the Outreachy project), that will already meet some of the user needs, though of course not all.
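As an illustration of the WDQS route, the sketch below builds a query for Wikidata items that have an image (P18) statement and encodes it as a request URL. The function names are made up for this example, and it assumes the public WDQS endpoint at `https://query.wikidata.org/sparql`, where the `wdt:` prefix is predefined. It is not how the extension implements anything.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service SPARQL endpoint (assumed here).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_p18_query(limit: int = 10) -> str:
    """Return a SPARQL query listing items together with their image (P18) value."""
    return (
        "SELECT ?item ?image WHERE {\n"
        "  ?item wdt:P18 ?image .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query into a GET URL that asks for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(build_request_url(WDQS_ENDPOINT, build_p18_query(5)))
```

Each `?image` value points at a Commons file, so the rows returned could seed an OpenRefine project whose files are then edited for SDC.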

antoine2711 commented 2 years ago

(…) being able to query WCQS, as well as WDQS, directly via OpenRefine as one route towards project creation.

@lozanaross: the way I see it, this SHOULD be done. And let's say I'm in a good position to promote it. ;-) That being said, I will also try to do federated queries, that is, query 2 or more SPARQL end-points at the same time. This is probably more complex than choosing a different end-point, but still, technologically, it should work.

Let's see how far we can go.
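For readers unfamiliar with federation: in SPARQL 1.1 a query sent to one end-point can delegate part of its pattern to another end-point with the `SERVICE` keyword. The sketch below only assembles such a query string. The helper name, the `https://example.org/sparql` end-point and the triple patterns are placeholders for illustration, and whether a given end-point permits outgoing federation is not something this code can tell you.

```python
def build_federated_query(remote_endpoint: str, local_pattern: str,
                          remote_pattern: str, limit: int = 10) -> str:
    """Combine a local graph pattern with a pattern evaluated on a remote end-point."""
    return (
        "SELECT * WHERE {\n"
        f"  {local_pattern}\n"
        # SERVICE hands the enclosed pattern to the remote end-point.
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_federated_query(
        "https://example.org/sparql",   # hypothetical remote end-point
        "?item wdt:P18 ?image .",       # evaluated by the end-point receiving the query
        "?item ?property ?value .",     # evaluated by the remote end-point
    ))
```

The end-point that receives the query performs the join on the shared variable (`?item` here), which is why authentication on either side complicates the picture.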

Regards, Antoine

lozanaross commented 2 years ago

I will also try to do federated queries, that is, query 2 or more SPARQL end-points at the same time

@antoine2711: that sounds really good. If any UI help is needed, I'm happy to advise.

thadguidry commented 2 years ago

@lozanaross @trnstlntk Isn't this issue out of scope for the SDC grant? Looks like it's tagged against the Project via GitHub and maybe should not? I might be wrong and it's in scope? :-)

antoine2711 commented 2 years ago

@lozanaross @trnstlntk Isn't this issue out of scope for the SDC grant? Looks like it's tagged against the Project via GitHub and maybe should not? I might be wrong and it's in scope? :-)

@thadguidry: I think we can say that it has many scopes. I do believe it can be achieved through the Outreachy project "Implement a SPARQL Importer", which is very vague and which could fully and legitimately cover a generic SPARQL end-point, or the specific WDQS and WCQS.

It does have the https://github.com/OpenRefine/OpenRefine/labels/gsoc%2Foutreachy … ;-) And since it is core to modifying SDC, it can only be in scope for that as well, in my opinion.

Regards, Antoine

trnstlntk commented 2 years ago

@lozanaross @trnstlntk Isn't this issue out of scope for the SDC grant? Looks like it's tagged against the Project via GitHub and maybe should not? I might be wrong and it's in scope? :-)

This is totally in scope. See Lozana's comment above. Starting a project from a Wikimedia Commons SPARQL query is going to be very helpful for Wikimedia Commons users (batch SDC editors) in OpenRefine.

We have not promised this feature as part of the current Wikimedia Foundation grant, but I do want to investigate whether we can implement it, and I want to keep this issue on our (Wikimedia Commons focused) radar generally.

trnstlntk commented 2 years ago

That being said, I will also try to do federated queries, that is, query 2 or more SPARQL end-points at the same time. This is probably more complex than choosing a different end-point, but still, technologically, it should work.

Pertaining to this specific task (the Wikimedia Commons SPARQL endpoint): I am hearing that federated querying (e.g. involving both WDQS and WCQS) is not straightforward there, because of the authentication at WCQS.

You would make me very happy (and help the Wikimedia ecosystem) if we can at least research from our side if (federated) querying with WCQS is possible for project creation in OpenRefine. If it is very hard or impossible to do for us, then I can take this as an additional argument to Wikimedia Foundation search/query teams that more investment in Commons' SPARQL endpoint is needed, so that the authentication layer there can be removed.

antoine2711 commented 2 years ago

Pertaining to this specific task (the Wikimedia Commons SPARQL endpoint): I am hearing that federated querying (e.g. involving both WDQS and WCQS) is not straightforward there, because of the authentication at WCQS.

You would make me very happy (and help the Wikimedia ecosystem) if we can at least research from our side if (federated) querying with WCQS is possible for project creation in OpenRefine.

Well, @trnstlntk, for WDQS, I did a federated query with another end-point, and I think that I also did it from an external query service that used WDQS in a federated query.

Now, if there is authentication with the WCQS, that in itself will be a challenge. But once it's settled, I don't see why we couldn't do a federated query FROM WCQS. It's probably very hard, though, to do it from another query service and use the Wikimedia Commons end-point in a federated query.

Regards, Antoine

lozanaross commented 2 years ago

for WDQS, I did a federated query with another end-point, and I think that I also did it from an external query service that used WDQS in a federated query.

@antoine2711 @trnstlntk from my point of view federated queries would be super useful even outside SDC scope (i.e. outside WCQS) and just in general for the Wikidata/Wikibase extension purposes. With my NFDI hat on, I would find querying e.g. my own Wikibase + Wikidata pretty useful. WCQS would be an added bonus, but only if the balance between added value and extra dev effort is worth it in the end.

trnstlntk commented 2 years ago

We discussed that it may be useful if I'd collect some typical example queries. Here are a few that will be useful for people who want to batch edit SDC with OpenRefine, and who would like to start from a SPARQL query: