commons-app / apps-android-commons

The Wikimedia Commons Android app allows users to upload pictures from their Android phone/tablet to Wikimedia Commons
https://commons-app.github.io/
Apache License 2.0
1.03k stars 1.24k forks source link

More powerful pin filtering of Nearby map via allowing users to modify SPARQL query #3410

Closed misaochan closed 2 years ago

misaochan commented 4 years ago

As requested by @VojtechDostal , power users familiar with SPARQL would find it useful to be able to partially modify the SPARQL query themselves to filter results. We can probably implement this in a similar manner to https://tools.wmflabs.org/wikishootme#lat=50.0720077&lng=14.424068199999999&zoom=15

VojtechDostal commented 4 years ago

Adding a screenshot from WikiShootMe. Thank you Josephine for starting this issue :)

sparql filter
misaochan commented 3 years ago

My suggestions on how to implement this:

What do you think @neslihanturan @VojtechDostal @nicolas-raoul ?

nicolas-raoul commented 3 years ago

Sounds perfect! About the last step though: If a custom request fails or times out (50% of my SPARQL errors are mistyped variable names, which unfortunately result in hard-to-debug timeouts), I would suggest telling the user to fix or reset their query, so that they don't have to retype it again. I don't think we have a way to check the validity of a query without actually executing it.

neslihanturan commented 3 years ago

I love the use case @misaochan . Now I am more clear about this issue to create a UI. I agree @nicolas-raoul for the last step. Also we can have reset to default or save my query buttons for further usages maybe? So that they can bring their query back instead of typing all over again.

nicolas-raoul commented 3 years ago

Custom SPARQL is rather niche, so personally I don't really think a "save my query" feature would be worth the complexity&maintenance. People who write SPARQL queries should be able to save their queries in a memo app, a text file or even an email draft :-)

By the way, the advantage of letting users modify the whole query (rather than just a fraction like WikiShootMe) is that we can do much deeper customization, for instance here is how I would possibly use this feature:

  1. Exclude schools
  2. Search for labels in all of the languages I know, rather than just one language, before fallback
  3. Artificially increase radius in the areas where I have already taken all pictures
  4. Increase the meaning of what is considered as "destroyed"
  5. Sometimes misuse the "destroyed" feature to simply differentiate between two kinds of items. Example: Black for items that have an article only in the Cebuano Wikipedia
  6. Remove some stuff that I typically do not use, to make queries faster

This is all doable by editing the SPARQL query.

misaochan commented 3 years ago

I agree that we probably don't need a "save my query" function for the time being, and also the full query should be editable.

About the last step though: If a custom request fails or times out (50% of my SPARQL errors are mistyped variable names, which unfortunately result in hard-to-debug timeouts), I would suggest telling the user to fix or reset their query, so that they don't have to retype it again. I don't think we have a way to check the validity of a query without actually executing it.

Ah, when I said "invalid" I was actually thinking about the request failing - are they different? I agree with both of you, auto-resetting is probably a bad idea now that I think about it. :)

neslihanturan commented 3 years ago

Hi everyone, there are some points needs to be clarified appeared during my mockup attempts. Currently we use this query:

SELECT
     (SAMPLE(?location) as ?location)
     ?item
     (SAMPLE(COALESCE(?itemLabelPreferredLanguage, ?itemLabelAnyLanguage)) as ?label)
     (SAMPLE(COALESCE(?itemDescriptionPreferredLanguage, ?itemDescriptionAnyLanguage, "?")) as ?description)
     (SAMPLE(?classId) as ?class)
     (SAMPLE(COALESCE(?classLabelPreferredLanguage, ?classLabelAnyLanguage, "?")) as ?classLabel)
     (SAMPLE(COALESCE(?icon0, ?icon1)) as ?icon)
     ?wikipediaArticle
     ?commonsArticle
     (SAMPLE(?commonsCategory) as ?commonsCategory)
     (SAMPLE(?pic) as ?pic)
     (SAMPLE(?destroyed) as ?destroyed)
   WHERE {
     # Around given location...
     SERVICE wikibase:around {
       ?item wdt:P625 ?location.
       bd:serviceParam wikibase:center "Point(${LONG} ${LAT})"^^geo:wktLiteral.
       bd:serviceParam wikibase:radius "${RAD}" . # Radius in kilometers.
     }

     # Get the label in the preferred language of the user, or any other language if no label is available in that language.
     OPTIONAL {?item rdfs:label ?itemLabelPreferredLanguage. FILTER (lang(?itemLabelPreferredLanguage) = "${LANG}")}
     OPTIONAL {?item rdfs:label ?itemLabelAnyLanguage}

     # Get the description in the preferred language of the user, or any other language if no description is available in that language.
     OPTIONAL {?item schema:description ?itemDescriptionPreferredLanguage. FILTER (lang(?itemDescriptionPreferredLanguage) = "${LANG}")}
     OPTIONAL {?item schema:description ?itemDescriptionAnyLanguage }

     # Get Commons category (P373)
     OPTIONAL { ?item wdt:P373 ?commonsCategory. }

     # Get (P18)
     OPTIONAL { ?item wdt:P18 ?pic. }

     # Get (P576)
     OPTIONAL { ?item wdt:P576 ?destroyed. }

     # Get the class label in the preferred language of the user, or any other language if no label is available in that language.
     OPTIONAL {
       ?item p:P31/ps:P31 ?classId.
       OPTIONAL {?classId rdfs:label ?classLabelPreferredLanguage. FILTER (lang(?classLabelPreferredLanguage) = "${LANG}")}
       OPTIONAL {?classId rdfs:label ?classLabelAnyLanguage}

       OPTIONAL {
           ?wikipediaArticle   schema:about ?item ;
                               schema:isPartOf <https://${LANG}.wikipedia.org/> .
         }
       OPTIONAL {
           ?wikipediaArticle   schema:about ?item ;
                               schema:isPartOf <https://en.wikipedia.org/> .
           SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
         }

         OPTIONAL {
           ?commonsArticle   schema:about ?item ;
                               schema:isPartOf <https://commons.wikimedia.org/> .
           SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
         }
     }
   }
   GROUP BY ?item ?wikipediaArticle ?commonsArticle

Where location, label, description, class, classLabel, icon, wikipediaArticle, commonsArticle, commonsCategory, pic, destroyed are variables that we reach and use their values in NearbYResultItem.kt class. On the other hand ${LONG}, ${LAT}, ${RAD} and ${LANG} are the inputs that we send to the query. So:

1- We should notify the user about they can use these inputs (${LONG}, ${LAT}, ${RAD},${LANG}) in their query as well. 2- We should make some fields (i.e. location, label, description, class...) uneditable, from my understanding of SPARQL, we may let the WHERE block can be edited. If you confirm me I will continue to the mockup accordingly. What I mean by WHERE block is:

WHERE {
     # Around given location...
     SERVICE wikibase:around {
       ?item wdt:P625 ?location.
       bd:serviceParam wikibase:center "Point(${LONG} ${LAT})"^^geo:wktLiteral.
       bd:serviceParam wikibase:radius "${RAD}" . # Radius in kilometers.
     }

     # Get the label in the preferred language of the user, or any other language if no label is available in that language.
     OPTIONAL {?item rdfs:label ?itemLabelPreferredLanguage. FILTER (lang(?itemLabelPreferredLanguage) = "${LANG}")}
     OPTIONAL {?item rdfs:label ?itemLabelAnyLanguage}

     # Get the description in the preferred language of the user, or any other language if no description is available in that language.
     OPTIONAL {?item schema:description ?itemDescriptionPreferredLanguage. FILTER (lang(?itemDescriptionPreferredLanguage) = "${LANG}")}
     OPTIONAL {?item schema:description ?itemDescriptionAnyLanguage }

     # Get Commons category (P373)
     OPTIONAL { ?item wdt:P373 ?commonsCategory. }

     # Get (P18)
     OPTIONAL { ?item wdt:P18 ?pic. }

     # Get (P576)
     OPTIONAL { ?item wdt:P576 ?destroyed. }

     # Get the class label in the preferred language of the user, or any other language if no label is available in that language.
     OPTIONAL {
       ?item p:P31/ps:P31 ?classId.
       OPTIONAL {?classId rdfs:label ?classLabelPreferredLanguage. FILTER (lang(?classLabelPreferredLanguage) = "${LANG}")}
       OPTIONAL {?classId rdfs:label ?classLabelAnyLanguage}

       OPTIONAL {
           ?wikipediaArticle   schema:about ?item ;
                               schema:isPartOf <https://${LANG}.wikipedia.org/> .
         }
       OPTIONAL {
           ?wikipediaArticle   schema:about ?item ;
                               schema:isPartOf <https://en.wikipedia.org/> .
           SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
         }

         OPTIONAL {
           ?commonsArticle   schema:about ?item ;
                               schema:isPartOf <https://commons.wikimedia.org/> .
           SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
         }
     }
   }
nicolas-raoul commented 3 years ago

For my use case number 2 above, I need to modify the COALESCE, which is outside of WHERE. I am in favor of having just a big text area, and a tooltip that contains a "read more" link to the wiki (such tooltips with "read more" weblinks can be seen in Achievements). I can write that wiki page, it can even contain a collection of purpose-specific custom queries. This feature is for power users who are knowledgeable in SPARQL. So I would be in favor of putting on the user the responsibility of having valid SPARQL with all needed inputs/outputs.

neslihanturan commented 3 years ago

Hmm, we can let people edit COALESCE without editing "as ?label", "as ?description" parts. Them might be superusers when it comes to SPARQL but it is not only about the SPARQL, also about to know about the code in our side and we can not expect them to know the variable names that our code preferred.

nicolas-raoul commented 3 years ago

We can list the variables in the tooltip or "read more". Users start from the default query, it is not like they have a blank page. In SPARQL queries, output variables are always at the same place, and people know to not modify the output variable names. If they mess up, resetting to default is easy enough.

Having several text fields separated by as ?label/etc areas would make copy/pasting (from the wiki for instance) very cumbersome and error-prone.

neslihanturan commented 3 years ago

"Having several text fields separated by as ?label/etc areas would make copy/pasting (from the wiki for instance) very cumbersome and error-prone." hmm very true, I didn't think about this user flow. So I will continue as you said with tooltips, thanks for the inputs @nicolas-raoul !

neslihanturan commented 3 years ago

Here is my suggestion, I am happy so hear your feedbacks. attempt1

nicolas-raoul commented 3 years ago

Looks good to me!

A few minor things:

misaochan commented 3 years ago

Here is my suggestion, I am happy so hear your feedbacks.

Looks good to me @neslihanturan . :) @nicolas-raoul 's changes sound good as well.

Where location, label, description, class, classLabel, icon, wikipediaArticle, commonsArticle, commonsCategory, pic, destroyed are variables that we reach and use their values in NearbYResultItem.kt class

I think that it might be beneficial to check that these variables all exist when the user taps "Apply" - if any of them don't exist, show an error dialog and do not proceed. Otherwise, for instance if a user removes commonsCategory, they could get a nice map with all the pins shown, but the app will probably crash at the final stage of a Nearby upload since we require a check for that variable. Given that the events are so far apart, they may not connect the crash to their SPARQL modification, and file a crash report that we will be scratching our heads over for months (especially if they don't mention in the crash report that they made a modification).

nicolas-raoul commented 3 years ago

Checking that the request's String contains the String ?wikipediaArticle (etc) before the WHERE can be worth checking indeed, even if that does not guarantee that wikipediaArticle will actually be an output, even less that it is in the right format (for instance it could contain a number).

I would recommend against checking for the String as ?wikipediaArticle, because someone who reads only English might decide to make their SPARQL faster by removing the whole localization fallback mechanism, removing the need for SAMPLE [...] AS part.

misaochan commented 3 years ago

Makes sense. I wonder if it's possible to make a note in ACRA if a modified query is being used in Nearby?

neslihanturan commented 3 years ago

Here are mockups after reviews, I hope all good: attempt2

nicolas-raoul commented 3 years ago

Looks perfect to me!

Maybe "Advanced" is enough, instead of "Advanced options"?

misaochan commented 3 years ago

Looks great to me. :) I think "Advanced" would look nicer with the UI, but "Advanced Options" seems clearer. I'm OK with either one.

neslihanturan commented 3 years ago

I also think that Advanced Options is clearer, but I a okay whichever you wish.

nicolas-raoul commented 3 years ago

Sorry for nitpicking, let's go with "Advanced Options" since you both think it is clearer :-)

ashishkumar468 commented 3 years ago

Hi @misaochan , question around this

  1. Will the radius expander we apply implicitly around the nearby query be applicable even with the advanced query?
nicolas-raoul commented 3 years ago

Hi @ashishkumar468

I think most people will not modify the ${RAD} part, in which case no special code is needed.

But if we want to take care of those who modify it, I can see two cases that could be considered as special:

ashishkumar468 commented 3 years ago

Thanks for the input @nicolas-raoul , this should work : )

misaochan commented 3 years ago

Agreed, handling those cases should be sufficient. :) IMO the basic premise of this feature is that it's only for advanced users, and anything the user changes in that text field can come with a potential of rendering Nearby unusable for them. (Which is why I think, as I mentioned at https://github.com/commons-app/apps-android-commons/issues/3410#issuecomment-843804667 , that we should add a note to the ACRA crash report and Send Logs that a custom SPARQL query was used, so that it doesn't raise false alarms if such errors are reported.)