Open sivaraam opened 4 years ago
One such usage seems to be for suggesting categories based on the titles of the images.
Yes, currently this is one of the ways of suggesting categories, but there are a few other types of suggestions, i.e. based on the location of the image, previously used categories, etc., which are combined with this result. Personally I find the suggestions useful as otherwise I won't know what to search. For example, if I am uploading an image titled Madurai temple, it suggests multiple categories that match this name.
If the issue is related to it not giving the best results when the user doesn't type in a case-sensitive manner, then IMO removing the option altogether isn't a good option. I assume more experienced users are well versed with how Commons search works and would be used to searching accordingly.
Our goal should be to simply make our search results consistent with web search results.
One such usage seems to be for suggesting categories based on the titles of the images.
Yes, currently this is one of the ways of suggesting categories, but there are a few other types of suggestions, i.e. based on the location of the image, previously used categories, etc., which are combined with this result.
I realise that and, to be clear, I'm not against the other suggestions, just the suggestions via title.
Personally I find the suggestions useful as otherwise I won't know what to search. For example, if I am uploading an image titled Madurai temple, it suggests multiple categories that match this name.
I understand the convenience here. But this comes with a caveat which I describe below.
If the issue is related to it not giving the best results when the user doesn't type in a case-sensitive manner, then IMO removing the option altogether isn't a good option. I assume more experienced users are well versed with how Commons search works and would be used to searching accordingly.
The issue is not that the `search` API behaves case-sensitively. In fact, the `search` API works case-insensitively. Here's an example API query to prove that:
Note that the query is `gsrsearch=intitle:covid` (covid in lower case). Observe that the results include pages that have COVID (covid in all caps) in their titles. So, the search is case-insensitive.
The actual problem is how the `search` API works. It just searches the wiki pages that exist in the Category: namespace (`gsrnamespace=14`). But the fact is that the actual set of categories isn't the set of wiki pages that exist in the Category: namespace.
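For readers who want to try this, here is a minimal sketch of how such a request could be built. The exact query used in the comment is not preserved here, so the endpoint, the `format` and `gsrlimit` values, and the helper name `build_category_page_search_url` are assumptions for illustration; only `gsrsearch=intitle:covid` and `gsrnamespace=14` come from the discussion.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the public Wikimedia Commons Action API endpoint.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_category_page_search_url(term: str, limit: int = 10) -> str:
    """Build a generator=search request restricted to the Category: namespace.

    Hypothetical helper; it only assembles the URL and sends nothing.
    """
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrsearch": f"intitle:{term}",  # lower-case term, as in the comment
        "gsrnamespace": 14,              # 14 = Category: namespace
        "gsrlimit": limit,               # assumed page size
    }
    return f"{COMMONS_API}?{urlencode(params)}"


url = build_category_page_search_url("covid")
print(url)
```

Opening the printed URL in a browser should be enough to check whether titles containing COVID come back for the lower-case term.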
Is it not possible to use the article search engine with (invisible) category: prefix instead ? Wouldn't that search for pages in the category namespace, rather than actual categories? Some categories don't have associated pages, and you can create pages in the category namespace for non-existent categories.
[Source]
See also: https://github.com/commons-app/apps-android-commons/issues/3179#issuecomment-612052320
Our goal should be to simply make our search results consistent with web search results.
Well, if we're being serious about category addition, we shouldn't be using the `search` API for it, for the reason I describe in my previous comment. The `allcategories` API seems to be the only one that does the job properly. Do enlighten me if I'm ignorant of some other magical API which is a lot better than `allcategories` for category search.
Also, if you give the category addition interface of the Visual editor a shot, you'll realise that it seems to be using the `allcategories` API too. The phab ticket T59302 is all about showing case-insensitive category suggestions in the Visual editor and, guess what, it's still open.
@sivaraam I didn't realize that this issue was about using some other API for title category suggestion.
As far as I can think, that's hardly useful. I think it's best if we don't do such a category search based on image title at all.
This comment of yours confused me.
Am all in for using some other API if it gives better results.
@sivaraam I didn't realize that this issue was about using some other API for title category suggestion.
Apologies for the lack of clarity. I really thought I clarified that in the first paragraph.
Am all in for using some other API if it gives better results.
The problem is: there isn't! The `search` API gives nice results but it isn't suited to category search. The `allcategories` API only supports a case-sensitive prefix search, AFAIK. So, sending the title to it is not a great idea, as we won't get better results, if we get any results at all. That's the reason I suggested removing the category suggestions using the title altogether.
I also had a look at the other usage of the `search` API for category search. IIUC, it comes into the picture in the "Explore" screen when searching for categories in the "Categories" tab. I wonder what we could do about this. The `search` API has the limitation I describe in my previous comment. If we instead use the `allcategories` API for the category search, we would be doing a case-sensitive prefix search, which would not give great results. But that's our only option, AFAIK.
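As a rough sketch of what the `allcategories` alternative would look like on the wire, the request could be assembled as below. The endpoint, the `aclimit` value and the helper name `build_allcategories_url` are assumptions for illustration, not what the app currently does.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the public Wikimedia Commons Action API endpoint.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_allcategories_url(prefix: str, limit: int = 10) -> str:
    """Build a list=allcategories request for a title prefix.

    Hypothetical helper; it only assembles the URL and sends nothing.
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "allcategories",
        "acprefix": prefix,  # matched as a prefix of the category title
        "aclimit": limit,    # assumed page size
    }
    return f"{COMMONS_API}?{urlencode(params)}"


print(build_allcategories_url("Madurai"))
```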
Please share thoughts on these.
It has been hard to follow all the discussion on category search
The allcategories API only supports a case-sensitive prefix search
I may be partly remembering but does this mean only the first letter is case sensitive in the search? Is there any solution where we send multiple requests, combine the results and filter out the distinct categories?
The allcategories API only supports a case-sensitive prefix search
I may be partly remembering but does this mean only the first letter is case sensitive in the search?
Nope. There are a couple of things:
The `allcategories` API does a case-sensitive prefix search. By this I mean that the API does a case-sensitive prefix match of the search term against the category titles. For example, if I send the search term `foo` to the API, I would get results like:
* Foo bar
* Foo club Factory
* Foo BEACH
... but I would not get the following results:
* FOO bar
* Bar foo
Hope that clarifies your doubt.
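The matching behaviour in this example can be modelled locally. This is a sketch only, not the API's implementation: it assumes the first character of the search term is upper-cased before matching, which is inferred from `foo` matching `Foo bar` in the example. The function names are hypothetical.

```python
def normalize_first_letter(term: str) -> str:
    # Assumption: only the first character is upper-cased, the rest is kept as typed.
    return term[:1].upper() + term[1:]


def prefix_search(titles, term):
    """Case-sensitive prefix match after first-letter normalization."""
    prefix = normalize_first_letter(term)
    return [title for title in titles if title.startswith(prefix)]


titles = ["Foo bar", "Foo club Factory", "Foo BEACH", "FOO bar", "Bar foo"]
print(prefix_search(titles, "foo"))
# → ['Foo bar', 'Foo club Factory', 'Foo BEACH']
```

`FOO bar` is excluded because its second and third characters differ in case from the prefix, and `Bar foo` is excluded because the term is not at the start of the title.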
Is there any solution where we send multiple requests, combine the results and filter out the distinct categories?
That's not a good idea even in theory. To properly simulate a case-insensitive search via multiple queries, we would have to form all combinations of cases of the characters in the search term. So, the number of queries would grow exponentially w.r.t. the number of characters in the search term. For example, consider that the search term is `foo` and we want to simulate a case-insensitive search using an API that only supports a case-sensitive search. Then we would have to send a query to the API for each of the following words as the search term, and then combine the results and de-duplicate them.
* foo
* foO
* fOo
* fOO
* Foo
* FoO
* FOo
* FOO
I think that gives you an idea about why this is not possible.
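To make the growth concrete, here is a small sketch that enumerates every case combination of a term. The function name `case_variants` is hypothetical, and characters without distinct upper and lower forms (spaces, digits) are assumed to contribute a single choice.

```python
from itertools import product


def case_variants(term: str):
    """Return every upper/lower-case combination of the characters in term."""
    choices = []
    for ch in term:
        lower, upper = ch.lower(), ch.upper()
        # Characters such as spaces or digits have only one form.
        choices.append((lower, upper) if lower != upper else (ch,))
    return ["".join(combo) for combo in product(*choices)]


variants = case_variants("foo")
print(variants)
print(len(variants))                        # 2**3 = 8 queries for 3 letters
print(len(case_variants("madurai temple"))) # 2**13 = 8192 queries
```

A 3-letter term already needs 8 queries, and a realistic two-word title with 13 letters needs 8192.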
Yeah but how many categories HaVe A cASe LiKe saRCaStic SPongeBOB?
I bet all lower case, originally Typed case, ALL CAPS, Capital Case would get us 99% of results, enough to trick users anyhow.
This is for sure a bandaid but do we have a better solution? Or do we just do nothing and close this ticket and wait for an api that supports this?
Yeah but how many categories HaVe A cASe LiKe saRCaStic SPongeBOB?
I bet all lower case, originally Typed case, ALL CAPS, Capital Case would get us 99% of results, enough to trick users anyhow.
Oh, I wouldn't bet on that, particularly given all the interesting category titles that you could find in Special:Categories.
Also, I feel that this trick would give users a false sense of case-insensitivity, making them wonder why the search seems to behave case-insensitively in some cases and case-sensitively in others. To give a real-world example, consider the following:
(image courtesy: https://github.com/commons-app/apps-android-commons/issues/3582#issuecomment-603744071)
Think about what would happen when the search word is "flowers in a". I don't think we can properly manipulate that search word in a way that our case-sensitive API would return the categories "Flowers in Ain", "Flowers in Angus", etc. This is just an example. A lot of such cases would arise when we don't do a proper simulation of the case-insensitive search using the case-sensitive API. These cases make the user wonder whether the search is really case-insensitive or not. A proper simulation would be costly, though. So, it's best if we spare them the confusion and just say that the category search is limited to a case-sensitive one until we find a proper solution for this.
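The "flowers in a" case can be checked with a small local model of the proposed four-variant trick (typed case, lower case, ALL CAPS, Capital Case). This is a sketch under assumptions: the prefix match upper-cases only the first character of the term, the title "Flowers in a vase" is made up for contrast, and the function names are hypothetical.

```python
def heuristic_variants(term: str):
    """Typed case, lower case, ALL CAPS and Capital Case, de-duplicated."""
    variants = []
    for candidate in (term, term.lower(), term.upper(), term.title()):
        # Assumption: the API upper-cases the first character of the prefix.
        candidate = candidate[:1].upper() + candidate[1:]
        if candidate not in variants:
            variants.append(candidate)
    return variants


def heuristic_search(titles, term):
    """Union of case-sensitive prefix matches over the heuristic variants."""
    prefixes = heuristic_variants(term)
    return [t for t in titles if any(t.startswith(p) for p in prefixes)]


# "Flowers in a vase" is a hypothetical title added for contrast.
titles = ["Flowers in Ain", "Flowers in Angus", "Flowers in a vase"]
print(heuristic_variants("flowers in a"))
# → ['Flowers in a', 'FLOWERS IN A', 'Flowers In A']
print(heuristic_search(titles, "flowers in a"))
# → ['Flowers in a vase']
```

None of the three prefixes matches "Flowers in Ain" or "Flowers in Angus", because those titles mix a lower-case "in" with an upper-case "A".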
This is for sure a bandaid but do we have a better solution? Or do we just do nothing and close this ticket and wait for an api that supports this?
For the reasons I mentioned above and others I mention in a comment in #3179, I think this is our best way forward. If others have better ideas, please share them.
@sivaraam I am not able to follow your final suggestion.
From what I understand, the title-based category suggestions are not providing the best results. If this might confuse the users we can add an (i) button explaining how the suggestions are fetched. Apart from that, I don't think we can do much.
@sivaraam I am not able to follow your final suggestion.
Apologies for not being clear. My last couple of comments apply mostly to #3179.
From what I understand, the title-based category suggestions are not providing the best results. If this might confuse the users we can add an (i) button explaining how the suggestions are fetched. Apart from that, I don't think we can do much.
I'm not sure about the lack of clarity w.r.t. how we suggest categories. The concerns I have with suggesting categories based on title are:
* We use the `search` API for showing the suggestions. We should actually be using the `allcategories` API for the reasons I mention above and in #3179.
* The `allcategories` API does a case-sensitive prefix search. So, using it to show category suggestions based on the file title wouldn't be a great idea as we would hardly get any results.
So, I suggest that we remove the title-based category suggestions altogether. Hope that clears your confusion.
As already discussed above, it is better to show some title-based category suggestions rather than removing them altogether.
Tagging @misaochan for her opinions.
If we conclude that the `allcategories` API provides better results, I am OK with switching to that API. However, I don't see why that would require removing title-based categories. Can we not just query `allcategories` for our other suggestions, query `search` for title-based categories, and concat the results?
If this is not doable, I feel that even providing a few title-based suggestions (even if they are case-sensitive and therefore not numerous) is better than not displaying any at all.
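The "concat the results" step could look roughly like the sketch below. It assumes each source has already been fetched into a list of category titles; the function name `merge_suggestions` and the sample titles are hypothetical.

```python
def merge_suggestions(*sources):
    """Concatenate suggestion lists, keeping the first occurrence of each title."""
    seen = set()
    merged = []
    for source in sources:
        for title in source:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged


# Hypothetical results from the two APIs.
from_allcategories = ["Madurai", "Temples in Madurai"]
from_title_search = ["Temples in Madurai", "Meenakshi Amman Temple"]
print(merge_suggestions(from_allcategories, from_title_search))
# → ['Madurai', 'Temples in Madurai', 'Meenakshi Amman Temple']
```

Keeping the first occurrence means the order of the arguments decides which source's ranking wins when both return the same category.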
We seem to be using the `search` API for category search in a couple of places in the app. As mentioned in the comments in #3179 [ref 1] [ref 2], it has some problems and is not the appropriate API for category search. So, it's best to evaluate the usage of that API in the app and see if we could find proper replacements.
One such usage seems to be for suggesting categories based on the titles of the images. As far as I can think, that's hardly useful. I think it's best if we don't do such a category search based on image title at all.
I'm not sure about the other usage, though.