TagStudioDev / TagStudio

A User-Focused Photo & File Management System
https://docs.tagstud.io/
GNU General Public License v3.0

[Bug]: certain parent and grandparent tags not showing up in searches #308

Open Treewad opened 3 months ago

Treewad commented 3 months ago


TagStudio Version

Alpha 9.3.1

Operating System & Version

Windows 11

Description

I was experimenting with tagging some of my saved digital art when I noticed some very strange and inconsistent interactions with certain parent tags and the search function. I'll use specific examples to make sure I don't miss any details.

First, I created a tag for the character Princess Zelda. I named it %princess_zelda and gave it the shorthand zelda and the alias princess zelda. Then I gave it the parent tag $tears_of_the_kingdom, which has the shorthand totk and the alias tears of the kingdom. Then I gave the $tears_of_the_kingdom tag the parent tag $the_legend_of_zelda, which has the shorthand tloz and the alias the legend of zelda. Finally, I tagged an image with only the %princess_zelda tag.

When I search princess zelda or zelda, the image shows up. But when I search %princess_zelda, it doesn't, meaning only the shorthand and the alias of the tag return the image, not the tag name itself. When I search totk, the image shows up, but when I search tears of the kingdom or $tears_of_the_kingdom, it doesn't, meaning only the shorthand of the parent tag returns the image. When I search tloz or the legend of zelda, the image shows up, but when I search $the_legend_of_zelda, it doesn't, meaning only the shorthand and the alias of the grandparent tag return the image.

However, this behavior differs slightly from that of a different image I tagged with a similar structure. That image is tagged only with %gemma, which has the shorthand gemma. %gemma has the parent tag $monster_hunter_wilds, which has the shorthand mhwilds and the alias monster hunter wilds. $monster_hunter_wilds has the parent tag $monster_hunter, which has the shorthand monster hunter. When I search gemma, the image shows up, but it also shows up when I search %gemma, which differs from the first example, where searching the exact tag name did not return the image. When I search mhwilds, the image shows up, but not when I search monster hunter wilds or $monster_hunter_wilds, which is identical to the first example, where only the shorthand returned the image. But when I search monster hunter or $monster_hunter, the image doesn't show up, which differs from the first example, where searching the shorthand and alias of the grandparent tag did return the image.

For good measure, I tagged a third image to see if I could find any sort of pattern. The third image is tagged only with %marina, which has the shorthand marina. %marina has the parent tag $splatoon_2, which has the shorthand splatoon 2. $splatoon_2 has the parent tag $splatoon, which has the shorthand splatoon. When I search %marina or marina, the image shows up, the same as in the second example. When I search splatoon 2, the image shows up, but not when I search $splatoon_2, the same as in both other examples. When I search splatoon or $splatoon, the image shows up, which differs from both other examples, where searching the exact name of the grandparent tag did not return the image.

So in summary:

FIRST EXAMPLE
- Tags (%princess_zelda, zelda, princess zelda): shorthand and alias return image. (2/3)
- Parent Tags ($tears_of_the_kingdom, totk, tears of the kingdom): only shorthand returns image. (1/3)
- Grandparent Tags ($the_legend_of_zelda, tloz, the legend of zelda): shorthand and alias return image. (2/3)

SECOND EXAMPLE
- Tags (%gemma, gemma): exact name and shorthand return image. (2/2)
- Parent Tags ($monster_hunter_wilds, mhwilds, monster hunter wilds): only shorthand returns image. (1/3)
- Grandparent Tags ($monster_hunter, monster hunter): nothing returns image. (0/2)

THIRD EXAMPLE
- Tags (%marina, marina): exact name and shorthand return image. (2/2)
- Parent Tags ($splatoon_2, splatoon 2): only shorthand returns image. (1/2)
- Grandparent Tags ($splatoon, splatoon): exact name and shorthand return image. (2/2)

Obviously this is a very small dataset, so take my analysis with a grain of salt, but here are the patterns I was able to identify:
- Searching a tag name that contains underscores never returns the image.
- The only cases where searching the exact tag name returned the image were tags that are a single word with no underscores.
- When searching parent tags, only the shorthand returns the image.

The main thing that baffles me is how inconsistent the aliases with spaces are. I just have no idea why searching the legend of zelda works, but tears of the kingdom, monster hunter wilds, and monster hunter don't. I'm not very familiar with coding, so I don't know what might be causing this, but hopefully this information is helpful for whenever the search function gets revamped.

Expected Behavior

As for the expected behavior, I would expect an image to show up when searching for any of its tags or their ancestors (parent tags, grandparent tags, etc.), or for any shorthand or alias of those tags.
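A minimal sketch of that expected behavior, assuming a hypothetical `Tag` class with `name`, `shorthand`, `aliases`, and `parents` fields (not TagStudio's actual data model) and exact matching of the whole query: an entry matches if the query equals any name, shorthand, or alias of any of its tags or their ancestors.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality/hashing so tags can go in a set
class Tag:
    """Hypothetical tag model for illustration; not TagStudio's real schema."""

    name: str
    shorthand: str = ""
    aliases: list[str] = field(default_factory=list)
    parents: list["Tag"] = field(default_factory=list)

    def terms(self) -> set[str]:
        """Every string a user might type to refer to this tag."""
        found = {self.name, *self.aliases}
        if self.shorthand:
            found.add(self.shorthand)
        return found


def with_ancestors(tags: list[Tag]) -> set[Tag]:
    """The given tags plus all parents, grandparents, etc. (cycle-safe)."""
    seen: set[Tag] = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in seen:
            continue
        seen.add(tag)
        stack.extend(tag.parents)
    return seen


def entry_matches(entry_tags: list[Tag], query: str) -> bool:
    """True if the whole query names any tag on the entry or any ancestor."""
    q = query.strip()
    return any(q in tag.terms() for tag in with_ancestors(entry_tags))


# The first example from the report: %princess_zelda -> totk -> tloz.
loz = Tag("$the_legend_of_zelda", "tloz", ["the legend of zelda"])
totk = Tag("$tears_of_the_kingdom", "totk", ["tears of the kingdom"], [loz])
zelda = Tag("%princess_zelda", "zelda", ["princess zelda"], [totk])

for q in ["%princess_zelda", "tears of the kingdom", "$the_legend_of_zelda"]:
    print(q, entry_matches([zelda], q))  # prints True for each query
```

Walking the ancestors with an explicit stack and a `seen` set keeps the lookup from looping forever if two tags ever end up as each other's parents.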

Steps to Reproduce

For the steps to reproduce, just create a few tags with parent and grandparent tags and give them shorthands and/or aliases. Maybe experiment with including symbols, underscores, and spaces. Then just tag some images with only the base tags and see whether they show up when you search certain parent tags and aliases.

Logs

No response

CyanVoxel commented 3 months ago

First off, thank you so much for the in-depth explanation; this will be incredibly helpful for narrowing this down 😁

Some of this seems like it can be attributed to https://github.com/TagStudioDev/TagStudio/issues/112, wherein the search query is split on spaces. So searching for "monster hunter wilds" is sent as three distinct queries: "monster", "hunter", and "wilds". This will be fixed down the line, but it has slipped by for the time being.
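To illustrate the space-splitting behavior from #112, here is a minimal sketch, assuming a hypothetical matcher that compares each whitespace-separated term exactly against a tag's name, shorthand, and aliases. `split_query` and `term_matches` are illustrative names, not TagStudio's actual search code.

```python
def split_query(query: str) -> list[str]:
    # Mimics the bug: the query is broken apart on whitespace.
    return query.split()


def term_matches(term: str, tag_terms: set[str]) -> bool:
    # Hypothetical exact comparison against a tag's name/shorthand/aliases.
    return term in tag_terms


wilds_terms = {"$monster_hunter_wilds", "mhwilds", "monster hunter wilds"}

terms = split_query("monster hunter wilds")
print(terms)  # ['monster', 'hunter', 'wilds']
print([term_matches(t, wilds_terms) for t in terms])  # [False, False, False]
print(term_matches("monster hunter wilds", wilds_terms))  # True
```

No single fragment equals the multi-word alias, so the alias can never match once the query is split; comparing the unsplit query would succeed.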

As for some of the other cases, hmm... In your second example, I would definitely expect the grandparent "$monster_hunter" to return the "%gemma" tag. Likewise, in your third example I would also expect "$splatoon_2" to return "%marina". Given that the grandparent tag in your third example somehow works as intended, my guess is that something might be wrong with how underscores are checked in the parser when it comes to tag relations. Behind the scenes, characters such as underscores, hyphens, spaces, and apostrophes are all ignored while parsing (this is also responsible for the space bug).
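A minimal sketch of that kind of normalization, assuming the ignored characters are exactly underscores, hyphens, whitespace, and apostrophes (the real parser may ignore a different set, and `normalize` is a hypothetical name):

```python
import re

# Hypothetical set of ignored characters, taken from the list above.
IGNORED = re.compile(r"[_\-\s']")


def normalize(text: str) -> str:
    """Drop every ignored character from the text."""
    return IGNORED.sub("", text)


print(normalize("$monster_hunter"))  # $monsterhunter
print(normalize("%princess_zelda"))  # %princesszelda
print(normalize("$splatoon"))        # $splatoon
```

Names such as "$splatoon", "%gemma", and "%marina" contain none of these characters, so normalization leaves them unchanged, which is consistent with those being the exact names that did return results.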

I'll try to replicate this and further narrow in on the cause. Thank you again for the detailed bug report!

Thesacraft commented 2 months ago

Hi, I looked into this and I think I know why the bug is happening; CyanVoxel was on the right path:

It is partly due to the space bug, but the reason that, for example, $tears_of_the_kingdom doesn't show the image tagged with %princess_zelda is that the parser ignores underscores and similar characters. This is a problem because the search query is stripped of punctuation (including underscores), and then the algorithm tries to find the tag matching the query. For example, the search query $tears_of_the_kingdom becomes $tearsofthekingdom, but the tag name is still $tears_of_the_kingdom, so the search is unsuccessful.
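The mismatch can be sketched as follows, assuming (as described above) that only the query is stripped while the stored tag name is compared unchanged. `strip_punctuation` and the plain equality check are hypothetical stand-ins, not TagStudio's real functions.

```python
import re


def strip_punctuation(text: str) -> str:
    # Hypothetical: remove underscores, hyphens, whitespace and apostrophes.
    return re.sub(r"[_\-\s']", "", text)


tag_name = "$tears_of_the_kingdom"
query = "$tears_of_the_kingdom"

# Only the query is stripped, so the comparison fails.
print(strip_punctuation(query))              # $tearsofthekingdom
print(strip_punctuation(query) == tag_name)  # False

# A name with nothing to strip is unaffected, so it still matches.
print(strip_punctuation("%gemma") == "%gemma")  # True
```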

An easy fix would be to strip the punctuation from tags when they are created, but the stripped name would then also be displayed to the user (meaning a tag created with the name $tears_of_the_kingdom would have the name $tearsofthekingdom after creation). I think using a stripped version for searching and displaying a non-stripped version to the user would be a bad idea, because it could make the feature confusing to use and could end up making the code very confusing.

Theoretically, #310 also fixes this (and the space bug), but it does so by not stripping any punctuation at all. Personally, I would remove the stripping of underscores regardless of the final solution for the rest, because I think tags are easier to read if you can use underscores to separate words.
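Under the same assumptions as before, no longer stripping underscores might look roughly like this: the query keeps its underscores and matches the stored name exactly. This only illustrates the suggestion in this comment; it is not the approach taken in #310, which skips stripping entirely, and `strip_punctuation` with its `keep_underscores` flag is hypothetical.

```python
import re


def strip_punctuation(text: str, keep_underscores: bool = True) -> str:
    # Hypothetical: hyphens, whitespace and apostrophes are still removed;
    # underscores are removed only when keep_underscores is False.
    pattern = r"[\-\s']" if keep_underscores else r"[_\-\s']"
    return re.sub(pattern, "", text)


tag_name = "$tears_of_the_kingdom"
print(strip_punctuation(tag_name) == tag_name)                          # True
print(strip_punctuation(tag_name, keep_underscores=False) == tag_name)  # False

# Spaces are still stripped, so multi-word aliases remain a separate problem.
print(strip_punctuation("tears of the kingdom"))  # tearsofthekingdom
```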