TagStudioDev / TagStudio

A User-Focused Photo & File Management System
https://docs.tagstud.io/
GNU General Public License v3.0
5.23k stars, 374 forks

[Feature Request]: Add exclusive And/Or search options #314

Open coolesding opened 4 months ago

coolesding commented 4 months ago

Description

Currently, the search can only be set to either And (includes all tags) or Or (includes any tag). I just ran into the problem of trying to find an image that has two specific tags but no others. As far as I can tell, I have to search for the tags I want and then filter out the entries with unwanted tags myself.

Solution

I think adding the options Ex. Or (exclusively includes any of the tags) and Ex. And (exclusively includes all of the tags) would be great additions to the versatility of TagStudio!

EXAMPLE

I have a database of many images from and about the Bocchi the Rock! anime, which among others includes the tags:

Kita, Bocchi, Nijika, Ryou

There are all combinations of images tagged, some with only one, some with multiple, some with all characters tagged.

Searching for "Kita, Bocchi" in Exclusive Or mode would result in all images that have only the Kita tag, only the Bocchi tag, or only the Bocchi and Kita tags, and no others.

Searching for "Nijika, Ryou" in Exclusive And mode would result in all images that only have the Nijika and the Ryou tags, and no others.
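Read as set operations on an entry's tags, the two proposed modes could be sketched in Python as below. This is only an illustration of the semantics described in the example: the function names and the tiny `library` dictionary are made up here and are not part of TagStudio.

```python
def exclusive_or_match(entry_tags: set[str], query: set[str]) -> bool:
    """Ex. Or: the entry has at least one tag, and every tag it has is in the query."""
    return bool(entry_tags) and entry_tags <= query


def exclusive_and_match(entry_tags: set[str], query: set[str]) -> bool:
    """Ex. And: the entry has exactly the queried tags and no others."""
    return entry_tags == query


# Hypothetical library: file name -> set of tags.
library = {
    "a.png": {"Kita"},
    "b.png": {"Kita", "Bocchi"},
    "c.png": {"Kita", "Nijika"},
    "d.png": {"Nijika", "Ryou"},
}

ex_or = sorted(
    name for name, tags in library.items()
    if exclusive_or_match(tags, {"Kita", "Bocchi"})
)
ex_and = sorted(
    name for name, tags in library.items()
    if exclusive_and_match(tags, {"Nijika", "Ryou"})
)
print(ex_or)   # entries tagged only with Kita and/or Bocchi
print(ex_and)  # entries tagged with exactly Nijika and Ryou
```

Note that `c.png` is rejected by the Ex. Or search because it carries a tag (Nijika) that was not typed into the query.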

Alternatives

The names "Ex. Or" and "Ex. And" could probably be improved, but I can't think of anything better right now. (In my defense, I am writing this at 1 in the morning after a horrible night's sleep.)

KillyMXI commented 4 months ago

This is related to #112 (as part of a larger discussion about search queries). It is also useful as confirmation of user demand for more Boolean operators.

Exclusive OR is commonly written as XOR: https://en.wikipedia.org/wiki/Exclusive_or There is no such thing as Exclusive AND, but there are operators such as XNOR and NAND.

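Truth tables:

| A | B | AND | OR | XOR | XNOR | NAND | NOR |
|---|---|-----|----|-----|------|------|-----|
| ❌ | ❌ | ❌ | ❌ | ❌ | ✔ | ✔ | ✔ |
| ❌ | ✔ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌ |
| ✔ | ❌ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌ |
| ✔ | ✔ | ✔ | ✔ | ❌ | ✔ | ❌ | ❌ |
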
samuellieberman commented 3 months ago

> This is related to #112 (as a part of larger discussion about search queries). Also useful as a confirmation of user demand for more Boolean operators.
>
> Exclusive OR is commonly written as XOR: https://en.wikipedia.org/wiki/Exclusive_or There is no such thing as Exclusive AND, but there are such operators as XNOR, NAND.
>
> Truth tables:
>
> | A | B | AND | OR | XOR | XNOR | NAND | NOR |
> |---|---|-----|----|-----|------|------|-----|
> | ❌ | ❌ | ❌ | ❌ | ❌ | ✔ | ✔ | ✔ |
> | ❌ | ✔ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌ |
> | ✔ | ❌ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌ |
> | ✔ | ✔ | ✔ | ✔ | ❌ | ✔ | ❌ | ❌ |

@KillyMXI, I don't believe that @coolesding was referring to exclusive OR, the Boolean operation. As I understand it, Coolesding was hoping to exclude entries from their searches without explicitly typing out the tags they want to exclude. The premise is that if the only tags Coolesding explicitly types out are Kita and Bocchi, then Coolesding doesn't want entries tagged Nijika or Ryou appearing in the search.

Personally, I strongly believe that Coolesding should be able to make the library work that way, but I really dislike the idea of adding features that only work if you only have a single category of tags. Coolesding ought to be able to use non-character tags without breaking the search. For example, if Coolesding adds a "text" tag to some of the entries, then there will be no way to include both entries with text and entries without text in an "Ex. And" search.

What I would suggest as a solution is adding a tag to each file for the number of characters, e.g. 1_character, 2_characters, 3_or_more_characters... This is actually something I do in my own library. Then Coolesding can perform the example "Ex. And" search with this:

    Nijika Ryou 2_characters

Unfortunately, I can't think of a concise way of getting "Ex. Or" to work, even if full Boolean syntax is implemented. With the example given at least, Boolean syntax would allow the "Ex. Or" search to be done with this:

    ( ( Kita OR Bocchi ) AND 1_character ) OR ( Kita AND Bocchi AND 2_characters )

But with three characters...

    ( ( Kita OR Bocchi OR Nijika ) AND 1_character ) OR ( Kita AND Bocchi AND 2_characters ) OR ( Kita AND Nijika AND 2_characters ) OR ( Bocchi AND Nijika AND 2_characters ) OR ( Kita AND Bocchi AND Nijika AND 3_characters )

Depending on the number of character tags, it would probably be easier to just manually exclude every other character, e.g.

    ( Kita OR Bocchi OR Nijika ) AND NOT ( Ryou OR Gotou_Futari OR Gotou_Michiyo OR Gotou_Naoki OR [...] OR untagged_characters )

Or, depending on the number of entries in the library, to just perform a preliminary search and ignore any entries that aren't relevant:

    ( Kita OR Bocchi OR Nijika ) AND ( 1_character OR 2_characters OR 3_characters )
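To show how quickly the "Ex. Or" query grows with the number of character tags, here is a rough Python sketch that writes it out in fully expanded form, one clause per combination. The helper names are hypothetical, and the AND/OR syntax and the N_characters counting tags are the ones assumed in this comment, not an existing TagStudio feature.

```python
from itertools import combinations


def count_tag(n: int) -> str:
    """Name of the hypothetical counting tag for n characters."""
    return "1_character" if n == 1 else f"{n}_characters"


def exclusive_or_clauses(tags: list[str]) -> list[str]:
    """One clause per non-empty combination of the given tags."""
    clauses = []
    for size in range(1, len(tags) + 1):
        for combo in combinations(tags, size):
            terms = combo + (count_tag(size),)
            clauses.append("( " + " AND ".join(terms) + " )")
    return clauses


def exclusive_or_query(tags: list[str]) -> str:
    """Join all clauses into a single Boolean query string."""
    return " OR ".join(exclusive_or_clauses(tags))


print(exclusive_or_query(["Kita", "Bocchi"]))
for n in range(1, 6):
    # n tags need 2**n - 1 clauses in this expanded form.
    print(n, len(exclusive_or_clauses([f"tag{i}" for i in range(n)])))
```

The single-character clauses could be merged into one group as in the queries above, but the number of multi-character combinations still roughly doubles with every added tag.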

Does anyone else have any thoughts on this issue?

KillyMXI commented 3 months ago

Right, I misinterpreted the issue text because I had strong and different interpretation of those terms in my head.

What booru-like systems such as Hydrus can offer to allow a search query like this:

Number of tags within a certain namespace is tricky though. 2_characters is definitely a working workaround, but it might get annoying to maintain. There are certain problems in boorus with those numbering tags...

If we ignore the possibility of other tags, examples from OP can potentially look like this:

    (Kita OR Bocchi) AND system:number_of_tags<=2
    Nijika AND Ryou AND system:number_of_tags=2

But that's not a very practical assumption - it's natural to expect more tags besides characters.

Limiting the number of tags within a namespace or wildcard can be an interesting design challenge. My momentary thought is that set theory might be helpful alongside Boolean algebra to describe this. I'll try to explain it later, along with some other relevant suggestions I had previously.

samuellieberman commented 3 months ago

That's really interesting, @KillyMXI. CyanVoxel actually has tag categories as a planned feature: https://github.com/TagStudioDev/TagStudio/blob/main/doc/library/tag_categories.md I don't know how namespaces work in Hydrus, but the concept of tag categories may be similar.

Also, your first example of (Kita OR Bocchi) AND system:number_of_tags<=2 doesn't do exactly what Coolesding asked for, since an entry with Kita and Nijika would match that search as well. Though that's not a unique problem. If you search for black_clothes shirt in a different library then that will match entries with black shirts, but it will also match non-black shirts if there are black clothes elsewhere in the image. There isn't really a solution besides creating tags for every possible combination, doing hardcore boolean reasoning, or just ignoring irrelevant entries with one's own mind.

KillyMXI commented 3 months ago

Dang, I goofed twice in one thread...

So, within the same constraints, the first example can be fixed like this:

    ((Kita OR Bocchi) AND system:number_of_tags=1) OR ((Kita AND Bocchi) AND system:number_of_tags=2)
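The difference between the first attempt and the fixed query can be checked with a small Python sketch. It treats system:number_of_tags as the size of an entry's tag set, which is an assumption about how such a predicate would behave; the function names are made up for illustration.

```python
WANTED = {"Kita", "Bocchi"}


def first_attempt(tags: set[str]) -> bool:
    # (Kita OR Bocchi) AND system:number_of_tags<=2
    return bool(tags & WANTED) and len(tags) <= 2


def fixed(tags: set[str]) -> bool:
    # ((Kita OR Bocchi) AND system:number_of_tags=1)
    #   OR ((Kita AND Bocchi) AND system:number_of_tags=2)
    one_tag = bool(tags & WANTED) and len(tags) == 1
    two_tags = WANTED <= tags and len(tags) == 2
    return one_tag or two_tags


for tags in ({"Kita"}, {"Kita", "Bocchi"}, {"Kita", "Nijika"}):
    print(sorted(tags), first_attempt(tags), fixed(tags))
```

The entry tagged Kita and Nijika passes the first attempt but is rejected by the fixed query, which is the case pointed out above.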

I think this creates a stronger case for Set Theory. I'm not aware of it being used the same way, so it might become a strong competitive advantage for TagStudio. But this also means low familiarity and the necessity to invent the syntax for it.

The OP examples can be formulated as follows:

    the set of file tags is a subset of {Kita, Bocchi}
    the set of file tags is equal to {Nijika, Ryou}

This can then be improved by limiting to character tags:

    the set of file tags in character namespace is a subset of {character:Kita, character:Bocchi}
    the set of file tags in character namespace is equal to {character:Nijika, character:Ryou}
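As a rough Python illustration of the namespace-limited version, the sketch below filters an entry's tags by a `namespace:` prefix before comparing sets. The prefix convention and the helper `in_namespace` are assumptions made for this example; TagStudio may model categories differently.

```python
def in_namespace(tags: set[str], namespace: str) -> set[str]:
    """Keep only the tags that start with the given 'namespace:' prefix."""
    prefix = namespace + ":"
    return {tag for tag in tags if tag.startswith(prefix)}


# Hypothetical entry with two character tags plus unrelated tags.
entry = {"character:Kita", "character:Bocchi", "text", "meta:screenshot"}
characters = in_namespace(entry, "character")

# Subset test: no characters other than Kita and Bocchi.
is_subset = characters <= {"character:Kita", "character:Bocchi"}
# Equality test: exactly Nijika and Ryou.
is_equal = characters == {"character:Nijika", "character:Ryou"}

print(sorted(characters), is_subset, is_equal)
```

Because only the character namespace is compared, the "text" and "meta:screenshot" tags no longer affect the result.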

To make this possible, a few features are needed:

What syntax can look like:

Our examples may look like this:

    {all_tags} in {Kita, Bocchi}
    {all_tags} = {Nijika, Ryou}
    {character:*} in {character:Kita, character:Bocchi}
    {character:*} is {character:Nijika, character:Ryou}

And I overlooked one more thing: the empty set (no tags) is a subset of every other set, but that is often not practical. Here, it means the query will also match files without tags. This can be fixed in the query like this:

    {all_tags} in {Kita, Bocchi} and {all_tags} != {}
    {character:*} in {character:Kita, character:Bocchi} and {character:*} != {}

But this will be a common inconvenience. Empty set might be handy in different situations, and prohibiting it also makes the system unsound, so I don't think it is an option. Instead, it might be practical to introduce some kind of shorthand for non-emptiness of a queried set.

The definitions of proper (strict) subset/superset do not fit this issue exactly - they work at the wrong end of it.

What is needed are variations on the subset/superset operators:

I asked ChatGPT whether there is a common notation for this. There seems to be none, and ChatGPT suggests introducing custom notation, so:

    {all_tags} in! {Kita, Bocchi}
    {character:*} in! {character:Kita, character:Bocchi}

This is probably the most unambiguous way to introduce the non-emptiness clause at the right place. I have no idea what single English words could be used instead while keeping the distinction clear. This assumes there is no conflict with proper (strict) subset/superset. Even if they are not needed, it may be worth thinking about how they might be distinguished. Maybe p_in, p_includes, or different suffix symbols for non-emptiness and strictness.
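To make the distinction concrete, here is a small Python sketch comparing three candidate operators: plain subset (`in`), non-empty subset (`in!`), and proper subset (the `p_in` idea). The function names are invented for this example. Python's own set operators `<=` and `<` are used for subset and proper subset.

```python
def is_in(queried: set[str], allowed: set[str]) -> bool:
    """`in`: plain subset, which the empty set always satisfies."""
    return queried <= allowed


def is_in_nonempty(queried: set[str], allowed: set[str]) -> bool:
    """`in!`: subset with the non-emptiness clause attached."""
    return bool(queried) and queried <= allowed


def is_proper_in(queried: set[str], allowed: set[str]) -> bool:
    """`p_in`: proper (strict) subset, which excludes equality instead."""
    return queried < allowed


allowed = {"Kita", "Bocchi"}
cases = {
    "no tags": set(),
    "Kita only": {"Kita"},
    "Kita and Bocchi": {"Kita", "Bocchi"},
    "Kita and Nijika": {"Kita", "Nijika"},
}
for label, tags in cases.items():
    print(label, is_in(tags, allowed), is_in_nonempty(tags, allowed), is_proper_in(tags, allowed))
```

The proper subset still accepts a file with no tags and rejects the file tagged with both Kita and Bocchi, which is why it works at the wrong end for this use case.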

Not really considering {A} < {B}, {A} <= {B}, {A} >= {B} and {A} > {B}, since it might be confusing what is being compared. Size comparison is more expected, so can't repurpose the same symbols.

Attaching the non-emptiness condition to the queried set rather than to the operator would create different problems; it doesn't behave well there.


I can't comment on Tag Categories. The one-sentence description gives me no understanding, since I am not currently an active user of TagStudio.