Aedif / TokenVariants

GNU General Public License v3.0

Bug? - underscores / case sensitivities causing unexpected matching results #85

Closed Truncated closed 1 year ago

Truncated commented 1 year ago

I'm using underscores in my file names, following the Best Practice for FoundryVTT media, and I'm finding that the search doesn't match as I'd expect. I've tried leaving the search filters blank, then adding dash (-), underscore (_), and later comma (,) to the Exclude filter, as shown here: [image]

I have also excluded the keywords 'and', 'for', and 'the' globally: [image]

but I'm still getting weird output, such as this: [image]

I'm expecting that 'of' is dropped (fewer than 3 letters) and 'the' is excluded, so the highlighted item should fully match the object in question, but it doesn't. :(

Here's an example where it works fine: [image]

I think it might be that the Excluded Keywords and the Search Settings --> Search Filters in Compendium Mapper are not acting cumulatively? When I get a chance, I'll try moving the exclude filter settings to the global filter settings instead (currently blank) to see whether I get a different result.

Either this, or the excluded keywords are only excluded from the item name search and not from the image file names as well?

Those are the only two logic paths that seemed like they could be true here.

I figure it's 50/50 that I have the search/filtering wrong. :) Cheers!

Aedif commented 1 year ago

"Either this, or the excluded keywords are only excluded from the item name search and not from the image file names as well?"

Bingo.

'Excluded Keywords' is purely there to prevent words such as 'and,for,the' from being used in the search and returned as one of the headings you see in the search results. These words are not removed from the image names.
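A minimal Python sketch of the behaviour described above, assuming keywords are split on whitespace and compared case-insensitively. `EXCLUDED_KEYWORDS` and `search_keywords` are hypothetical names, not TokenVariants code. The point is that the excluded words disappear from the search terms while the image file name is left untouched:

```python
# Hypothetical sketch; not the module's actual implementation.
EXCLUDED_KEYWORDS = frozenset({"and", "for", "the"})


def search_keywords(item_name: str, excluded: frozenset = EXCLUDED_KEYWORDS) -> list:
    """Split an item name into search keywords, dropping excluded words (case-insensitive)."""
    return [word for word in item_name.split() if word.lower() not in excluded]


file_name = "mask_of_the_stolen_mien.jpg"  # image names are NOT filtered by the keyword list

print(search_keywords("Mask of the Stolen Mien"))  # ['Mask', 'of', 'Stolen', 'Mien']
print(file_name)                                   # still contains 'the'
```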

I think you might be misunderstanding how filters work. The module by default already ignores special characters such as underscores, commas, and dashes.

For example, with the exact search algorithm and no filters set up, the module would clean up both the search and the file names like so:

Search: Monocle of Flawlessness -> monocleofflawlessness
Image: monocle_of_flawlessness.jpg -> monocleofflawlessness

Thus we get an exact match between the two.
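The cleanup step can be sketched in Python as below. This assumes cleaning means lowercasing, dropping the file extension, and keeping only letters and digits, and that an exact match means the cleaned search appears unbroken inside the cleaned file name (the "not fully present" wording in the Mask of Stolen Mien reply suggests containment). `clean` and `exact_match` are hypothetical helpers, not the module's code:

```python
import os


def clean(text: str) -> str:
    """Lowercase and keep only letters and digits (assumed cleanup rule)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def exact_match(search: str, file_name: str) -> bool:
    """Assumed rule: the cleaned search must appear unbroken in the cleaned file stem."""
    stem, _ext = os.path.splitext(file_name)  # drop '.jpg', '.webm', ...
    return clean(search) in clean(stem)


print(clean("Monocle of Flawlessness"))                                       # monocleofflawlessness
print(exact_match("Monocle of Flawlessness", "monocle_of_flawlessness.jpg"))  # True
```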

Filters are there to ignore files that contain a given piece of text. For example, if you set Exclude to 'monocle', the module will identify that 'monocle_of_flawlessness.jpg' contains 'monocle' and will not return it as a match, regardless of what the search is.
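A sketch of that Exclude filter, assuming a plain, case-sensitive substring test on the raw file name that runs independently of the search term. `passes_exclude` is a hypothetical name:

```python
def passes_exclude(file_name: str, exclude: str) -> bool:
    """Keep a file unless it contains the Exclude text (assumed: case-sensitive substring)."""
    return not exclude or exclude not in file_name  # an empty filter keeps everything


files = ["monocle_of_flawlessness.jpg", "mask_of_the_stolen_mien.jpg"]
candidates = [f for f in files if passes_exclude(f, "monocle")]
print(candidates)  # ['mask_of_the_stolen_mien.jpg']
```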

Aedif commented 1 year ago

If you want Mask of Stolen Mien to match mask_of_the_stolen_mien.jpg, I'd recommend using the Fuzzy search algorithm.

The Exact search algorithm will do the following:

Search: Mask of Stolen Mien -> maskofstolenmien
Image: mask_of_the_stolen_mien.jpg -> maskofthestolenmien

maskofstolenmien is not fully present in the image name because of the 'the' in the middle of it, so the module will not match them.
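To illustrate the difference, the sketch below repeats the exact (containment) check and then scores the same pair with Python's `difflib.SequenceMatcher`. SequenceMatcher is only a stand-in for "some fuzzy similarity"; it is not necessarily what the module's Fuzzy algorithm uses, and any threshold you would compare the score against is an assumption:

```python
from difflib import SequenceMatcher


def clean(text: str) -> str:
    """Lowercase and keep only letters and digits (assumed cleanup rule)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


search = clean("Mask of Stolen Mien")     # 'maskofstolenmien'
image = clean("mask_of_the_stolen_mien")  # 'maskofthestolenmien' (extension already dropped)

# Exact (containment) check: fails because 'the' interrupts the sequence.
print(search in image)  # False

# Fuzzy stand-in: ratio = 2 * matching chars / total chars = 2 * 16 / 35
score = SequenceMatcher(None, search, image).ratio()
print(round(score, 3))  # 0.914
```

A high but imperfect score like this is the kind of near-miss that a fuzzy algorithm can accept and an exact one cannot.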

Truncated commented 1 year ago

"Either this, or the excluded keywords are only excluded from the item name search and not from the image file names as well?"

Bingo

Told ya it was 50/50. ;)

'Excluded Keywords' is purely there to prevent words such as 'and,for,the' from being used in the search and returned as one of the headings you see in the search results. These words are not removed from the image names.

Gotcha, that makes sense. I don't think the documentation at https://github.com/Aedif/TokenVariants/wiki/Search-Algorithm is as clear as this, so I'd recommend adding this clarification.

That said, wouldn't dropping them from the image names and results be more consistent, and give the user better control over these kinds of anomalies in the data sets? Aside from plurals and singulars being swapped in titles, 'and'/'for'/'the' being randomly added where they shouldn't be, and left out where they should be, is very common in all kinds of title data sets. I'd say the same of any word shorter than 3 characters (of, if, etc.).

By applying the excluded keywords list to the image names as well before comparing, you'd sidestep a lot of data-cleansing work and get better results in the end. That's a huge value to users and would raise the success rate of exact matching. It gets even better when you start adding words like "minor, major, greater, lesser" for magic item variants and the like, if you want the same image for all the "flavors". :)
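The proposal could look something like this: a hypothetical `normalized` helper that splits on any non-alphanumeric character and drops excluded keywords from both the search and the image name before comparing. This is a sketch of the suggested behaviour, not something TokenVariants is confirmed to do:

```python
import os
import re

EXCLUDED = frozenset({"and", "for", "the"})


def normalized(text: str, excluded: frozenset = EXCLUDED) -> str:
    """Split on any non-alphanumeric character, drop excluded words, and join."""
    words = re.findall(r"[^\W_]+", text.lower())  # letters/digits only; '_' acts as a separator
    return "".join(w for w in words if w not in excluded)


def proposed_exact_match(search: str, file_name: str, excluded: frozenset = EXCLUDED) -> bool:
    """Hypothetical behaviour: excluded keywords are removed from BOTH sides."""
    stem, _ext = os.path.splitext(file_name)
    return normalized(search, excluded) in normalized(stem, excluded)


print(proposed_exact_match("Mask of Stolen Mien", "mask_of_the_stolen_mien.jpg"))  # True

# The "flavors" idea: treat variant words as excluded keywords too.
flavors = EXCLUDED | {"minor", "major", "greater", "lesser"}
print(proposed_exact_match("Greater Potion of Healing", "potion_of_healing.png", flavors))  # True
```

Under this scheme the same potion image would serve every variant, at the cost of never distinguishing "Greater" from "Lesser" images when both exist.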

I think you might be misunderstanding how filters work. The module by default already ignores special characters such as underscores, commas, and dashes.

I'm not sure this is 100% true, and there may be a whiff of an actual bug here. What led me down this rabbit hole was noticing some matches missing where, by the logic you describe, they should have been found, and dashes/underscores were the only noticeable differences. The examples I gave above were "deeper" cases I ran into, chosen because all of my observations were clearer in combination there. My results improved overall once I added the exclude filters.

That said, while I'm a competent test engineer, I certainly didn't write formal test cases for this to make sure my reasoning was airtight. :) I'll dig into this a bit more, given your insights.

For example, with the exact search algorithm and no filters set up, the module would clean up both the search and the file names like so:

Search: Monocle of Flawlessness -> monocleofflawlessness
Image: monocle_of_flawlessness.jpg -> monocleofflawlessness

Thus we get an exact match between the two.

Filters are there to ignore files that contain a given piece of text. For example, if you set Exclude to 'monocle', the module will identify that 'monocle_of_flawlessness.jpg' contains 'monocle' and will not return it as a match, regardless of what the search is.

This is also not what I took away from the description at https://github.com/Aedif/TokenVariants/wiki/Search-Filters. Specifically, this part was confusing:

In the example above a Portrait search results would include:

Dragon_PRT.png
RavenPRT.jpg
William[PRT].webm

and exclude the following:

Dragon_TKN.png
RavenTKN.jpg
William[TKN].webm

Yet the example picture had both PRT and TKN in the Include areas, while the text led me to believe that TKN should have been in the Exclude fields. What I quoted from you above is much clearer to me, fwiw.
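One plausible reading of the wiki example, sketched in Python: each search type has its own Include text (Portrait -> PRT, Token -> TKN), and a file is kept only if it contains that text, so TKN files drop out of Portrait results without any Exclude filter. The `INCLUDE` mapping and `filter_for` are hypothetical illustrations, not the module's configuration format:

```python
# Hypothetical per-category Include filters mirroring the wiki example.
INCLUDE = {"Portrait": "PRT", "Token": "TKN"}

files = [
    "Dragon_PRT.png", "RavenPRT.jpg", "William[PRT].webm",
    "Dragon_TKN.png", "RavenTKN.jpg", "William[TKN].webm",
]


def filter_for(category: str, names: list, include: dict = INCLUDE) -> list:
    """Keep only names containing the category's Include text; everything else drops out."""
    needle = include[category]
    return [n for n in names if needle in n]


print(filter_for("Portrait", files))  # ['Dragon_PRT.png', 'RavenPRT.jpg', 'William[PRT].webm']
```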

Aedif commented 1 year ago

Closing the issue. Not a bug, but rather unclear documentation.