Kitware / UPennContrast

https://upenn-contrast.netlify.com/
Apache License 2.0

Problem with new file search bar #501

Open arjunrajlab opened 1 year ago

arjunrajlab commented 1 year ago

Prefix search seems to work fine, but text search seems broken @bruyeret. See screenshot below. Somehow it is not finding the MS2 file that is in the directory (also visible in screenshot).

Not sure what happened, I had tested it before and it worked, but after the merge it seems to have stopped working.

[screenshot]

bruyeret commented 1 year ago

This seems to come from MongoDB. When I open a MongoDB playground and do:

use('girder');
db.getCollection('folder').find({ $text: { $search: "squar" } });

I get a result with "square" in its name.

But when I replace "squar" with "squa", I don't get a result...

arjunrajlab commented 1 year ago

Oh interesting. We can just leave it, not a big deal.

bruyeret commented 1 year ago

I can maybe improve the UI?

arjunrajlab commented 1 year ago

Sure!

bruyeret commented 1 year ago

After more investigation, it is not because of a minimum number of characters, but because of the way text searches work. Mongo uses text indexes as explained here. Do you think there is a good way to change the UI?

arjunrajlab commented 1 year ago

I see! I think it should just say "Need more letters to match" if we are under the limit instead of "no results match query", which is confusing. Also, if the "text match" option is selected, it might be good to have some placeholder text in the search field that says "Type at least 4 characters", just to prompt the user.

bruyeret commented 1 year ago

The issue is not the number of letters. For example, if I have a folder named "Many annotations", the search doesn't show anything for "annotat", but it finds the folder when I type "annot", "annotate", "annotation" or "annotations". Another example: to find "square", I can type "squared" or "squares", but to find "new", I can't search "news". That is why the language in the settings (default_language) is important, as explained in the link I sent.
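
To make the behavior above concrete, here is a toy model of stem-based matching. It is only a sketch: the stem table is hard-coded with assumed values chosen to agree with the observations in this comment, and `toyStem` and `toyTextMatch` are hypothetical names. MongoDB computes stems itself with a language-specific stemmer, which this does not reproduce.

```javascript
// Toy model of stem-based matching. The stem table is hard-coded and assumed
// for illustration only; it is not MongoDB's actual stemmer.
const ASSUMED_STEMS = new Map([
  ['annotations', 'annot'],
  ['annotation', 'annot'],
  ['annotate', 'annot'],
  ['square', 'squar'],
  ['squares', 'squar'],
  ['squared', 'squar'],
]);

// Words missing from the table are treated as their own stem.
function toyStem(word) {
  const lower = word.toLowerCase();
  return ASSUMED_STEMS.has(lower) ? ASSUMED_STEMS.get(lower) : lower;
}

// A query matches only if its stem equals the stem of some word in the name.
// A partial word such as "annotat" has no matching stem, so it finds nothing.
function toyTextMatch(name, query) {
  const stems = name.split(/\s+/).map(toyStem);
  return stems.includes(toyStem(query));
}
```

In this model, "annot" and "annotation" both reduce to the same stem as "annotations" and match, while "annotat" and "squa" stay as they are and match nothing, regardless of how many letters were typed.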

arjunrajlab commented 1 year ago

Oh sorry, I didn't read that carefully! Hmm. I don't know what to make of that. The stems that we would be searching for may not be English at all, and will often be multiple words put together. Perhaps the best thing is to just drop the full text search for now? I don't know of a good way to explain this sort of strange behavior to the user. It seems strange to me that there is no substring search in Mongo, but whatever.

arjunrajlab commented 1 year ago

It seems you can do it with regex, but perhaps it is inefficient for very large collections? I don't think that would matter too much for us here, though.

https://stackoverflow.com/questions/10242501/how-to-find-a-substring-in-a-field-in-mongodb

manthey commented 1 year ago

Girder has two search modes by default: "text" uses the Mongo text search, and "prefix" has to match the beginning of the name of the thing being searched. It is intended to be extensible, so we could always create a new Girder search mode that would be an arbitrary substring match (e.g., in Mongo, a {$regex: } query on the name). It wouldn't be as fast for huge collections.
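
As a rough sketch of what such a substring query could look like in a MongoDB playground: the helper names `escapeRegex` and `buildSubstringFilter` are hypothetical, and this is not Girder's search-mode code. It only builds the filter object. The user's input is escaped so that characters like "." or "(" are matched literally rather than interpreted as regex syntax.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a filter for a case-insensitive substring
// match on the `name` field.
function buildSubstringFilter(term) {
  return { name: { $regex: escapeRegex(term), $options: 'i' } };
}

// Possible usage in a MongoDB playground (not run here):
// use('girder');
// db.getCollection('folder').find(buildSubstringFilter('squa'));
```

Because the pattern is unanchored and case-insensitive, a query like this generally cannot use an index on the name efficiently, which is consistent with the performance caveat above.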

arjunrajlab commented 1 year ago

I think let's just drop the text search for now unless it's a quick implementation. I think a regex would be fine given that we don't have a lot of items, but if it's going to be a fair amount of work to implement (sounds somewhat involved at least), then I would say let's drop it for the time being.

bruyeret commented 1 year ago

I don't think that it is a lot of work. On the frontend, setting the SearchModeOptions is very straightforward (a simple Vue prop). On the backend, adding the new option would look almost exactly the same as the current "prefix" mode (see prefixSearch in model_base.py and addSearchMode in search.py).

arjunrajlab commented 1 year ago

Cool, let's do it then!

arjunrajlab commented 2 months ago

Someone has requested this feature again :). I guess looking at the above PR we would need to make a plugin for this?

bruyeret commented 2 months ago

We can create a new plugin for this endpoint, but we can also add an endpoint to the existing plugin. This could be pretty resource intensive, as said by Zach in the closed PR:

it would require a full table scan

arjunrajlab commented 2 months ago

Can we first subset by the datasets in the folder that is being displayed to lower the computational cost? Or current folder and all subfolders? I think for any individual user, they are not expecting to search the entire database.

bruyeret commented 2 months ago

This is possible, but we should make it clear in the UI. If we want to only search the current folder, this is pretty easy, as we can filter the rows in the browser instead of querying the server and opening a pop-up for the results.
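
A minimal sketch of that in-browser filter, assuming each row is an object with a `name` string. The function name `filterRowsByName` and the row shape are hypothetical, not taken from the existing file browser code.

```javascript
// Hypothetical helper: keep only the rows whose name contains the query,
// ignoring case. An empty query keeps every row.
function filterRowsByName(rows, query) {
  const needle = query.trim().toLowerCase();
  if (needle === '') {
    return rows.slice();
  }
  return rows.filter((row) => row.name.toLowerCase().includes(needle));
}

// Example rows as they might appear in the current folder listing.
const exampleRows = [
  { name: 'Many annotations' },
  { name: 'square' },
  { name: 'MS2 data' },
];

// A partial word works here, unlike with the stem-based text search.
const matches = filterRowsByName(exampleRows, 'annotat');
```

Since this is a plain substring test on rows already loaded in the browser, it needs no server round trip, but it only sees the folder currently displayed.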

arjunrajlab commented 2 months ago

I think that would be great. Yes, I think in the UI, we could make it clear it's just a filter rather than a comprehensive search. I think mostly people are looking for a local filter. For the more comprehensive search, then probably the current functionality is just fine.