HabitRPG / habitica

A habit tracker app which treats your goals like a Role Playing Game.
https://habitica.com

Search keywords no longer restricting to challenges with that keyword #10242

Open shanaqui opened 6 years ago

shanaqui commented 6 years ago

From Report a Bug:

Hello, something has changed in the Challenge search. I used to be able to type a phrase and only challenges with that exact phrase would appear. Now, other challenges without those words appear as well. For example, when I type "pom party" without quotes, I get "pom party" challenges alongside "You Are Poisoned ! : Easy Edition". And if I enter "pom party" with quotes, I get ":sound: Podcadoro Party (Explanation)" but not challenges with "pom party" in the text.

I get nothing if I enter "pom party" with quotes, and if I do e.g. "books" I get challenges where "books" isn't even included anywhere in the description or even tasks.

Dewines commented 6 years ago

Thanks for raising that. Is this a problem caused by the new "Load More" button? I've started having the same problem. I have a private challenge in my solo party which I use to add a bunch of weekly To-Dos so I don't have to recreate them from scratch each time. It's called "Dewines's weekly household tasks". I used to just go to Discover Challenges and put "Dewines" in the search box and it would come up. Now I get pages of "Daily focus finder" challenges and have to click Load More a couple of times before the one I want appears.

Alys commented 6 years ago

The current search is no longer useful. For example, you can search for an exact phrase from the title of a challenge and it can take numerous clicks of "Load More" for the challenge to appear. Many users won't realise they need to do this and so will never find the challenges they're looking for.

I think we need an API route that searches Challenge titles and returns all of the results at once, or at least more than 10 - it's agonising to step through them 10 at a time, especially with the expanded format that makes it hard to scan the titles rapidly.

I'll leave this as suggestion-discussion for a couple of days and then if there's no objection, I'll add an edit to the top post to describe that as the desired fix.

TheHollidayInn commented 6 years ago

I don't think we need a new API route. The solution should be to modify the current query so that it focuses more on titles. We can try removing the description from the query, for example. But whoever works on this should look into MongoDB's query search.

shanaqui commented 6 years ago

Note: I just searched for a challenge (Read the World April) with an exact phrase from the title ("read the world"). It took 11-12 "load more"s to find it, and returned things where literally the only matching text in the challenge was the word "the" and, possibly, "already", if it was matching "read" because it's in "already".

I'd be interested to see what removing "the" (and other common words) and partial matches (e.g. already/read) would do to help.
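To make the two ideas above concrete, here is a minimal sketch of the difference between substring matching and whole-word matching with stop words removed. It assumes the current search behaves like a case-insensitive substring match on each word, which this thread suspects but does not confirm. The function names and the stop word list are hypothetical, chosen only for illustration.

```javascript
// Hypothetical illustration only; this is not the Habitica server code.
// A tiny stop word list, chosen for this example.
const STOP_WORDS = new Set(['the', 'a', 'an', 'and', 'of', 'to', 'in']);

// Substring matching: the text matches if it contains ANY search word
// anywhere, even inside a longer word ("read" inside "already").
function substringMatch(text, search) {
  const haystack = text.toLowerCase();
  return search
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .some((word) => haystack.includes(word));
}

// Whole-word matching: stop words are dropped from the search, and the
// remaining keywords must equal a complete word of the text.
function wholeWordMatch(text, search) {
  const tokens = new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
  const keywords = search
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word && !STOP_WORDS.has(word));
  return keywords.some((word) => tokens.has(word));
}

const unrelated = 'I already finished the list';
console.log(substringMatch(unrelated, 'read the world')); // true
console.log(wholeWordMatch(unrelated, 'read the world')); // false
console.log(wholeWordMatch('Read the World April', 'read the world')); // true
```

The unrelated text matches under substring matching through both "the" and "already", but not once stop words are ignored and only whole words count.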

TheHollidayInn commented 6 years ago

Removing stop words could help too, yea.

But if we assume they are searching on the title and we remove the description from the query, I think that would solve the problem too.

Alys commented 6 years ago

I've been experimenting on my local install with mongodb's built-in text indexes, which automatically use stop words (ignore "the", etc) and stemming (search for "dogs" and find "dog"), and which allows sorting the results by score. Weights can be added to the index if more than one field is indexed. Based on my tests on a limited set of challenges, it seems promising and easy to implement.

Documentation here: https://docs.mongodb.com/manual/core/index-text/ (note that's for 3.6; I couldn't easily find docs for 3.4 but that page doesn't indicate any differences between 3.4 and 3.6). A simple non-official tutorial is here: https://code.tutsplus.com/tutorials/full-text-search-in-mongodb--cms-24835

To allow searching on challenges' names and summaries, we'd create an index like this: db.challenges.createIndex({name:"text",summary:"text"},{"weights":{name:3,summary:1}}) which would produce this:

{
    "v" : 2,
    "key" : {
        "_fts" : "text",
        "_ftsx" : 1
    },
    "name" : "name_text_summary_text",
    "ns" : "habitrpg.challenges",
    "weights" : {
        "name" : 3,
        "summary" : 1
    },
    "default_language" : "english",
    "language_override" : "language",
    "textIndexVersion" : 3
}

That weights the challenge names three times more than the summaries. We could play around with that number. My feeling in general is that searching on both with names highly weighted will produce the best results.
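If the weights are going to be tuned, it may help to generate the two `createIndex` arguments from a single weights map so the field list and the weights cannot drift apart. The following is only a sketch; `buildTextIndexArgs` is a hypothetical helper, not something that exists in Habitica or MongoDB.

```javascript
// Hypothetical helper: turns a { field: weight } map into the two
// arguments that createIndex takes for a weighted text index.
function buildTextIndexArgs(weights) {
  const keys = {};
  for (const field of Object.keys(weights)) {
    // Every weighted field is indexed as "text".
    keys[field] = 'text';
  }
  // Copy the weights so later changes to the input do not leak in.
  return [keys, { weights: { ...weights } }];
}

const [keys, options] = buildTextIndexArgs({ name: 3, summary: 1 });
console.log(JSON.stringify(keys)); // {"name":"text","summary":"text"}
console.log(JSON.stringify(options)); // {"weights":{"name":3,"summary":1}}
```

In the mongo shell the result would be passed along as `db.challenges.createIndex(keys, options)`, which is equivalent to the command shown above.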

Search queries are like this: db.challenges.find({$text: {$search: "keywords here"}}, {score: {$meta: "textScore"}}).sort({score:{$meta:"textScore"}})

If we wanted to implement a phrase search, we'd do that like this: db.challenges.find({$text: {$search: "\"my phrase goes here\""}}, {score: {$meta: "textScore"}}).sort({score:{$meta:"textScore"}}) but I don't think a phrase search would be necessary because the scoring of keywords would allow exact-phrase matches to be near the top of the results anyway (confirmed in my local install tests), and my opinion (not based on any investigation) is that most searches are more likely to be for keywords than for phrases.
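As a rough sketch, the keyword and phrase variants above differ only in whether the search string is wrapped in double quotes, so one helper could build both. `buildTextSearch` and its `phrase` option are hypothetical names used for illustration; the returned objects mirror the `find` and `sort` arguments quoted above.

```javascript
// Hypothetical helper, not part of Habitica: builds the filter,
// projection and sort documents for a $text search.
function buildTextSearch(input, { phrase = false } = {}) {
  const terms = input.trim();
  // $text treats text wrapped in double quotes as an exact phrase.
  // In phrase mode, quotes inside the input are stripped first so they
  // cannot end the phrase early.
  const search = phrase ? `"${terms.replace(/"/g, '')}"` : terms;
  return {
    filter: { $text: { $search: search } },
    projection: { score: { $meta: 'textScore' } },
    sort: { score: { $meta: 'textScore' } },
  };
}

const keywordSearch = buildTextSearch('read the world');
console.log(keywordSearch.filter.$text.$search); // read the world

const phraseSearch = buildTextSearch('read the world', { phrase: true });
console.log(phraseSearch.filter.$text.$search); // "read the world"
```

With a database handle, these would be used as `db.challenges.find(filter, projection).sort(sort)`. In keyword mode, any quotes the user types are passed through unchanged, so a user could still request a phrase search explicitly.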

I think it's worth looking into this further. Perhaps we can set up a text index on the beta site and push a code change there? That would let us do real-life searches on the beta database which is an old-ish version of production and would give us an idea of whether it's working. In fact, we could simply add the index to the beta database and then use direct mongodb commands from a local script to do some initial testing, with no need to push any code changes to the beta website. If there are no objections from the staff in a few days, I'll create the index on the beta database and continue my testing there. @paglias @TheHollidayInn @SabreCat

If this works for challenges, it could also be used for guilds: https://github.com/HabitRPG/habitica/issues/9755

paglias commented 6 years ago

That looks good to me @Alys , the only thing is that I remember we're already using a text index but maybe not the $text query? cc @TheHollidayInn

Alys commented 6 years ago

Ah yes, in the prod database, we have this text index for groups (no text index for challenges):

    {
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "name_text_description_text_summary_text",
        "ns" : "habitica.groups",
        "background" : true,
        "weights" : {
            "description" : 1,
            "name" : 1,
            "summary" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 3
    }

That groups index wasn't on beta but I've now done these two commands on the beta database (took just a few seconds each to run btw):

    db.challenges.createIndex({name:"text",summary:"text"},{"weights":{name:3,summary:1}})
    db.groups.createIndex({name:"text",summary:"text"},{"weights":{name:3,summary:1}})

Doing command-line searches on the beta database and comparing the results to searches in the beta website shows that using these indexes is a great improvement (e.g., for groups, a command-line search for "report a bug" actually puts the Report a Bug guild at the top of the results), so I think we should try to incorporate a $text search into the guild and challenge filter tool, instead of the current keyword search.

paglias commented 6 years ago

That looks good to me, are you going to create a PR with the query changes? cc @TheHollidayInn

Alys commented 6 years ago

I'm hoping to have time to play around with the code in the next day or two, but I'm not certain enough to claim the issue yet. :) It should stay marked as help wanted in case someone beats me to it.

Alys commented 6 years ago

So this would be my proposed change if I did this. I'm still not necessarily claiming this issue, just getting it a step closer. :) Does this seem right?

Replace these lines: https://github.com/HabitRPG/habitica/blob/d4d668f640afea0b1fb09bf2a3c1cc6425e49e76/website/server/controllers/api-v3/challenges.js#L384-L389 with code based on this: {$text: {$search: "search terms"}}, {score: {$meta: "textScore"}} (still using $and to keep the other filter options of course)

And replace this: https://github.com/HabitRPG/habitica/blob/d4d668f640afea0b1fb09bf2a3c1cc6425e49e76/website/server/controllers/api-v3/challenges.js#L398 with: .sort({score:{$meta:"textScore"}}) if and only if req.query.search is defined (otherwise leave the sort as it is now).

And replace this code: https://github.com/HabitRPG/habitica/blob/d4d668f640afea0b1fb09bf2a3c1cc6425e49e76/website/client/components/challenges/myChallenges.vue#L132-L135 with a call to api.getUserChallenges
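The server-side part of this proposal (the first two replacements) could look roughly like the sketch below. It is not the actual controller code: `buildChallengeQuery`, `baseFilters` and `defaultSort` are hypothetical stand-ins for the route's other filter options and its existing sort, and the sample values are invented.

```javascript
// Hypothetical sketch of the proposed query logic; the real route has
// its own filters and sort, which are represented here by parameters.
function buildChallengeQuery(baseFilters, search, defaultSort) {
  const conditions = [...baseFilters];
  const hasSearch = typeof search === 'string' && search.trim() !== '';

  if (hasSearch) {
    // The $text clause is added next to the other filters under $and.
    conditions.push({ $text: { $search: search.trim() } });
  }

  return {
    // $and requires a non-empty array, so fall back to an empty filter.
    filter: conditions.length > 0 ? { $and: conditions } : {},
    // Project and sort by text score if and only if a search was given.
    projection: hasSearch ? { score: { $meta: 'textScore' } } : {},
    sort: hasSearch ? { score: { $meta: 'textScore' } } : defaultSort,
  };
}

// Invented sample values for illustration.
const otherFilters = [{ group: 'some-group-id' }];
const existingSort = { createdAt: -1 };

const withSearch = buildChallengeQuery(otherFilters, 'pom party', existingSort);
console.log(JSON.stringify(withSearch.sort)); // {"score":{"$meta":"textScore"}}

const withoutSearch = buildChallengeQuery(otherFilters, undefined, existingSort);
console.log(JSON.stringify(withoutSearch.sort)); // {"createdAt":-1}
```

Keeping the sort conditional means that listings without a search term behave exactly as they do now.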

paglias commented 6 years ago

yeah that seems the right path

Dewines commented 6 years ago

Um... I have an admission to make. I have just realised today that the reason the Daily Focus Finder challenges come up when I search on my name is that, unbeknownst to me, @blakejones99 had mentioned my name in the description. That doesn't explain Shanaqui's problem, but I'm sorry if you've been on a wild goose chase over me not being able to bring up my challenge right away when I searched on "Dewines".

Alys commented 6 years ago

@Dewines no worries but thanks for letting us know. :) When I was testing, I was using other challenges so that didn't make any difference. The search definitely can be improved.

paglias commented 6 years ago

Should these changes be applied to groups search as well?

Alys commented 6 years ago

@paglias Yes. This is the issue for it: https://github.com/HabitRPG/habitica/issues/9755

bigsee commented 4 years ago

I'm going to take a look at this, per the suggestion from @paglias here. I'll shout if I get stuck. :)

Alys commented 4 years ago

@bigsee Thanks! I've marked it as in progress for you

bigsee commented 4 years ago

Quick update: just got back from holidays. I'm still on this. Will likely be spending some more time this weekend.

shanaqui commented 4 years ago

@bigsee Hi! Still planning on working on this one? No hurry, we just check in to make sure that issues don't go stale. Please let us know within a week if you'd like to keep working on this, or we'll put it back in the queue (but you can always pick it up again in future). :)

shanaqui commented 4 years ago

Same here: since I didn't get a reply and I can't spot any activity, I'm putting this back as help wanted again, but if you'd like to pick it up again, just let us know, @bigsee!

bigsee commented 4 years ago

hey @shanaqui (and team) - thank you for the nudge and really sorry about this. I've been underwater for a few weeks and started a new job. Just getting to clear emails now and this notification was perfectly timed.

Sorry to mess you folks around. Please consider me out of action for the time being but I'll definitely be back to help out once things settle down... 🙏🏽