department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Examine and resolve weird search behavior - learning center 1.1 #15650

Closed jenniferlee-dsva closed 3 years ago

jenniferlee-dsva commented 3 years ago

Not sure what type of logic we're using...

Example: When I clicked in the search bar to start typing in my search, the term "office of inspector general" came up as a typeahead dropdown. I clicked on it to test it. (Expecting 0 LC results.)

http://learningcenter.web.demo.ci.cms.va.gov/resources/search/?query=Office+of+Inspector+General

Instead, 20 learning center articles came up in search. But none of these articles are related to or relevant for the OIG. I suspect that the search is bringing back any article with the phrase "office of..." in the body content? Or even pulling from the global footer, which has the Office of Inspector General link?

Couple of issues: 1/ We definitely don't want too many results if all of them are irrelevant. Better to bring back an accurate 0 results with messaging already in place. 2/ It's odd that even before I started typing, the typeahead dropdown showed terms to choose from. Seems like it should do so only AFTER I start typing my term, and only show choices that match the letters --> phrase I am progressively entering.

image.png

ncksllvn commented 3 years ago

All of those search results have "VA account and profile" as their primary category, which contains "of" inside the word "profile". Obviously this isn't ideal. Here are some suggestions for resolving this -

  1. Omit the primary category from search criteria
    • There may not be a ton of value to this anyway, but without it, it seems like a query for "profile" won't pull up an article like "DS Logon FAQs"
  2. Split the article titles into lists of keywords
    • For example, "Connecting third-party apps to your VA.gov profile" becomes seven separate keywords. A query for "prof" would pull up this article as a result because the last keyword, "profile", starts with "prof". However, a query for "of" would not pull up this article, because none of the keywords starts with "of".
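
A minimal sketch of the difference between substring matching and the keyword-prefix matching suggested in item 2. The function names are hypothetical and splitting on whitespace is an assumption; this is not the actual frontend code.

```javascript
// Hypothetical helper: split a title into lowercased keywords on whitespace.
function titleToKeywords(title) {
  return title.toLowerCase().split(/\s+/).filter(Boolean);
}

// Substring matching: true if the query appears anywhere in the title text.
function matchesBySubstring(title, query) {
  return title.toLowerCase().includes(query.toLowerCase());
}

// Keyword-prefix matching: true only if some keyword starts with the query.
function matchesByKeywordPrefix(title, query) {
  const q = query.toLowerCase();
  return titleToKeywords(title).some((keyword) => keyword.startsWith(q));
}

const exampleTitle = "Connecting third-party apps to your VA.gov profile";
console.log(titleToKeywords(exampleTitle).length); // 7
console.log(matchesByKeywordPrefix(exampleTitle, "prof")); // true
console.log(matchesByKeywordPrefix(exampleTitle, "of")); // false
console.log(matchesBySubstring(exampleTitle, "of")); // true, "of" sits inside "profile"
```
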
ncksllvn commented 3 years ago

Not sure if I should leave this in the backlog or pick it up

jenniferlee-dsva commented 3 years ago

Hi @ncksllvn - I'd like to make sure we're on track with accessibility things and bugs before launch first. I would say let's look at this along with other search things after we've resolved/validated 508 and critical bugs first.

RE:

All of those search results have "VA account and profile" as their primary category, which contains "of" inside the word "profile".

Whoa! :-) Agree @ncksllvn We definitely don't want that kind of behavior which will def bring up a lot of irrelevant results. (This is basically how the forms search acts today.)
However, I would say the source of the above issue is not that the primary category is being included in search but that the search logic is looking for any matching letters (vs. keeping words whole). E.g., bringing back pages with "of" in them because "of" is inside the word "office."

When we look at search tickets a little more comprehensively, I would like to strongly suggest that we consider using and customizing search.gov's api for LC. It doesn't seem like we'll get to a state of baseline functional search behavior just starting from scratch -- @johnhashva ?

johnhashva commented 3 years ago

@jenniferlee-dsva good points. the question to me is: what fundamental capabilities can we implement now (either ahead of this week's MVP launch or next one in Dec) with the custom search we have now to make for a more "as expected" experience ... e.g. prioritization of results that match whole words and especially the phrase. If we switch to the Search.gov API for LC, we know the experience (improved UX, tag filtering) could be lost. Should we chat about these options this week -- or wait until after this week's launch? (@ncksllvn welcome your thoughts here)

ncksllvn commented 3 years ago

Whoa! :-) Agree @ncksllvn We definitely don't want that kind of behavior which will def bring up a lot of irrelevant results. (This is basically how the forms search acts today.)

It's just a snippet of JavaScript in the frontend, so it's not difficult to change. I didn't know we felt that way about the forms search. Its search logic is just two or three lines of Ruby code, so it is also easy to read and edit for any engineer.

When we look at search tickets a little more comprehensively, I would like to strongly suggest that we consider using and customizing search.gov's api for LC. It doesn't seem like we'll get to a state of baseline functional search behavior just starting from scratch -- @johnhashva ?

I am fine with using search.gov as long as we accept the limitations. That is -

ncksllvn commented 3 years ago

search.gov is a great solution for what we do on va.gov/search - indexing the vast VA digital world to compute a list of best results for a single query. But IMO there will always be room and incentive for a specialized, small-scale search. It takes some creative thinking and trial+error to define how the search itself operates, but I think they can make very special user experiences.

ncksllvn commented 3 years ago

I made a few changes just now. These results pull up because "of" is in the article titles and "of" was also in my query. We can create a blocklist to get rid of "of" being used as a keyword.

image

This search operates as follows -

Take my query and turn it into a lowercased set of keywords ->

Then, for each article title, do the same thing ->

Then, compute the intersection of the two lists by checking whether any of the keywords of the article title start with any of the keywords in the query. "of" in the article keywords starts with "of" from the query keywords, so it pulls up a result.
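
The three steps above can be sketched roughly like this. It is only an illustration: the function names and the sample titles are made up, and the real frontend snippet may tokenize differently.

```javascript
// Turn free text into a lowercased list of keywords (split on whitespace).
function toKeywords(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// A title matches when any of its keywords starts with any query keyword.
function searchTitles(titles, query) {
  const queryKeywords = toKeywords(query);
  return titles.filter((title) =>
    toKeywords(title).some((titleKeyword) =>
      queryKeywords.some((queryKeyword) => titleKeyword.startsWith(queryKeyword))
    )
  );
}

// Illustrative titles only, not the real Learning Center content.
const sampleTitles = ["Change of address", "DS Logon FAQs"];

// The title keyword "of" starts with the query keyword "of", so the
// first title is returned even though it has nothing to do with the OIG.
console.log(searchTitles(sampleTitles, "Office of Inspector General"));
// [ 'Change of address' ]
```
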

It's hard to explain this stuff, but the beauty of this is that we have the ability to change it until we're happy with it. We wouldn't have that ability if we used an external search service like search.gov. It would accept a single input (a query) and return an output (a list of results) and we would have no idea what's going on under the hood.

johnhashva commented 3 years ago

@ncksllvn when you say "get rid of 'of' as a keyword" -- does that mean it will be ignored as part of the search query if it is typed? (@jenniferlee-dsva With only 25 articles for this MVP and a Beta label, I think we can continue to find and refine for this initial stage -- including the next point release. The question of whether this approach is "scalable" (more content) -- and whether Search.gov needs to be back on the table -- will need to be revisited in my mind in Q1.)

jenniferlee-dsva commented 3 years ago

@johnhashva RE:

If we switch to the Search.gov API for LC...

I meant to think of this post-1.0 launch. And I meant to suggest looking at if we could LEVERAGE parts of their search alg in a custom LC search -- sorry, not as a wholesale switcheroo.

ncksllvn commented 3 years ago

when you say "get rid of 'of' as a keyword" -- does that mean it will be ignored as part of the search query if it is typed?

No, it would just mean omitting it from the list of keywords during the search, totally under-the-hood. In my example there, imagine the query itself staying the same but the keywords excluding "of", so it would only search on the keywords ->

The idea being that if someone had "of", "the", "in", etc. in their search query then we could just ignore that as a keyword.
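
As a rough sketch of that idea, assuming a small hand-maintained blocklist (the list contents and names below are placeholders, not what the PR contains):

```javascript
// Assumed blocklist; the real list of ignored words would need to be agreed on.
const IGNORED_WORDS = new Set(["of", "the", "in"]);

// The query string itself is untouched; only the derived keywords are filtered.
function toSearchKeywords(query) {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word && !IGNORED_WORDS.has(word));
}

console.log(toSearchKeywords("Office of Inspector General"));
// [ 'office', 'inspector', 'general' ]
```
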

jenniferlee-dsva commented 3 years ago

Regarding articles ("the" / "an" / "a") and other words - conjunctions, prepositions, what have you - like "of" / "and" / "or" / "but" and so on - yes, bringing up any content that has these random words does not help relevance.

I do suspect that we should be focusing on prioritizing 1/ phrase matching, then next 2/ whole word matching (minus filler words like articles, conjunctions and so on unless they are in the context of a phrase match.)

@ncksllvn - today in Forms search, the search logic mirrors the logic that the legacy VA forms experience was using -- which is literally just bringing back everything that has any of the words in the query, not as a phrase, but as separate independent word searches.

jenniferlee-dsva commented 3 years ago

@ncksllvn I know that to ensure that we were getting all of the claims related articles whether the content had "claim" or "claims," we used the "OR" boolean logic. And in that case it does get us the relevant results.

But this is where I feel like search algorithms are complex and it's not feasible to have very simple blanket logic.

And of course we don't know yet how any of this might be improved - or worsened - if we searched body content too. Something to test carefully post launch.

Unlike real search alg, ours doesn't have machine learning capabilities, so it's also kind of limited in that it's not going to learn from what users click on or don't click on. Does the search.gov api have that? That would be a feature we'd want to 'borrow' if we can. :-)

ncksllvn commented 3 years ago

Unlike real search alg, ours doesn't have machine learning capabilities, so it's also kind of limited in that it's not going to learn from what users click on or don't click on. Does the search.gov api have that? That would be a feature we'd want to 'borrow' if we can. :-)

I'm not sure how the search.gov API works, but it certainly uses a page ranking system of some sort, so we would get that by using search.gov over an in-house version.


For the short term though, would you like me to merge my PR to adjust our current search to what I outlined above?

ncksllvn commented 3 years ago

Adjustments to the search should go out tmw

jenniferlee-dsva commented 3 years ago

Just checked this on live search. "office of inspector general" brings back -- correctly -- 0 RS results with the 'try all va.gov search' message.

However, when I click on the "search all of VA.gov" text link, it's weird. It seems like it starts to take me to the global search results but then the page seems to refresh or redirect me to the homepage.

jenniferlee-dsva commented 3 years ago

See if you can see this animated GIF @ncksllvn

search-all-vagov-redirecting-to-homepage

jenniferlee-dsva commented 3 years ago

The undesired behavior is still happening - i.e., when I search for something that I know we don't have RS articles on, the results are still bringing up articles because a part of the word or one of the words is appearing somewhere on the page.

For example:

Since we are including the article body content, I guess we are expanding the scope of the search, and may have expanded things too much, lol. :-) Open to your thoughts but it seems like including the sandwich layers (Related info, VA benefits, Need more help -- and post 1.1, How do you rate your experience -- components) provides little value and adds noise to search.

This seems like an iteration rather than the same bug.

image

johnhashva commented 3 years ago

@jenniferlee-dsva to confirm:

Worth noting: The new Search PM -- Denise Coveyduc -- has started, but her focus is VA.gov global on-site search, not R&S custom search. Brian and Nick really need to be on top of this as part of R&S launch iteration, unless/until a decision is made to integrate this with Search.gov.

ncksllvn commented 3 years ago

See if you can see this animated GIF

that's a valid issue i hadn't seen before...I think it is separate from this issue but regardless i'll just pick it up as part of this

The undesired behavior is still happening - i.e., when I search for something that I know we don't have RS articles on, the results are still bringing up articles because a part of the word or one of the words is appearing somewhere on the page.

Yeah it's just treating any inputted word as a keyword rather than ignoring certain words like "and", "apply", etc. and pulling up matches. Obviously it is a very elementary search function. Unfortunately I do not think we will be able to develop the level of sophistication that you're seeking for this search without a fundamental change in how it operates...and I personally don't think search.gov will provide that kind of intelligent search either. It might, but to me it seems like we are envisioning what a tool like Yext provides. John and I talked with them back in the summer or so.


Edit - short term, we could consider setting up a list of keywords to ignore (of, and, as, is, etc) and then require that search results have all of the remaining keywords in the user's query
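
A sketch of what that short-term change might look like, assuming the existing prefix matching stays in place. The ignore list is taken from the examples above; the function names and sample titles are hypothetical.

```javascript
// Ignore list from the examples above; a real list would likely be longer.
const WORDS_TO_IGNORE = new Set(["of", "and", "as", "is"]);

// Lowercase, split on whitespace, and drop ignored words.
function keywordsOf(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word && !WORDS_TO_IGNORE.has(word));
}

// A title is a result only if EVERY remaining query keyword is a prefix
// of some title keyword (previously ANY single keyword was enough).
function searchAllKeywords(titles, query) {
  const queryKeywords = keywordsOf(query);
  if (queryKeywords.length === 0) return []; // nothing meaningful to search on
  return titles.filter((title) => {
    const titleKeywords = keywordsOf(title);
    return queryKeywords.every((queryKeyword) =>
      titleKeywords.some((titleKeyword) => titleKeyword.startsWith(queryKeyword))
    );
  });
}

// Made-up titles, not real articles.
const demoTitles = ["Change of address", "DS Logon FAQs"];
console.log(searchAllKeywords(demoTitles, "Office of Inspector General")); // []
console.log(searchAllKeywords(demoTitles, "change address")); // [ 'Change of address' ]
```
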

ncksllvn commented 3 years ago

Since we are including the article body content, I guess we are expanding the scope of the search

Sorry, I missed this...we aren't searching body content though - only the article title. Should it be searching the body content?

ncksllvn commented 3 years ago

However, when I click on the search all of VA.gov text link, it's weird. It seems like it starts to take me to the global search results but then the page seems to refresh or redirect me to the homepage.

This is the global search app's behavior when it doesn't have a query in the URL. The issue we were seeing was the same as that other issue where we were missing trailing slashes, causing the query to not carry over. I just opened a PR for this, https://github.com/department-of-veterans-affairs/vets-website/pull/14926.
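
For illustration only, a link to the global search could be built so that the path keeps its trailing slash and the query rides along. The helper name, the `/search/` path, and the `query` parameter name are assumptions here, not taken from the PR.

```javascript
// Hypothetical helper: build a global-search link that carries the query over.
function buildGlobalSearchUrl(query, origin = "https://www.va.gov") {
  // Assumed path and parameter name; note the trailing slash before "?".
  const url = new URL("/search/", origin);
  url.searchParams.set("query", query);
  return url.toString();
}

console.log(buildGlobalSearchUrl("office of inspector general"));
// https://www.va.gov/search/?query=office+of+inspector+general
```
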

jenniferlee-dsva commented 3 years ago

Replying to your reply to my reply :-)

Sorry, I missed this...we aren't searching body content though - only the article title. Should it be searching the body content?

Hi @ncksllvn - Are you sure? I only assumed we were already searching the body of the article because some of the results for "apply for pension" don't include any matches in the title text. (Hence, I assumed they were being brought up bc the word/s are appearing elsewhere in the article.)

ncksllvn commented 3 years ago

@jenniferlee-dsva I see four results but they all have the word "for" in the article titles ->

image

We could create that "ignore list" so that words like "for" are no longer treated as keywords

jenniferlee-dsva commented 3 years ago

Ohhhh :-)

Would creating the "ignore list" of words affect phrase matches?

Asking because we do want to bring back results that have phrase matches and rank/weight them as more relevant over results that just have single word matches.

(So for example if the word "of" is in the user's inputted phrase search "change of address" - would we lose the ability to bring back phrase matched results and show them above other results?)

jenniferlee-dsva commented 3 years ago

I'm going to deprecate this ticket because there are too many overlapping issues in this thread. I'm still seeing irrelevant results - e.g., 'apply for pension' --> brings back anything with the word "for" in the title.

It looks like in order to really resolve the original weird behavior, we need to create that ignore list.

And then separately create FE logic that looks for phrase matches and ranks/prioritizes matches (or consider Yext or some tool if this level of sophistication isn't achievable with what we currently have).

FYSA ^ @johnhashva