Automattic / jetpack

Security, performance, marketing, and design tools — Jetpack is made by WordPress experts to make WP sites safer and faster, and help you grow your traffic.
https://jetpack.com/

Search: Enable synonyms #17957

Open danjjohnson opened 3 years ago

danjjohnson commented 3 years ago

Is your feature request related to a problem? Please describe.

Specific use case:

If a site has posts about a person who then transitions gender and thus changes their name, new posts containing only their new name will not come up if someone searches for their dead name.

Describe the solution you'd like

Enable synonyms so that searches for both names give the same results. https://www.elastic.co/blog/boosting-the-power-of-elasticsearch-with-synonyms
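For readers unfamiliar with the feature in the linked post, here is a rough sketch of what an Elasticsearch synonym setup looks like, written as the settings body you would send when creating an index. The filter name `name_synonyms`, the analyzer name `with_synonyms`, and the placeholder terms are made up for illustration. This is not how Jetpack Search configures its indices.

```python
# Sketch of Elasticsearch index settings using the built-in "synonym" token filter.
# "name_synonyms", "with_synonyms", and the terms below are hypothetical placeholders.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "name_synonyms": {
                    "type": "synonym",
                    # Comma-separated terms on one line are treated as equivalent.
                    "synonyms": ["oldname, newname"],
                }
            },
            "analyzer": {
                "with_synonyms": {
                    "tokenizer": "standard",
                    # Lowercase first so the synonym rule matches regardless of case.
                    "filter": ["lowercase", "name_synonyms"],
                }
            },
        }
    }
}
```

With an analyzer like this applied to a field, a query for either term would match documents containing the other.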

Additional context

3537009-zen

6202-gh-Automattic/jpop-issues

gibrown commented 3 years ago

Hmmm... because of the way we do our indexing, we can't really use the built-in ES synonyms to solve this problem.

The solution that comes to mind is to provide "answers" to particular search queries: https://github.com/Automattic/jetpack/issues/9174

So in this case the solution would involve the user:

  1. Create a CPT post (probably of type jetpack-search-answer)
  2. Set the title to the original name. This is the text we would match against. Maybe we don't require an exact match? Maybe we make that configurable?
  3. The content is "XYZ has changed their name to ABC".
  4. There is a post meta field with a link to the search for that new name (hopefully we can make this link work without a pageload)

In the API we do our normal search, but we also search against the set of answers. We take the top matching answer and put it at the top of all search results. So any search containing "XYZ" will show a box with "XYZ has changed their name to ABC".
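A minimal sketch of that flow, assuming a simple case-insensitive "title appears in the query" match. The `Answer` class, `top_answer`, `search_with_answers`, and the `normal_search` callback are all hypothetical names for illustration, not the Jetpack Search API, and the real matching rule is still an open question above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    title: str    # text matched against the query (the original name)
    content: str  # message shown in the box above the results
    link: str     # link to a search for the new name


def top_answer(query: str, answers: list[Answer]) -> Optional[Answer]:
    """Return the best matching answer, or None if no title appears in the query."""
    q = query.casefold()
    matches = [a for a in answers if a.title.casefold() in q]
    # Assumption: the longest matching title is the most specific one.
    return max(matches, key=lambda a: len(a.title), default=None)


def search_with_answers(
    query: str,
    answers: list[Answer],
    normal_search: Callable[[str], list[str]],
) -> dict:
    """Run the normal search and attach the top answer (if any) to show first."""
    return {"answer": top_answer(query, answers), "results": normal_search(query)}


answers = [Answer("XYZ", "XYZ has changed their name to ABC", "/?s=ABC")]
response = search_with_answers("interviews with xyz", answers, lambda q: ["post-1"])
```

Here `response["answer"]` holds the box to render above `response["results"]`. A query that matches no answer title gets `None` and the normal results only.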

Not a perfect solution, maybe there are some ways to improve it.

eeeeevon13 commented 3 years ago

Would this be configurable by the site admin?

gibrown commented 3 years ago

> Would this be configurable by the site admin?

Yes that would be the idea. I was more marking this as needing design and assuming @jeffgolenski can take a look btw.

Ipstenu commented 3 years ago

I'm the use-case - lezwatchtv.com :D Synonyms came up because that's the most obvious way I could find in ES to handle that, but I understand it doesn't fit in how you built JP.

I'm actively trying NOT to make a placeholder page, as dead-naming people is emotionally harmful and I'd like to ... not :) (there's a whole thing here, but the tl;dr is that not everyone is okay having "X (born Y)" and we respect that).

My current work-around (with the example of Elliot Page's recent announcements) is to put his dead-names in the (unused for me) Excerpt field. The problem is that means if you look for "Ellen Page" then Elliot is 3rd overall unless you click on the option to show actors only. On the other hand, if you look for "Elliot Page" it's properly first.

I'm using Yoast SEO, which handles redirects automagically for me, so there's already a 'db entry' for "ellen-page" == "elliot-page" -- it would be amazing if that was hooked in so the related redirects were shared with search, but that's a much bigger issue :)

With your solution I would have to:

  1. Edit existing page to new name (old-name -> new-name)
  2. NOT redirect it via Yoast
  3. Make old-page again with a 'redirect' message and image (I could reuse the old one)
  4. Write some code 'if page is a redirect, auto-redirect them to the right page'

It certainly can be done, but it's a lot of steps that would be obviated by a field for 'alternate titles.' It would also let people write in things like "Apple iWatch" "Apple Watch" etc for common 'wrong' searches that would not have a negative impact on SEO (like the old keyword stuffing days, eh?).

(Edit to point out: I have multiple editors on that site, and I would need to write code to automate that whole process to ensure fewer mistakes, as you cannot expect everyone to remember everything)

FWIW I know you're working on having more post_meta values searchable (and I totally get why arbitrary meta would be a terrible idea seeing how much data we all throw at WP). If that was hookable (and prioritizable) now, I would be trying to add a hook for a CMB2 field: alt-names (or dead names) and add them in to searches for exact matches on older/rarer terms.
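To make the 'alternate titles' idea concrete, here is a toy ranking sketch in plain Python. It assumes each post carries a hypothetical `alt_names` list (standing in for the CMB2 field) and that an exact alt-name match should rank just below an exact title match and above ordinary text matches. The scoring weights and the `rank` function are invented for illustration and say nothing about how Jetpack Search actually scores results.

```python
def rank(query: str, posts: list[dict]) -> list[str]:
    """Return titles of matching posts, best match first."""
    q = query.casefold().strip()

    def score(post: dict) -> int:
        title = post["title"].casefold()
        alt_names = [name.casefold() for name in post.get("alt_names", [])]
        if title == q:
            return 3  # exact match on the current title
        if q in alt_names:
            return 2  # exact match on an alternate (older/rarer) name
        if q in title or q in post.get("content", "").casefold():
            return 1  # ordinary substring match in title or content
        return 0      # no match

    hits = [(score(p), p) for p in posts]
    hits = [(s, p) for s, p in hits if s > 0]
    # sort() is stable, so posts with equal scores keep their original order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [p["title"] for _, p in hits]


posts = [
    {"title": "Review roundup", "content": "More Apple iWatch rumours"},
    {"title": "Apple Watch", "alt_names": ["Apple iWatch"], "content": ""},
]
ranked = rank("apple iwatch", posts)
```

Searching for the 'wrong' name puts the renamed post ahead of a post that only mentions the old name in passing, without adding the old name to any visible content.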

Edited to add: This would be a kick ass feature for WooCommerce, BTW. If you rename a product or release a new version, you can make it easier for people who search for the old one :)

gibrown commented 3 years ago

Got it, thanks for the explanation @Ipstenu very helpful.

> FWIW I know you're working on having more post_meta values searchable (and I totally get why arbitrary meta would be a terrible idea seeing how much data we all throw at WP). If that was hookable (and prioritizable) now, I would be trying to add a hook for a CMB2 field: alt-names (or dead names) and add them in to searches for exact matches on older/rarer terms.
>
> Edited to add: This would be a kick ass feature for WooCommerce, BTW. If you rename a product or release a new version, you can make it easier for people who search for the old one :)

For once I may be able to make you happy relatively quickly. I'm actively working on getting our post meta sync cleaned up, documented, tested, etc to address https://github.com/Automattic/jetpack/issues/15672, https://github.com/Automattic/jetpack/issues/8663, and https://github.com/Automattic/jetpack/issues/16904. I'll link to the PR from this thread (hopefully later this week) rather than try to describe it all.

> there's already a 'db entry' for "ellen-page" == "elliot-page"

Hmmm... do you know where/how this works? What it saves to? Is it post meta or something else? I see all of these "protected" post meta fields for Yoast that I am not sure what to do with.

Ipstenu commented 3 years ago

Looking at my DB it's in a couple places but the most relevant is this:

wp_yoast_seo_links

[Screenshot of the wp_yoast_seo_links table, 2020-12-07]

43441 is the post ID: https://lezwatchtv.com/wp-json/wp/v2/actor/43441

There's also a wp_option field: wpseo-premium-redirects-base

a:223:{i:0;a:4:{s:6:"origin";s:4:"show";s:3:"url";s:5:"shows";s:4:"type";i:301;s:6:"format";s:5:"plain";}i:1;a:4:{s:6:"origin";s:9:"character";s:3:"url";s:10:"characters";s:4:"type";i:301;s:6:"format";s:5:"plain";}[...]i:245;a:4:{s:6:"origin";s:16:"actor/ellen-page";s:3:"url";s:17:"actor/elliot-page";s:4:"type";i:301;s:6:"format";s:5:"plain";}}

But that option may be crazy long. I have 245 redirects due to renames and the like.