codidact / qpixel

Q&A-based community knowledge-sharing software
https://codidact.com
GNU Affero General Public License v3.0
382 stars 69 forks source link

Exact search on a post title doesn't find the post #387

Open cellio opened 3 years ago

cellio commented 3 years ago

https://meta.codidact.com/posts/279392

I reproduced the problem by searching for the linked post by its title in quotes. No hits. When I removed the quotes I got a bunch of hits, but this post wasn't near the top.

I wondered if stop words were interfering, so on Meta Blog I searched for "Newsletter #1 (May 2020)" and didn't get any hits either. I also tried "Resource section" on Judaism Meta. This rules out stop words as culprits.

Update: per https://meta.codidact.com/posts/287032, search doesn't look in titles at all.

BoxedFruits commented 3 years ago

I'd like to work on this!

Also, I'm going to post my research here because I am not too familiar with Ruby on Rails yet and my SQL is rusty so I would appreciate any advice/help.

I looked in the logs and it looks like its only matching using body_markdown when doing a search:

 SELECT  `posts`.*, MATCH (`posts`.`body_markdown`) AGAINST ('searchTerm' IN BOOLEAN MODE) AS search_score FROM `posts` WHERE `posts`.`community_id` = 1 AND `posts`.`post_type_id` IN (1, 2, 5) AND (MATCH (`posts`.`body_markdown`) AGAINST ('searchTerm' IN BOOLEAN MODE)) ORDER BY `search_score` DESC LIMIT 25 OFFSET 0

We need to match using both the body_markdown and title so in MySQL I executed this command. I don't know the implications of doing this to the prod DB.

ALTER TABLE posts ADD FULLTEXT (title,body_markdown);

The function that parses the search term is in posts.rb

  def self.search(term)
    match_search term, posts: :body_markdown
  end

In order to match with both body_markdown and title we change the third line to:

match_search term, posts: [:body_markdown, :title]

Exact matches for title and body_markdown now work... but not non-exact matches and I am not too sure why.

cellio commented 3 years ago

Thanks for looking into this!

Why is the additional column needed in the table if you can modify match_search to look in both the body and the title? I'm confused by the DB change.

manassehkatz commented 3 years ago

Thanks for looking into this!

Why is the additional column needed in the table if you can modify match_search to look in both the body and the title? I'm confused by the DB change.

The DB change is to make search speed reasonable by indexing the fields. However, it should be fine to index each field independently, and I think that would be better for a number of reasons. Do current indexes include title, body_markdown, neither one or both?

cellio commented 3 years ago

I'll have to defer to @ArtOfCode- or @luap42 for that; I don't know how the table is indexed now.

ArtOfCode- commented 3 years ago

@BoxedFruits Current index is just on body_markdown. The only major concern about adding a FT index on title is the disk space, but a little bit of analysis says it's likely about 1/100th the size of the existing body_markdown index, so... no problem there.

Your proposed approach looks good to me. The SQL command needs to go in a Rails migration - if you've not used them before, run rails g migration AddFulltextIndexToPostsTitle to generate the migration file. It'll add a file in db/migrations/ containing a migration class and a change method; in that method you can add the following, which is equivalent to your SQL:

add_index :posts, :title, type: :fulltext

Side note: are you in our Discord server? I tend to get pings there quicker than I do on GitHub, so you might find it helpful if you have questions while you're working: https://discord.gg/AhRYJg8