codidact / qpixel

Q&A-based community knowledge-sharing software
https://codidact.com
GNU Affero General Public License v3.0

Better searching #833

Open Taeir opened 1 year ago

Taeir commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently, search is performed using a database match query. This is a very fast way of searching, but it does not give great results: as currently set up, it only finds exact word matches. This can be good or bad:

Additionally, it seems the current solution does not search in titles, even though those usually contain the most important information.
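To make the two complaints above concrete, here is a toy Ruby sketch. It is not qpixel's actual search code: `Post`, `crude_stem`, and both matcher methods are hypothetical names made up for illustration, and the suffix stripping is only a crude stand-in for what a real language analyzer does.

```ruby
# Toy illustration only: this is NOT how qpixel implements search.
Post = Struct.new(:title, :body)

# Split text into lowercase word tokens.
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Very crude suffix stripping, standing in for a real stemmer.
def crude_stem(word)
  word.sub(/(ing|ed|es|s)\z/, "")
end

# Exact word match against the body only (title is ignored).
def exact_body_match?(post, term)
  tokens(post.body).include?(term.downcase)
end

# Stemmed match against both title and body.
def stemmed_match?(post, term)
  stem = crude_stem(term.downcase)
  tokens("#{post.title} #{post.body}").any? { |t| crude_stem(t) == stem }
end

post = Post.new("Searching for duplicates", "How do I find an existing question?")

exact_body_match?(post, "search") # => false (title ignored, "searching" != "search")
stemmed_match?(post, "search")    # => true  ("searching" in the title stems to "search")
```

A post whose title clearly answers the query is missed by the exact, body-only matcher, which is the situation that pushes a user to ask a duplicate question.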

When users cannot find their question, they will look elsewhere or ask a new one. This may decrease user interaction with the platform and increase the likelihood of negative interactions (such as a question being closed as a duplicate). The importance of proper search should not be underestimated.

Describe the solution you'd like

Searching effectively is a rather complex problem. Rather than reinvent the wheel, I suggest using an existing full-text search engine. These systems have an understanding of languages and can detect word matches when other tenses or even synonyms are used. Additionally, weights can be set so that matches in the title, for example, count for more. These systems can also suggest better search terms (spelling corrections) to help the user further.
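As a rough sketch of what such a request could look like with Elasticsearch, the Ruby snippet below builds a query body that boosts title matches, tolerates small typos, and asks for spelling suggestions. The helper name `build_search_body`, the field names `title` and `body`, and the boost of 3 are assumptions for illustration, not anything taken from qpixel. Stemming and synonyms would be configured through analyzers in the index mapping, which is not shown here.

```ruby
require "json"

# Hypothetical helper: builds an Elasticsearch search request body.
# Field names ("title", "body") and the default boost are assumptions.
def build_search_body(terms, title_boost: 3)
  {
    query: {
      multi_match: {
        query: terms,
        # "title^3" means a match in the title counts three times as much.
        fields: ["title^#{title_boost}", "body"],
        # Allow small edit distances so minor typos still match.
        fuzziness: "AUTO"
      }
    },
    # Term suggester: proposes corrections for words it does not recognize.
    suggest: {
      spelling: {
        text: terms,
        term: { field: "body" }
      }
    }
  }
end

puts JSON.pretty_generate(build_search_body("serch titles"))
```

The resulting JSON is what would be sent to the index's `_search` endpoint; how it is sent (client gem, model integration) is left open here.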

Describe alternatives you've considered

There are a few different alternatives available:

Elasticsearch

Pros:

Mehs:

Cons:

Apache Solr/Lucene

Pros:

Cons:

There may be other systems out there, but these two seem to be the most widely used ones that have proper Rails integration.

Additional context

While Solr may be a somewhat better choice in the long run, the current code style lends itself much better to Elasticsearch (as far as I can tell). I'm relatively familiar with Elasticsearch, so I will provide a pull request with an implementation for it. More work will be required to build additional features such as highlighting (indicating the matched words in the results) and searching in associated entities (raising the search rank of posts whose answers closely match the search term). If you want to go in that direction, it is worth reconsidering which of the two systems to use.
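For a sense of how small the request-side part of highlighting is in Elasticsearch, here is a hedged Ruby sketch. `with_highlighting` is a hypothetical helper, and the field names and `<mark>` tags are assumptions; rendering the returned fragments in the views is the larger part of the work and is not shown.

```ruby
# Hypothetical helper: returns a copy of an Elasticsearch request body
# with a highlight section added. Field names and tags are assumptions.
def with_highlighting(body, fields: ["title", "body"])
  highlight = {
    # Matched words in the returned fragments are wrapped in these tags.
    pre_tags: ["<mark>"],
    post_tags: ["</mark>"],
    # An empty hash per field means "use the default highlighter settings".
    fields: fields.to_h { |f| [f, {}] }
  }
  body.merge(highlight: highlight)
end

request = with_highlighting({ query: { match: { body: "search" } } })
request[:highlight][:fields].keys # => ["title", "body"]
```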

For smaller sites (little data or few searches), running a single Elasticsearch instance, much like running a database locally, is fine. Deploying this at a larger scale, however, will require proper testing and configuration: it means setting up an Elasticsearch cluster, probably with multiple nodes, to get performant search results. I don't know what scale Codidact currently runs at, but it is something to consider for the future. I'm pretty sure AWS has configurations or tutorials for setting up both Elasticsearch and Solr clusters.

cellio commented 1 year ago

Another report about search in titles: https://meta.codidact.com/posts/287032