Preservation of Stopwords in Indexing and Search: Stopwords are now preserved when indexing and searching. As a result, searches containing stopwords may return different results than before, and those results should be more accurate and contextually relevant.
Previous Configuration: In earlier versions, Concourse Server could be configured, via the conf/stopwords.txt file, to exclude common stopwords from indexing and search operations. This approach was designed to reduce storage requirements and improve search performance by removing frequently occurring but generally less significant words.
Rationale for Change: Preserving stopwords maintains the contextual integrity of queries, which can significantly improve the accuracy of search results and the effectiveness of ranking algorithms. Advances in storage and computing have also reduced the resource cost of including stopwords, making the trade-off in favor of better search accuracy and system robustness more attractive. Lastly, removing stopwords was found to introduce corner-case bugs within Concourse's buffered storage system, leading to operational challenges. Preserving stopwords simplifies the architecture and enhances system stability.
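To illustrate why dropping stopwords can cost a query its context, here is a minimal sketch using a toy inverted index. It is not Concourse's implementation: the tokenizer, the `build_index` and `search` helpers, the sample records, and the stopword list (standing in for the contents of conf/stopwords.txt) are all assumptions made for the example.

```python
# Illustrative only: a toy tokenizer and inverted index, not Concourse's code.
from collections import defaultdict

# Hypothetical stopword list, standing in for the contents of conf/stopwords.txt.
STOPWORDS = frozenset({"to", "be", "or", "not", "the", "a", "of"})


def tokenize(text, stopwords=frozenset()):
    """Lowercase and split on whitespace, dropping tokens found in stopwords."""
    return [t for t in text.lower().split() if t not in stopwords]


def build_index(records, stopwords=frozenset()):
    """Map each token to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for token in tokenize(text, stopwords):
            index[token].add(record_id)
    return index


def search(index, query, stopwords=frozenset()):
    """Return ids of records containing every query token (AND semantics)."""
    tokens = tokenize(query, stopwords)
    if not tokens:
        return set()  # the whole query was stopwords, so nothing is left to match
    return set.intersection(*(index.get(t, set()) for t in tokens))


# Hypothetical sample data.
records = {1: "to be or not to be", 2: "the art of war", 3: "not a war"}

filtered = build_index(records, STOPWORDS)   # old behavior: stopwords removed
preserved = build_index(records)             # new behavior: stopwords kept

print(search(filtered, "to be or not to be", STOPWORDS))  # no tokens survive
print(search(preserved, "to be or not to be"))            # matches record 1
print(search(filtered, "not a war", STOPWORDS))           # "not a" is dropped
print(search(preserved, "not a war"))                     # only record 3
```

With stopwords removed, a query made up entirely of stopwords has nothing left to match, and "not a war" collapses to "war", which matches both records that mention war. With stopwords preserved, each query matches only the record that actually contains all of its words.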
Upgrade Implications: Upon upgrading to this version, an automatic reindexing task will be initiated so that all previously indexed data conforms to the new policy of preserving stopwords. Users should plan for increased storage needs because stopwords are now included in the index.