devinit / DIwebsite-redesign

New DI website 2019

Sweep: Modify di_website/publications/mixins.py or di_website/publications/models.py so stories.search(search_filter) only matches whole words #1339

Status: Open. akmiller01 opened this issue 1 year ago.

akmiller01 commented 1 year ago

Modify di_website/publications/mixins.py or di_website/publications/models.py so that stories.search(search_filter) only matches whole words. Because of how Elasticsearch interprets the search configuration in https://github.com/devinit/DIwebsite-redesign/blob/develop/di_website/publications/mixins.py, the search_filter variable used in https://github.com/devinit/DIwebsite-redesign/blob/develop/di_website/publications/models.py returns matches for "disbursement" when search_filter is "disability". The code should be changed so that a search_filter of "disability" only returns results for that word and closely related forms such as "disabilities".
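One plausible explanation for the "disbursement" match, assuming the field is indexed for partial matching with an edge n-gram analyzer (as Wagtail's Elasticsearch backends typically do for partial-match fields) and that the query text is split into the same n-grams: both words then share the prefixes "d", "di" and "dis", so either one can match the other. The snippet below models that in plain Python; it is not Elasticsearch code, and the min_gram=1 / max_gram=15 defaults are illustrative assumptions, not values read from this project's settings.

```python
from typing import List, Set


def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    """Return the prefixes of `word` from min_gram to max_gram characters,
    roughly what an edge n-gram token filter emits for a single token."""
    word = word.lower()
    upper = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, upper + 1)]


def shared_grams(query: str, indexed: str) -> Set[str]:
    """Grams present in both words. If the query is n-grammed too,
    any overlap is enough for the document to count as a match."""
    return set(edge_ngrams(query)) & set(edge_ngrams(indexed))


if __name__ == "__main__":
    print(sorted(shared_grams("disability", "disbursement")))
```

With whole-word analysis there would be no shared term at all, which is the behaviour the issue asks for.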

The search uses Elasticsearch, so the solution may involve passing an es_extra dict to index.SearchField to override a mapping setting such as the analyzer.
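A minimal sketch of what such an es_extra dict and the analyzer it names might look like, written as plain Python data so it can be checked without Wagtail installed. The analyzer name "whole_word_english", the filter name "english_stemmer" and the field name "body" are hypothetical; the keys inside them ("analyzer", "search_analyzer", "tokenizer", "filter", and the "stemmer" filter type) are standard Elasticsearch mapping and analysis settings. A custom analyzer must also be registered in the index's analysis settings before a field can reference it (Wagtail's Elasticsearch backends accept an INDEX_SETTINGS option for this; check the exact nesting against the installed version).

```python
# Hypothetical analyzer definition in Elasticsearch index-settings form:
# split text on word boundaries (standard tokenizer), lowercase each token,
# then apply an English stemmer so inflected forms share one term.
WHOLE_WORD_ANALYSIS = {
    "analysis": {
        "filter": {
            "english_stemmer": {"type": "stemmer", "language": "english"},
        },
        "analyzer": {
            "whole_word_english": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "english_stemmer"],
            },
        },
    },
}

# Extra mapping keys for one SearchField. Using the same analyzer at index
# time and search time means query text is split into whole words too.
WHOLE_WORD_ES_EXTRA = {
    "analyzer": "whole_word_english",
    "search_analyzer": "whole_word_english",
}

# In the Wagtail model this would be passed roughly as (not executed here):
#   index.SearchField("body", es_extra=WHOLE_WORD_ES_EXTRA)
```

Elasticsearch's built-in "english" analyzer would avoid registering a custom one, but it also removes English stop words. If the same field is still indexed for partial matching, queries may keep hitting the n-gram data, so partial matching may need to be disabled for it as well (an assumption about this project's configuration).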

sweep-ai[bot] commented 1 year ago

Here's the PR! https://github.com/devinit/DIwebsite-redesign/pull/1341.

I used GPT-4 to create this ticket. To get Sweep to recreate this ticket, leave a comment prefixed with "sweep:" or edit the issue.


Step 1: 🔍 Code Search

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I looked at are listed below. If a file is missing from here, you can mention its path in the ticket description.

  • https://github.com/devinit/DIwebsite-redesign/blob/d1208ec11024d28d5b968cca54a27b3048df5f20/di_website/publications/mixins.py#L1-L262
  • https://github.com/devinit/DIwebsite-redesign/blob/d1208ec11024d28d5b968cca54a27b3048df5f20/di_website/publications/models.py#L1-L1174
  • https://github.com/devinit/DIwebsite-redesign/blob/d1208ec11024d28d5b968cca54a27b3048df5f20/di_website/common/management/commands/importwp.py#L77-L213
  • https://github.com/devinit/DIwebsite-redesign/blob/d1208ec11024d28d5b968cca54a27b3048df5f20/di_website/common/management/commands/importwp.py#L152-L226
  • https://github.com/devinit/DIwebsite-redesign/blob/d1208ec11024d28d5b968cca54a27b3048df5f20/di_website/datasection/models.py#L351-L461

I also found the following external resources that might be helpful:

Summaries of links found in the content:

https://github.com/devinit/DIwebsite-redesign/blob/develop/di_website/publications/models.py:

The page does not contain any content or code snippets relevant to the problem.

https://github.com/devinit/DIwebsite-redesign/blob/develop/di_website/publications/mixins.py:

The page does not contain any content or code snippets relevant to the problem.


Step 2: 🧐 Snippet Analysis

From looking through the relevant snippets, I decided to make the following modifications:

File Path: di_website/publications/mixins.py
Proposed Changes: In the PublicationPageSearchMixin class, update index.SearchField to use an analyzer that tokenizes the text into whole-word terms. This can be done by passing an es_extra dict to index.SearchField with its "analyzer" key set to the name of the new analyzer.
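The proposed change relies on whole-word tokenization. The snippet below is a simplified, pure-Python model of that behaviour, not Elasticsearch code: it splits text into lowercase word tokens and counts a document as a match only when a query term appears as a complete token. The regex tokenizer is an assumed stand-in for Elasticsearch's standard tokenizer and will differ on punctuation edge cases such as apostrophes.

```python
import re
from typing import List

# Stand-in for a standard tokenizer followed by a lowercase filter.
WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase whole-word tokens."""
    return [token.lower() for token in WORD_RE.findall(text)]


def matches_whole_word(query: str, document: str) -> bool:
    """True if at least one query token appears as a complete token in the
    document (mirroring a match query's default OR operator)."""
    doc_terms = set(tokenize(document))
    return any(term in doc_terms for term in tokenize(query))


if __name__ == "__main__":
    for doc in ["Disability inclusion", "Disbursement of aid", "Disabilities data"]:
        print(doc, "->", matches_whole_word("disability", doc))
```

Exact tokens also reject "disabilities", which the issue wants to keep, so whole-word tokenization alone does not fully meet the requirement; a stemming step is needed as well.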

Step 3: 📝 Planning

I have created a plan for writing the pull request. I am now working through the plan and coding the changes required to address this issue. Here is the planned pull request:

Fix search to only match whole words (branch: sweep/fix-search-whole-words)

Description

This PR addresses the search functionality in the PublicationPageSearchMixin class (di_website/publications/mixins.py) matching partial words. The current implementation uses the default analyzer, which produces unwanted matches such as "disbursement" for the query "disability". To fix this, the code now uses a new analyzer that tokenizes the text into individual terms, one per word, so that the search only returns results that match whole words.

Summary of Changes

  • Updated the index.SearchField in the PublicationPageSearchMixin class in di_website/publications/mixins.py to use a new analyzer.
  • Added an es_extra dict to the index.SearchField with the analyzer key set to the name of the new analyzer.
  • The new analyzer tokenizes the text into individual terms, each term corresponding to a word, ensuring that the search only matches whole words.
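The issue also asks that "disability" keep matching closely related forms such as "disabilities". In Elasticsearch that is normally the job of a stemming token filter applied after tokenization. The sketch below models the idea with a deliberately tiny, hypothetical plural-stripping function (toy_stem); it is far cruder than a real English stemmer and only shows how normalizing both the query and the indexed text lets related forms share a term while an unrelated word like "disbursement" stays separate.

```python
import re
from typing import List

WORD_RE = re.compile(r"\w+")


def toy_stem(token: str) -> str:
    """Hypothetical plural stripper, NOT a real stemmer:
    'disabilities' -> 'disability', 'grants' -> 'grant'."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text: str) -> List[str]:
    """Whole-word tokens, lowercased, then normalized by toy_stem.
    The same function is applied to queries and documents."""
    return [toy_stem(token.lower()) for token in WORD_RE.findall(text)]


def matches_stemmed(query: str, document: str) -> bool:
    """True if any analyzed query term equals an analyzed document term."""
    doc_terms = set(analyze(document))
    return any(term in doc_terms for term in analyze(query))


if __name__ == "__main__":
    for doc in ["Data on disabilities", "Disbursement of aid"]:
        print(doc, "->", matches_stemmed("disability", doc))
```

Applying the same analysis at index time and query time is the key design choice here; in Elasticsearch terms it corresponds to giving a field the same "analyzer" and "search_analyzer".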

Please review and merge this PR. Thank you!


Step 4: ⌨️ Coding

I have finished coding the changes for this issue. I am now reviewing them for completeness.


Step 5: 🔁 Code Review

Success! 🚀



stale[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.