Open EvilDrPurple opened 3 months ago
this is good to know about but tricky to fix. in v2, quick search (and searching on strings in general) will have greatly reduced impact on the results people are able to get. So that's good, and minimizes the impact of details like this.
Since only one (the longest one) is kept, this means the second one is discarded in favor of the first.
Can you say what you mean by this?
@josh-chamberlain In the code currently, we perform a search of the dataset twice: the first time using the search terms exactly as written, and the second after "depluralizing" the search terms. This means two different sets of results are returned; only the larger one is selected for display, while the smaller one is discarded. Hope that clears it up a bit.
@EvilDrPurple oh, I see—I was overthinking it. Yeah, we should probably combine the results and show them all.
Context
Multi-word searches that pluralize a word other than the last one can produce competing result sets from the unaltered search and the de-pluralized search, so some results are never displayed. The de-pluralized search runs the query a second time with the words made singular, in an attempt to find more results. For example, searching
uses of force
in madison
returns one result. This looks good at first glance, but the backend has actually found two results:
[('reckSg7rw3raeGvP2', 'Archived 21st Century Policing Quarterly Data', 'Summarized data about incident-based reporting, arrests, personnel demographics, traffic stops, and uses of force.\n', 'Annual & Monthly Reports', 'https://www.cityofmadison.com/police/data/archived-quarterly-data.cfm', '["PDF: Machine Created", "XLS"]', datetime.date(2016, 1, 1), None, True, 'Madison Police Department - WI', 'Madison', 'WI')]
[('recL8nSiM0HsIOaGN', 'Use of Force Policy', None, 'Policies & Contracts', 'https://public.powerdms.com/HSVPS/tree/documents/40', None, None, None, True, 'Huntsville Police Department - AL', 'Huntsville', 'AL')]
The first one is found by the unaltered search, while the second is found by the de-pluralized search. Since only one (the longest one) is kept, this means the second one is discarded in favor of the first. (The reason the second one comes up is that Huntsville is located in Madison County.) This is a small-scale example of what may be happening in other, larger searches. We can probably combine the two lists coming from the backend and remove duplicates instead of discarding one.
Requirements
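A minimal sketch of the combine-and-deduplicate idea from the context above, not the project's actual code. The function name `merge_results` is hypothetical, and it assumes the first element of each result tuple (for example `'reckSg7rw3raeGvP2'`) is a unique record ID, which the output above suggests but does not confirm.

```python
def merge_results(unaltered, depluralized):
    """Combine two result lists, keeping the first occurrence of each record.

    Assumption: row[0] is a unique record ID for the data source.
    """
    seen_ids = set()
    merged = []
    # Unaltered results come first so exact matches keep their position.
    for row in [*unaltered, *depluralized]:
        record_id = row[0]
        if record_id not in seen_ids:
            seen_ids.add(record_id)
            merged.append(row)
    return merged


# Shortened stand-ins for the two backend rows shown above.
unaltered = [("reckSg7rw3raeGvP2", "Archived 21st Century Policing Quarterly Data")]
depluralized = [("recL8nSiM0HsIOaGN", "Use of Force Policy")]

combined = merge_results(unaltered, depluralized)
print(len(combined))  # 2
```

Putting the unaltered results first is a design choice in this sketch, on the guess that exact matches are more relevant; the ordering could just as well be decided elsewhere.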
Open questions
use of force
in madison
will only return the Huntsville, Alabama result and not the Madison, Wisconsin result, since the keyword
uses of force
is used for the Wisconsin data source. Maybe we should implement pluralizing, similar to how we use de-pluralizing? This may be a challenge for multi-word searches, though.