cfpb / ccdb5-api

An API that provides an interface to search complaint data.
Creative Commons Zero v1.0 Universal

Sanitize special characters #183

Closed: anselmbradford closed this issue 2 years ago

anselmbradford commented 2 years ago

An API request containing special characters (curly quotes, “ and ”) times out: https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/?date_received_max=2022-01-04&date_received_min=2019-01-04&field=all&search_term=%E2%80%9Cmortgage%20default%E2%80%9D~3&size=25&sort=created_date_desc

The same request with straight quotes returns normally: https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/?date_received_max=2022-01-04&date_received_min=2019-01-04&field=all&search_term=%22mortgage%20default%22~3&size=25&sort=created_date_desc
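The two URLs differ only in the search_term parameter. The short Python sketch below uses the standard library's urllib.parse to decode that parameter from each URL. It shows that %E2%80%9C and %E2%80%9D are the UTF-8 percent-encodings of “ (U+201C) and ” (U+201D), while %22 is the ASCII straight quote (U+0022). The helper name search_term is only for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# The two request URLs from this issue; only search_term differs.
CURLY = (
    "https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/"
    "?date_received_max=2022-01-04&date_received_min=2019-01-04&field=all"
    "&search_term=%E2%80%9Cmortgage%20default%E2%80%9D~3&size=25&sort=created_date_desc"
)
STRAIGHT = (
    "https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/"
    "?date_received_max=2022-01-04&date_received_min=2019-01-04&field=all"
    "&search_term=%22mortgage%20default%22~3&size=25&sort=created_date_desc"
)


def search_term(url: str) -> str:
    """Return the percent-decoded (UTF-8) search_term query parameter of a URL."""
    params = parse_qs(urlsplit(url).query)
    return params["search_term"][0]


for url in (CURLY, STRAIGHT):
    term = search_term(url)
    # List the code point of every quote character in the decoded term.
    quotes = [f"U+{ord(c):04X}" for c in term if c in '"\u201c\u201d']
    print(repr(term), quotes)
```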

The API should handle special characters in the request.
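One possible form of sanitization is to map typographic quotes to their ASCII equivalents before the term reaches the search engine. The sketch below assumes that approach. The function normalize_quotes is hypothetical and is not part of ccdb5-api. It uses only str.maketrans and str.translate from the Python standard library.

```python
# Hypothetical helper (not part of ccdb5-api): map typographic quotes to ASCII.
_QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
})


def normalize_quotes(search_term: str) -> str:
    """Replace curly quotes in a search term with straight ASCII quotes."""
    return search_term.translate(_QUOTE_MAP)


print(normalize_quotes("\u201cmortgage default\u201d~3"))
```

The design choice here, rewriting the input rather than rejecting it, is itself an assumption. Silently changing what a user typed is one of the mitigation options that still needs a decision.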

higs4281 commented 2 years ago

This issue is less about handling special characters than about misuse of the advanced-search syntax. The first example uses curly quotes in an attempt to run a phrase search with a slop of ~3, but only straight quotes delimit a phrase, so no phrase search runs at all. Instead, it runs a term search with an illegal "fuzzy" Levenshtein edit distance of ~3 on the second term.
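The classification below is a deliberately simplified Python model, not the real Lucene/Elasticsearch query parser. It shows why the two inputs diverge. A straight-quoted string followed by ~N is read as a phrase with slop N. With curly quotes, the string instead splits into whitespace-separated terms, and the ~3 attaches to the second term (default”) as a fuzzy edit distance. The model assumes a limit of 2, matching Lucene's FuzzyQuery, which supports edit distances of at most 2. That limit is consistent with the comment's description of ~3 as illegal. The function classify and its output strings are hypothetical.

```python
import re

# Assumed limit: Lucene fuzzy queries support an edit distance of at most 2.
MAX_FUZZY_DISTANCE = 2

PHRASE = re.compile(r'^"([^"]+)"~(\d+)$')  # "some phrase"~N  -> phrase with slop N
FUZZY = re.compile(r"^(\S+?)~(\d+)$")      # term~N          -> fuzzy term, distance N


def classify(query: str) -> list[str]:
    """Simplified model of how a query string is interpreted (illustration only)."""
    m = PHRASE.match(query)
    if m:
        return [f"phrase {m.group(1)!r} with slop {m.group(2)}"]
    out = []
    # Without straight quotes, each whitespace-separated token is its own term.
    for token in query.split():
        f = FUZZY.match(token)
        if f:
            dist = int(f.group(2))
            status = "legal" if dist <= MAX_FUZZY_DISTANCE else "illegal"
            out.append(f"fuzzy term {f.group(1)!r} with edit distance {dist} ({status})")
        else:
            out.append(f"term {token!r}")
    return out


for q in ("\u201cmortgage default\u201d~3", '"mortgage default"~3'):
    print(q, "->", classify(q))
```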

This requires stakeholder decisions on how to mitigate search errors, so I think we can close this issue and let the internal issues play out.