Open cmgrulke opened 4 years ago
More relevant than the textsearch: https://www.postgresql.org/docs/current/fuzzystrmatch.html
After loading the fuzzystrmatch
extension with CREATE EXTENSION fuzzystrmatch
, this query works as hoped:
SELECT substances.id AS substances_id,
substances.identifiers AS substances_identifiers,
levenshtein(substances.identifiers ->> 'preferred_name', 'Hydrogen Peroxide') as dist_match,
levenshtein(substances.identifiers ->> 'preferred_name', 'Hydrgoen Peroxide') as dist_close,
levenshtein(substances.identifiers ->> 'preferred_name', 'Foodrygen Baroxide') as dist_bad
FROM substances
WHERE (substances.identifiers ->> 'preferred_name' = 'Hydrogen Peroxide' )
The distance expressions return 0, 2, and 6.
I don't know what the performance cost will be to apply this to every record in the table before the WHERE
clause returns the results for scoring, but I think that's what we have to do.
As a user I want to be able to able to find chemicals even if I misspell the name So that I can be error prone in my work, but still be successful in finding chemicals on ChemReg
GET https://api.chemreg.epa.gov/search?identifer=$$$$$$$
misspelled names like "Caffiene" still return the correct record
Acceptance Criteria