Closed synrg closed 4 years ago
In `inatcog/converters.py`, we use this "cheat" to transform terms plus phrases (a list produced by `shlex.split` with `posix=False`, so double-quotes are kept) into just terms:

1. join them all with blanks
2. `shlex.split` the result with `posix=True` (the default) to unwrap the phrases from their double quotes

Unfortunately, the cheat causes the bug. Unbalanced single-quotes aren't POSIX-compliant, so the second split chokes on them. Since we aren't actually processing a POSIX-compliant command line, we shouldn't be cutting corners like this!
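To make the failure concrete, here is a minimal sketch of the join-and-resplit cheat. The function name `unwrap_with_cheat` is hypothetical and is not the actual code in `inatcog/converters.py`; it only mirrors the two steps described above.

```python
import shlex


def unwrap_with_cheat(query):
    """Hypothetical stand-in for the cheat: split, join, re-split."""
    # Non-POSIX split keeps the double quotes around phrases.
    tokens = shlex.split(query, posix=False)
    # Step 1: join everything with blanks.
    joined = " ".join(tokens)
    # Step 2: re-split in POSIX mode, which strips the quotes but also
    # treats a lone apostrophe as the start of a quoted string.
    return shlex.split(joined, posix=True)


# A double-quoted phrase is unwrapped as intended.
print(unwrap_with_cheat('bird "great blue heron"'))

# An apostrophe inside a bare term makes the POSIX re-split fail.
try:
    unwrap_with_cheat("elephant's head")
except ValueError as err:
    print("parse failed:", err)
```

The first split tolerates the apostrophe because non-POSIX mode does not recognize quote characters inside a word; it is only the second, POSIX-mode split that raises `ValueError`.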
A correct implementation would be to map the list of terms & phrases, dropping leading/trailing double-quotes.
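A rough sketch of that map approach, assuming the tokens come from `shlex.split(..., posix=False)`. The helper name `strip_phrase_quotes` is made up for illustration.

```python
import shlex


def strip_phrase_quotes(tokens):
    """Drop one pair of surrounding double quotes from each phrase token."""
    result = []
    for token in tokens:
        # Only unwrap tokens that both start and end with a double quote.
        if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
            result.append(token[1:-1])
        else:
            result.append(token)
    return result


tokens = shlex.split('taxon "elephant\'s head"', posix=False)
print(strip_phrase_quotes(tokens))
```

Because nothing is re-parsed, an apostrophe in a term or phrase passes through untouched.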
But even `posix=False` doesn't save us from potential issues here. If you have a query string with a single unmatched double-quote in it, that will break, too. Arguably, though, that is what we would like to happen (though the raised exception needs to be caught and handled, rather than just dying and throwing errors into the log).
Now that I think of it, this sounds suspiciously like the bug on aliases in Red core itself that caused us grief when a single unmatched curly-apostrophe appears in an aliased command's arguments. I wonder if shlex is at the root of that issue, too? This bears further investigation, as whatever we come up with here might help solve that issue, too.
Looking at the `shlex` doc, in addition to always using `posix=False`, instead of using `shlex.split` (which creates an instance internally) we could create a `shlex.shlex` instance manually, then set `escape = ""` on it (i.e. no escape character) and finally set `quotes` to include only the double-quote, instead of also the single-quote. After that, I guess we could catch the unbalanced-quotes error and retry the parse with a double-quote added to the end of the string if that corner case is encountered (i.e. automatically balance the double-quotes, making the rest of the string into a phrase).
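Something like the following is what I have in mind. It is only a sketch: the names `make_lexer` and `split_query` are invented here, and setting `whitespace_split` and clearing `commenters` are my assumptions about what is needed to match what `shlex.split` does by default.

```python
import shlex


def make_lexer(text):
    """Build a non-POSIX lexer that only treats double quotes as quotes."""
    lexer = shlex.shlex(text, posix=False)
    lexer.whitespace_split = True  # split on whitespace only, like shlex.split
    lexer.commenters = ""          # assumption: '#' should not start a comment
    lexer.escape = ""              # no escape character
    lexer.quotes = '"'             # single quotes become ordinary characters
    return lexer


def split_query(text):
    """Split a query; auto-balance a dangling double quote and retry once."""
    try:
        return list(make_lexer(text))
    except ValueError:
        # Unbalanced double quote: close it at the end of the string.
        return list(make_lexer(text + '"'))


print(split_query("elephant's head"))
print(split_query('taxon "elephant\'s head'))
```

Phrases still come back wrapped in their double quotes here, so an unwrapping step is still needed afterwards.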
If that solves the apostrophe issue, then we can continue to use the `posix=True` hack to unwrap the quotes. Otherwise, the map-and-substitute approach on each phrase should work, though it's not quite as elegant.
Fixed in 222c64169607ed951d139d2b87f9d9a650fb328b. I went a different way with it, though.
The new parser has issues with apostrophes in taxon queries. Mason triggered this on iNat Discord with a search of `,taxon elephant's head`. The workaround for now is to use double-quotes, e.g. `,taxon "elephant's head"`, but the user should not have to type this. In the log I see: