Closed wibeasley closed 11 months ago
Looks like this works for all the scenarios except pipes in the label. For that to work it seems like you'd need to first split on the pipe, then check if each entry looks like a value,label pair (without any trimming). If it doesn't look like a pair its part of the label for the preceding option so add it back on.
Of course, throw in commas and pipes into a label and there's no hope.
@BlairCooper, maybe a hybrid of the two approaches should replace #500. (These attempts are still in dev branches --nothing has been pulled into the main branch yet.)
Maybe we still use a two-stage approach, where
sptrsplit()
uses pipes to separate the choices/rows. Then use our own regex to separateid
fromlabel
.I gave up on
readr:read_csv()
for this purpose because I couldn't get it working if the label contained commas. I'd have to preprocess the entries to enclose the label in quotes forread_csv()
to operate as intended --but we'd need a regex to do that. So we might as well use a single regex to separate them into columns and avoidread_csv()
. Tell me if anyone disagrees or sees something I'm overlooking.@BlairCooper, here's the heart of the function's current version (still on a dev branch). I tried to avoid using
base::trimws()
and had a working regex (ie,^\\s*+(.+?),\\s*+(.*?)\\s*$
). But I couldn't figure out how to drop empty rows, like your scenario in #502.You mentioned that this approach is 0.5sec vs 0.3sec. I think I'm ok with that because moving data across the network always dwarfs the computation time on the client machine.
Are there any scenarios this approach can't handle correctly? The only outstanding case is if the label has pipes (#503). In theory the
strsplit()
could use a regex to distinguish pipes between choices versus pipes within labels. I haven't figured out how though, because theid
is so flexible (eg, letter(s) & number(s)).@BlairCooper or anyone else, any thoughts on performance, accuracy, or maintainability?