Open macobo opened 3 years ago
Sentry issue: POSTHOG-2X8
Sentry issue: POSTHOG-2X7
Sentry issue: POSTHOG-2V7
ClickHouse and the plugin server use Google's RE2 for regex matching. It does have a Node bindings package, but it doesn't work in the browser, as in the end it's a C++ binary. We could use JS standard RegExp
client-side instead which would largely be OK, though it's actually somewhat more lenient than RE2, so some 500s would still slip in.
Alternatively this can be checked server-side when setting a regex field in Python, using google-re2
, in which case validation would require a request, but would be fully reliable.
Postgres is a slightly different story – its regex syntax (POSIX) differs in a few places, though not greatly. In its case only a Postgres query would really reliably verify a regex. However I'd argue we should stick to RE2 validation in this case too.
I think (at least) starting with some js-RegExp validation is a valid start, always can add complexity later if a lot of errors still.
As this seems to have caused confusion a bit of times and affecting funnel usage, adding to the priorities list for this sprint. We should consider the extra context from #4326 (i.e. let's add detailed instructions about the kind of operators that are supported).
Let's focus this issue on addressing clearer instructions around regex. Secondary to that, we could use a simple error pattern for input fields to highlight errors and communicate how to recover from them. I filed https://github.com/PostHog/posthog/issues/5380 to track this work.
I am working on this today and tomorrow. Do you have any guidance on what makes for a great regex authoring experience? So far I am tracking simple validation based on syntax. Would we also need some way to see how the expression evaluates to ensure that it's accurate AND well-formed? Secondary to this, I was planning on adding a simple syntax key or cheatsheat to show what properties we support and possibly even provide shortcuts for applying them into the input field that contains the regex. Any thoughts on what else we can address here? Thanks!
I sometimes use https://regex101.com/ because it visually shows the match groups and makes it easy to build and parse a regex string.
Here's the syntax we use in CH & plugin server. I would keep the cheatsheet to the most basic expressions and then link their docs.
Whenever we evaluate an expression I'm not sure how detailed errors we'd be able to provide on malformed expressions, but I would just show an error string.
Tagging @Twixes for any additional input
Here are some ideas that are mostly intended to facilitate discussion, not so much prescribe a solution. Here's my thinking so far (please correct me if I'm wrong):
My biggest question is how likely are users to author complex regex patterns in the app? It seems like there are a lot of popular single serving sites that are really good at this now. We could try to approach that level of experience with our regex editor, or we could just make it easy to copy and paste text into those fields, and alert users of malformed values. Secondary to that, are there any common regex patterns we think users will want to use? Are there any kind of utility patterns that we could make available in the UI? The idea here is to make writing a regex easier by avoiding writing the regex entirely.
Some ideas for inline options. Validation would occur on blur / de-focus, or when the user clicks "reevaluate". It was about this time I realized a textarea might be a better experience.
Our ability to summarize a complex regex is also a bit of a challenge. A summary view and a modular editing view might be worth considering as well. In this case, once a user selects the matches regex
operator, when the value field is focused, we could trigger a modal containing a large-form editor. This would allow for both simple and complex patterns.
Syntax highlighting would be great, and we even already use a library in the product that can do this (react-syntax-highlighter
which uses Prism.js under the hood). BUT unfortunately it doesn't work with input
s or textarea
s. They don't allow such styling so we'd have to engineer our own component that would work exactly like input
– only by displaying text using raw HTML (a problem I also recently faced in https://github.com/desktop/desktop/pull/12465 🙈 ). I'd skip that for now as that would be a lot of effort for one component.
As for error messages, I definitely think they could be more useful by being specific. In fact I just made them such in #5456, which I think looks pretty fine.
I think validation should be automatic (and non-obtrusive, deblur/debounce should be good), and your mocks look really nice @clarkus. Much more coherent than the current error state message thrown in quickly.
I would actually argue the option to edit a filter value in a textarea
modal could be generalized to all operators in a useful way (though a bit concerned about this UI for operators where a filter can have multiple values).
Personally for the really complex expressions I use https://regexr.com/ and they have a nice cheatsheet. We definitely can't match the power of that app… but we could put a similar cheatsheet in our one.
Awesome feedback. Thank you, @Twixes.
Could we do some input fakery with a content-editable
attribute on a div or some other element? That would probably get us closer to allowing structure inside that container element and hopefully then some syntax highlighting. As long as we could pull the string out of the element, it should work?
We could do some fakery but then the input experience would be impacted (i.e. cursor moving, selection, etc.). That's why I'd rather go for the less tricky improvements first.
I did some quick searching and found what might be a viable option that uses multiple elements to emulate an editor with syntax highlighting. https://css-tricks.com/creating-an-editable-textarea-that-supports-syntax-highlighted-code/
I'm not going to pursue that right now, but wanted to share it before I forgot. Maybe we can aim for something like this in the future. I'm going to look into incorporating modular editors for other operators as well. Thanks again for the input!
I thought of another question How critical is it to have a sample string or representation of the property value being affected by the regular expression? Seems like having some sample might aid in writing a regex. I'm just now sure how representative the sample could be in this context.
Couple more iterations that focus on the modular editor. I've moved validation to a standard position and made the reevaluate action consistent. The syntax reference is also placed opposite the label so that it doesn't conflict with a more verbose validation message. I've also used some subtle code formatting within the validation string to highlight the portion that represents the invalid characters.
Available in figma at https://www.figma.com/file/9yWtngNb1AIuf6KmXaEPJA/App-doodles?node-id=923%3A134375
That last design looks amazing! @Twixes correct me if I'm wrong but I think we can evaluate on the go? Also by syntax highlighting you mean highlighting each match group so you can better understand how the Regexp is constructed?
We can definitely evaluate on the go, though without some debouncing it would be annoying. Yup, syntax highlighting would help see match groups, but also even more importantly I think, special characters.
I don't think we should validate on every key press. That could be really distracting when you're just trying to type in the expression you know you want.
@macobo's idea from https://github.com/PostHog/posthog/issues/5380#issuecomment-892408580 could be worth considering?
User has not typed for the past N ms
Yup, that's debouncing, 500 ms is my go-to value in such cases (but can always be tweaked).
Thanks, @Twixes. You taught me something new.
Let's move ahead with the modular regex editor. We should trigger the modal when the user selects one of the regex operators from the list, or if the modal is closed and a regex operator is already selected, when the value field is selected. Once the regex is validated, on submit we will insert that regex value into the value field in the filter row. Said another way, the modal is replacing any kind of dropdown or other interstitial selection component.
I think conditional editing modes is an interesting idea. I can work on some iterations for that. The reevaluate button was really meant to let a user test their regex without having to go through the whole debounce / blur to trigger validation. Some manual means of triggering that check could help users trust that they're entering a meaningful value here. An in-product popover could be helpful. Do we have that content or can we produce it to test this idea? It's really a question of how complex that cheat sheet could be.
The core change here is validation messaging. If we can ship that, then we could watch how users react and adjust our next iterations based on that. I'm not sure the other pieces are necessary to really improve the experience... I mean they would, but if users are authoring regex on external tools and copy/pasting into the control, then the cheat sheet and manual validation controls aren't as necessary.
Here's an inline version of the regex editor. This removes the validation button. I am still working through how we might trigger a modular version of the editor.
https://www.figma.com/file/9yWtngNb1AIuf6KmXaEPJA/App-doodles?node-id=1076%3A135967
+1 based on ZEN-3909 (in particular we only return a server error when you try to use lookahead or lookbehind, which isn't supported by re2)
Is your feature request related to a problem?
Writing a correct regex is hard. Currently when user manages to write an invalid regex, they get an 5xx error back from our API.
Describe the solution you'd like
Let's instead show them a warning on the frontend?
Describe alternatives you've considered
Additional context
Thank you for your feature request – we love each and every one!