LibraryCarpentry / lc-data-intro

Library Carpentry: Introduction to Working with Data (Regular Expressions)
https://librarycarpentry.org/lc-data-intro/

improve introductory text to regex #187

Open sharilaster opened 3 years ago

sharilaster commented 3 years ago

In #184, @lyndamk suggests revising several introductory paragraphs to make them easier for a novice to digest. I agree this is a good idea! The PR is a great start, and just needs some expert eyes to suggest a regex example that works for the given context.

mclenard commented 2 years ago

I think the phone number example could work if the \d metacharacter is briefly introduced and you restrict the context a little. For example, if you know ahead of time that a phone number you're trying to match is definitely in (xxx) xxx-xxxx format, you could demonstrate matching this with the regex pattern \(\d\d\d\) \d\d\d-\d\d\d\d
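As a minimal sketch of how that pattern behaves in practice, the snippet below uses Python's built-in `re` module to pull phone numbers out of a sentence. The sample sentence, the phone numbers, and the names `PHONE_PATTERN` and `sample_text` are made up for illustration and are not part of the lesson.

```python
import re

# The pattern from the comment above. The escaped parentheses, the space,
# and the hyphen are literal characters; each \d matches a single digit.
PHONE_PATTERN = re.compile(r"\(\d\d\d\) \d\d\d-\d\d\d\d")

# Hypothetical sample text, invented for this example.
sample_text = "Call the reference desk at (555) 867-5309 or the annex at (555) 123-4567."

# findall returns every non-overlapping match as a list of strings.
found = PHONE_PATTERN.findall(sample_text)
print(found)
```

Because the pattern has no capture groups, `findall` returns the full text of each match.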

I think this would work best if accompanied by a diagram showing the distinction within this pattern between literal characters and metacharacters, something like the following:

[Image: phoneregex2]

This only involves literal characters, one metacharacter, and escaping. You can state that this will only match numbers in the above format, not xxx xxx xxxx, (xxx) xxx xxxx, xxxxxxxxxx, etc., to show how selective regular expressions can be in matching. It could also be useful to note that, later on, learners will see that regexes are sufficiently expressive to capture many different phone number formats while excluding non-phone-number strings of digits.
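To illustrate that selectivity, here is a small sketch that tests the same number written in each of the formats mentioned. It assumes Python's `re` module and uses `fullmatch`, so the whole string has to fit the pattern; the list `candidates` and the digits in it are invented for the example.

```python
import re

# Same pattern as in the discussion: (xxx) xxx-xxxx and nothing else.
pattern = re.compile(r"\(\d\d\d\) \d\d\d-\d\d\d\d")

# Hypothetical test strings: one number written in four different formats.
candidates = [
    "(555) 867-5309",  # the target format
    "555 867 5309",    # no parentheses, no hyphen
    "(555) 867 5309",  # space where the hyphen should be
    "5558675309",      # digits only
]

# Record whether each candidate matches the pattern in its entirety.
results = {text: pattern.fullmatch(text) is not None for text in candidates}

for text, matched in results.items():
    print(f"{text!r}: {'match' if matched else 'no match'}")
```

Only the first string matches; the others contain the same digits but differ in their literal characters, which is the point a learner should take away.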

If folks wanted to use a different example, I think an example of similar complexity (mix of literals, one or two metacharacters, and optionally escaping) is best, along with a visual that clarifies what the metacharacters are "doing" in the regex.

sharilaster commented 2 years ago

@mclenard I really like this proposed solution! Before we move forward on adding this, do you have ideas or resources that could help create a version of your proposed diagram that meets accessibility criteria so it is still useful to folks who use a screen reader or other assistive technology?

mclenard commented 2 years ago

Ahh, good question. I'm not so versed in that, but that is an important consideration I forgot to address. As a very basic nod toward accessibility, I think you could try to express it in alt-text, maybe something like "The pattern is left parenthesis, three \d metacharacters, right parenthesis, one space, three \d metacharacters, hyphen, four \d metacharacters. The parentheses, space, and hyphen are 'literal' characters that must be matched exactly; the \d metacharacters allow for any digits to be matched." I can look into other ways of creating visuals that are more accessible, but right at the moment, I'm not sure. I'd welcome suggestions, though.
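One possible text-only complement to the diagram, offered as a rough sketch and not a vetted accessibility solution, is a token-by-token breakdown that a screen reader can read in order. The function `describe_pattern` and the lookup table `LITERAL_NAMES` below are hypothetical helpers written for this example; the function only understands the `\d` metacharacter, escaped characters, and plain characters, which is all this pattern uses.

```python
# Spoken names for the literal characters in this particular pattern.
LITERAL_NAMES = {
    "(": "left parenthesis",
    ")": "right parenthesis",
    " ": "space",
    "-": "hyphen",
}

def describe_pattern(pattern: str) -> list[tuple[str, str]]:
    """Label each token of a very simple regex as a literal or a metacharacter."""
    tokens = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            following = pattern[i + 1]
            if following == "d":
                # \d is the one metacharacter this sketch knows about.
                tokens.append(("\\d", "metacharacter: any digit"))
            else:
                # A backslash before any other character escapes it to a literal.
                tokens.append((following, "literal: " + LITERAL_NAMES.get(following, following)))
            i += 2
        else:
            char = pattern[i]
            tokens.append((char, "literal: " + LITERAL_NAMES.get(char, char)))
            i += 1
    return tokens

breakdown = describe_pattern(r"\(\d\d\d\) \d\d\d-\d\d\d\d")
for position, (token, label) in enumerate(breakdown, start=1):
    print(f"{position}. {token!r} is a {label}")
```

The printed list could sit alongside the alt-text as a plain-text table of the same information the diagram conveys visually.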