LibraryCarpentry / lc-data-intro

Library Carpentry: Introduction to Working with Data (Regular Expressions)
https://librarycarpentry.org/lc-data-intro/

Update 01-regular-expressions.md #184

Closed lyndamk closed 1 year ago

lyndamk commented 3 years ago

The two paragraphs below might be hard for a novice to digest. My suggestions:

Regular expressions rely on the use of literal characters (example) and metacharacters (example). A metacharacter is any American Standard Code for Information Interchange (ASCII) character that has a special meaning. By using metacharacters, possibly combined with literal characters, you can construct a regex that finds strings or files matching a pattern rather than one specific string. For example, say your organization wants to change the way it displays telephone numbers on its website by removing the parentheses around the area code. Rather than search for each specific phone number (which could take forever and be prone to error) or search for every open parenthesis character (which could also take forever and return many false positives), you could search for the pattern of a phone number.
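
As a rough illustration of the phone number scenario above, here is a minimal Python sketch using the standard `re` module. The sample text, the phone number format, and the replacement format are all assumptions made for illustration; they are not taken from the lesson.

```python
import re

# Hypothetical sample text; the numbers are made up for illustration.
text = "Call (555) 123-4567 or (555) 987-6543 (weekdays only)."

# \( and \) match literal parentheses; \d{3} matches exactly three digits.
# The two groups capture the area code and the rest of the number.
phone_pattern = re.compile(r"\((\d{3})\) (\d{3}-\d{4})")

# Rewrite "(555) 123-4567" as "555-123-4567", leaving other text alone.
cleaned = phone_pattern.sub(r"\1-\2", text)
print(cleaned)
```

Because the pattern requires three digits between the parentheses, the parenthetical "(weekdays only)" is left untouched, which is the false positive a plain search for "(" would have returned.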

Because regular expressions define some ASCII characters as "metacharacters" that have more than their literal meaning, it is also important to be able to "escape" these metacharacters to use them for their normal, literal meaning. For example, the period . means "match any character", but if you want to match a period, you need to put a \ in front of it to signal to the regular expression processor that you want the period treated as a plain old period and not as a metacharacter. That notation is called "escaping" the special character. The concept of "escaping" special characters is shared across a variety of computational settings, including Markdown and Hypertext Markup Language (HTML).
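
To make the difference between an unescaped and an escaped period concrete, here is a small Python sketch using the standard `re` module. The sample string is an assumption chosen only to show the behavior.

```python
import re

# Hypothetical sample text containing one real period and one look-alike.
text = "v1.0 and v1x0"

# Unescaped: the period is a metacharacter and matches any character.
unescaped = re.findall(r"v1.0", text)

# Escaped: \. matches only a literal period.
escaped = re.findall(r"v1\.0", text)

print(unescaped)  # ['v1.0', 'v1x0']
print(escaped)    # ['v1.0']
```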

yoyology commented 2 years ago

I also feel that the initial paragraph of the lesson is difficult to understand. As it stands, the text reads:

Regular expressions are a concept and an implementation used in many different programming environments for sophisticated pattern matching. They are an incredibly powerful tool that can amplify your capacity to find, manage, and transform data and files.

A regular expression, often abbreviated to regex, is a method of using a sequence of characters to define a search to match strings, i.e. “find and replace”-like operations. In computation, a ‘string’ is a contiguous sequence of symbols or values. For example, a word, a date, a set of numbers (e.g., a phone number), or an alphanumeric value (e.g., an identifier). A string could be any length, ranging from empty (zero characters) to one that spans many lines of text (including line break characters). The terms ‘string’ and ‘line’ are sometimes used interchangeably, even when they are not strictly the same thing.

I would recommend the following:

Many different programming environments need a way to match patterns of characters for tasks such as ensuring that an e-mail address is properly entered into an online form. A common tool for this purpose is regular expressions. Using regular expressions (or regex for short) allows you to amplify your capacity to find, manage, and transform data and files.

A regular expression is a method of using a sequence of characters to define a search to match strings, i.e. “find and replace”-like operations. In computation, a ‘string’ is a contiguous sequence of symbols or values. For example, a word, a date, a set of numbers (e.g., a phone number), or an alphanumeric value (e.g., an identifier). A string could be any length, ranging from empty (zero characters) to one that spans many lines of text (including line break characters). The terms ‘string’ and ‘line’ are sometimes used interchangeably, even when they are not strictly the same thing.

The only change to the second paragraph is to remove the reference to abbreviation, since I've moved that to the first paragraph.
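
For readers who want to see the distinction between a "string" and a "line" described in that paragraph, a minimal Python sketch might look like the following. The sample values are assumptions for illustration only.

```python
import re

# A string can be empty (zero characters)...
empty = ""

# ...or span several lines, with the line break stored as a character.
multi_line = "first line\nsecond line"

# One string, but two lines once it is split on line breaks.
lines = multi_line.splitlines()

# A regex searches the whole string, so it finds matches on both lines.
matches = re.findall(r"line", multi_line)

# An empty pattern fully matches the empty string.
empty_match = re.fullmatch(r"", empty)

print(len(empty), lines, matches)
```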

sharilaster commented 1 year ago

Thank you @lyndamk and @yoyology for the excellent suggestions -- and my apologies that it's taken so long to address them. I've removed the placeholder for an example, and will confirm that the need for one is tracked in an open issue. The suggested revisions to the lesson introduction are also now open as a separate issue (#207), so it should be fairly straightforward to create a new PR with the updated language once the migration to the Carpentries Workbench is complete.

sharilaster commented 1 year ago

Confirmed -- the need for an example is open in #187.