EmbraceLife / shendusuipian

To know stats by heart

How to understand `pat = r'/([^/]+)_\d+.jpg'`? #62

Open EmbraceLife opened 5 years ago

EmbraceLife commented 5 years ago

Notes from @PoonamV on forum

Regular expressions

We can do this with a regular expression by importing Python’s ‘re’ package. Regular expressions are a way to search for a string in text using pattern matching.

pat = r'/([^/]+)_\d+.jpg$'

Let’s deconstruct this regex pattern, /([^/]+)_\d+.jpg$ by reading it backward:

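| Expression | Explanation |
| --- | --- |
| `$` | end of the string; the match must finish here |
| `.jpg` | the last characters to be found in the search string, which also checks the file format; note that the unescaped ‘.’ matches any single character, so `\.jpg` would be stricter |
| `\d+` | numerical digits; the ‘+’ sign denotes one or more of them |
| `_` | an underscore, which must come right before the digits |
| `()` | denotes a capturing group of characters |
| `[]` | denotes a character class: the set of characters allowed at that position inside the group |
| `^/` | ‘^’ at the start of a character class denotes negation, so this means any character except ‘/’ |
| `([^/]+)` | captures one or more characters other than ‘/’ |
| `/` | a literal ‘/’; reading backward this is where the match ends, and in the path it is the last ‘/’, just before the file name |
| `r` | The string should be a raw string. Otherwise, `\d` should be written as `\\d` so that Python doesn’t treat the backslash as the start of an escape sequence. |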
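The pieces described above can be checked one at a time. This is a minimal sketch using only the standard ‘re’ module; the variable names and the file name ending in ‘xjpg’ are made up for illustration.

```python
import re

pat = r'/([^/]+)_\d+.jpg$'

# A raw string keeps the backslash as-is, so both spellings are the same string.
raw_equals_escaped = (r'\d' == '\\d')  # True

# [^/]+ on its own matches the first run of characters that are not '/'.
first_segment = re.search(r'[^/]+', 'images/Abyssinian_1.jpg').group(0)  # 'images'

# The unescaped '.' matches any character, so this odd (hypothetical) name matches too.
loose_hit = re.search(pat, 'images/Abyssinian_1xjpg') is not None  # True

# Escaping the dot makes the extension check strict.
strict_hit = re.search(r'/([^/]+)_\d+\.jpg$', 'images/Abyssinian_1xjpg') is not None  # False
```

Whether to escape the dot depends on how strict the file format check needs to be; for a folder that only contains ‘.jpg’ files the two patterns behave the same.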

So, this regex pattern will match the substring

/Abyssinian_1.jpg

(including the leading ‘/’) when the search string is the string form of

PosixPath('images/Abyssinian_1.jpg')

Further, by using the fact that the actual name of the breed is in the first parenthesized group of the regular expression, the actual breed name

Abyssinian

can be obtained by calling ‘.group(1)’ on the match object returned by the search. See this for details.
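As a rough sketch of that step, the snippet below runs the pattern against the example path with ‘re.search’. It uses ‘PurePosixPath’ instead of ‘PosixPath’ only so that it also runs on Windows; that swap and the variable names are assumptions, not part of the original notes.

```python
import re
from pathlib import PurePosixPath

pat = r'/([^/]+)_\d+.jpg$'

# The regex works on strings, so the path object is converted with str() first.
path = PurePosixPath('images/Abyssinian_1.jpg')
match = re.search(pat, str(path))

whole = match.group(0)  # '/Abyssinian_1.jpg' (everything the pattern matched)
breed = match.group(1)  # 'Abyssinian' (only the first parenthesized group)
```

Note that ‘re.search’ returns None when nothing matches, so real code would usually check the result before calling ‘.group(1)’.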

credits @dreambeats

See this blog post to understand this regular expression in detail. The Python documentation has a tutorial on regular expressions. RegexOne provides a simple interactive intro to regular expressions.

EmbraceLife commented 5 years ago

For an interactive, visual walkthrough of r'/([^/]+)_\d+.jpg$', go to regexr.com/48vci


Manual walkthrough

pat = r'/([^/]+)_\d+.jpg$'

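So we can match "/Abyssinian_1.jpg" (the final path segment, with its leading "/") in "folder1/folder2/mycats/Abyssinian_1.jpg". But what we actually want is "Abyssinian". What do we do?

The grouping ability of () lets us use .group(1) to pull "Abyssinian" out of that match.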
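Here is a small sketch of the same idea applied to several paths at once. The helper name ‘breed_from_path’ and the second and third paths are hypothetical, added only to show a multi-word name and a non-matching file.

```python
import re

pat = re.compile(r'/([^/]+)_\d+.jpg$')

def breed_from_path(path):
    """Return the captured name before the trailing _<digits>.jpg, or None if no match."""
    m = pat.search(path)
    return m.group(1) if m else None

paths = [
    'folder1/folder2/mycats/Abyssinian_1.jpg',
    'folder1/folder2/mycats/american_bulldog_12.jpg',  # hypothetical multi-word name
    'folder1/folder2/mycats/notes.txt',                # hypothetical non-matching file
]

breeds = [breed_from_path(p) for p in paths]
# ['Abyssinian', 'american_bulldog', None]
```

Because [^/]+ is greedy and cannot cross a "/", the group always comes from the last path segment and keeps any underscores inside the name, stopping only at the underscore that is followed by the digits.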