Open santhoshtr opened 1 year ago
Thanks for your commits, it worked!
Thanks, it worked!
@attardi As python 3.12 is the latest version, people might want to use 3.11 or 3.12. Gentle ping to merge and release new version. Thanks!
Does the regular expression do what it was supposed to do? I think the proposed change does not change the behavior, so it would be correct, but I am not sure if the regular expression was originally supposed to do something different.
If I understand the documentation for (?aiLmsux)
correctly, then (?i)
changes the whole regular expression to case-insensitivity no matter where it is placed in the regular expression. Probably to remove this confusing behavior, it's only allowed at the beginning of the regex as of Python 3.11.
Since in this code's regular expression it is listed in a sub group, it looks to me like the original author might have (wrongly) tried to make only this sub group case insensitive. If that's the case I think (?aiLmsux-imsx:...)
might be the correct version (available since version 3.6).
The letters set or remove the corresponding flags [...] for the part of the expression.
If indeed case-insensitivity for the whole regular expression is desired, I'd propose to rather use the compile flag re.I
. Both regular expressions already use other compile flags, so adding the global case-insensitivity flag alongside them seems more readable. The documentation even states that (?i)
is basically an alternative for re.I
:
set the corresponding flags: [...] for the entire regular expression. This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
ExtLinkBracketedRegex = re.compile(
'\[((' + '|'.join(wgUrlProtocols) + ')' + EXT_LINK_URL_CLASS + r'+)\s*([^\]\x00-\x08\x0a-\x1F]*?)\]',
re.S | re.U | re.I)
EXT_IMAGE_REGEX = re.compile(
r"""^(http://|https://)([^][<>"\x00-\x20\x7F\s]+)
/([A-Za-z0-9_.,~%\-+&;#*?!=()@\x80-\xFF]+)\.(gif|png|jpg|jpeg)$""",
re.X | re.S | re.U | re.I)
Update: Looking at both regular expression, I'd say the global version seems right. Even if the original code looks like the author intended to only make parts of the expression case-insensitive.
Let's assume the original author indeed only wanted to make parts of the expressions case-insensitive. Then in ExtLinkBracketedRegex
the URL protocols (bitcoin://
, ftp://
, and so on) would be set to case-insensitive. But in EXT_IMAGE_REGEX
only the file extension (gif
, png
, and so on) would be set to case-insensitive, but the URL protocol (http://
, https://
) would be left case-sensitive. That would be quite inconsistent.
Thanks for your commits, it worked!
Python 3.11 throws the error: re.error: global flags not at the start of the expression at position
Ref: https://bugs.python.org/issue47066