<li.*? data-src="(.*?)".*?> # match '<li [other attrs] data-src="url" [other attrs]>' and store the URL
\s*<figure.*?>.*?(?:<figcaption # match the <figure><figcaption> tags
.*?<div class="caption">(.*?)</div> # match the caption div and store the text inside it
.*?</figcaption>)?\s*</figure>\s*</li> # match all the closing tags to reduce false positives
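Here is a minimal Python sketch of the regex in use. It assumes the four lines above are joined into one pattern with the # comments dropped. The app that runs the rule isn't named, so Python's re module stands in for it. SAMPLE_HTML and extract_items are hypothetical: the markup is modelled on the structure the regex expects, and the second item deliberately has no caption.

```python
import re
from typing import List, Optional, Tuple

# The rule's four regex lines joined into one pattern, comments dropped.
# Deliberately NOT compiled with re.VERBOSE: verbose mode ignores literal
# spaces, so <div class="caption"> would stop matching.
GALLERY_ITEM = re.compile(
    r'<li.*? data-src="(.*?)".*?>'             # <li ... data-src="url" ...>, URL -> group 1
    r'\s*<figure.*?>.*?(?:<figcaption'          # <figure>, then the optional caption block
    r'.*?<div class="caption">(.*?)</div>'      # caption text -> group 2
    r'.*?</figcaption>)?\s*</figure>\s*</li>'   # closing tags
)

# Hypothetical markup shaped like the structure the regex expects.
# It contains no newlines because "." does not match them by default.
SAMPLE_HTML = (
    '<li class="item" data-src="https://example.com/full1.jpg" data-id="1">'
    '<figure class="image"><img src="https://example.com/thumb1.jpg">'
    '<figcaption><div class="caption">First caption</div></figcaption>'
    '</figure></li>'
    '<li data-src="https://example.com/full2.jpg">'  # this item has no caption
    '<figure><img src="https://example.com/thumb2.jpg"></figure></li>'
)


def extract_items(html: str) -> List[Tuple[str, Optional[str]]]:
    """Return (full-size image URL, caption or None) for each gallery item."""
    return [m.groups() for m in GALLERY_ITEM.finditer(html)]


if __name__ == "__main__":
    for url, caption in extract_items(SAMPLE_HTML):
        print(url, caption)
```

When run as a script, it prints each full-size URL next to its caption, with None for the uncaptioned item. Real pages with line breaks inside a gallery item would need re.DOTALL. That is an assumption about the host app, which may already enable an equivalent flag.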
Notes:
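The ? in each .*? makes that quantifier non-greedy (lazy), so the regex matches as little text as possible before trying the next part of the pattern. The ? after the (?:...) group is different: it makes the whole group optional.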
The (?:<figcaption>[other stuff]</figcaption>)? part is an optional, non-capturing group. This means that if there is no caption, the regex still matches the image URL. Being non-capturing just means the group itself won't be made available in the replace phase. The caption capture (.*?) nested inside it is still available, as the second group after the URL.
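Both notes can be checked in a few lines of Python. The strings and the cut-down ITEM pattern below are hypothetical (not the arstechnica markup). The \1 and \2 syntax is Python's re.sub convention, and the rule's own replace syntax may differ.

```python
import re

CAPTIONS = '<div class="caption">One</div><div class="caption">Two</div>'

# Greedy .* runs to the LAST </div>; lazy .*? stops at the first one.
greedy = re.search(r'<div class="caption">(.*)</div>', CAPTIONS).group(1)
lazy = re.search(r'<div class="caption">(.*?)</div>', CAPTIONS).group(1)

# An optional non-capturing group: (?:...) gets no group number, but the
# capture nested inside it still does (group 2 here), and is None when skipped.
ITEM = re.compile(r'<li data-src="(.*?)">(?:<p>(.*?)</p>)?</li>')
with_caption = ITEM.fullmatch('<li data-src="a.jpg"><p>Hi</p></li>').groups()
without_caption = ITEM.fullmatch('<li data-src="b.jpg"></li>').groups()

# Replace phase: \1 and \2 refer to the two captures. Since Python 3.5,
# a group that did not participate is substituted as an empty string.
converted = ITEM.sub(r'<img src="\1" alt="\2">', '<li data-src="b.jpg"></li>')

print(greedy, lazy, with_caption, without_caption, converted, sep="\n")
```

The output template in the sub() call is a hypothetical stand-in. It only illustrates that the URL and caption arrive as groups 1 and 2.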
Rule Submission
Website: arstechnica.com
The regex is a bit ugly but does the job. Here's what an image gallery looks like in HTML (cleaned up a bit, with placeholder text in [square brackets]):
The regex aims to pull out [fullsize image url] and [some caption] and convert them into the following format:
The regex explained: