emacs-circe / circe

Circe, a Client for IRC in Emacs
GNU General Public License v3.0

Extend images regex to also match the appendix after file extension #409

Open. Thaodan opened this 2 years ago.

Thaodan commented 2 years ago

Some image URLs have a suffix after the file extension, such as a query string, which makes the builtin regex fail. An example would be https://example.com/foo.svg?size=800x600.
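A minimal sketch of the failure, assuming the builtin pattern is roughly the needle quoted further down in this thread without its trailing .* (the real Circe regex may differ). The names my-old-image-regexp and my-first-image-url are hypothetical. Because the match has to end at the extension, the query string is dropped:

```elisp
;; Assumption: the builtin pattern is the needle from this thread
;; without its trailing ".*", so a match must end at the extension.
(defconst my-old-image-regexp
  "\\(https?://[^ ]*?\\.\\(?:png\\|jpg\\|jpeg\\|svg\\|gif\\)\\)"
  "Hypothetical stand-in for the builtin image URL regex.")

(defun my-first-image-url (text regexp)
  "Return group 1 of the first match of REGEXP in TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (match-string-no-properties 1))))

(my-first-image-url "https://example.com/foo.svg?size=800x600"
                    my-old-image-regexp)
;; => "https://example.com/foo.svg"  (the ?size=800x600 part is lost)
```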

wasamasa commented 2 years ago

The original code tries to match several URLs in a buffer; with this change, that no longer works:

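(let ((haystack "foo http://xxx.jpg <https://bar.jpg?>http://baz.jpg")
      ;; Regex with this change applied: the trailing .* runs to the end of the line.
      (needle "\\(https?://[^ ]*?\\.\\(?:png\\|jpg\\|jpeg\\|svg\\|gif\\).*\\)"))
  (with-temp-buffer
    (insert haystack)
    (goto-char (point-min))
    ;; Print every match of group 1 found in the buffer.
    (while (re-search-forward needle nil t)
      (message "%S" (match-string-no-properties 1)))))
;; Only one match is printed, spanning all three URLs:
;; "http://xxx.jpg <https://bar.jpg?>http://baz.jpg"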

Generally, .* patterns should be avoided whenever possible for this reason: they are only terminated by a newline. Instead, the match should be bounded by URL delimiters (whitespace, double quotes, angle brackets). The existing regex could also be rewritten using rx if you prefer.
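A minimal sketch of that suggestion: keep the needle from the example, but replace the trailing .* with a run of non-delimiter characters. The delimiter set used here (space, tab, newline, double quote, < and >) and the names my-image-url-regexp and my-find-image-urls are assumptions for illustration, not Circe's actual code:

```elisp
;; Hypothetical regex: the suffix after the extension (e.g. a query
;; string) may contain anything except a URL delimiter.
(defvar my-image-url-regexp
  (concat "\\(https?://[^ \t\n\"<>]*?"
          "\\.\\(?:png\\|jpg\\|jpeg\\|svg\\|gif\\)"
          "[^ \t\n\"<>]*\\)")
  "Match an image URL plus any suffix, stopping at a URL delimiter.")

(defun my-find-image-urls (text)
  "Return a list of all image URLs found in TEXT, in order."
  (let (urls)
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      (while (re-search-forward my-image-url-regexp nil t)
        (push (match-string-no-properties 1) urls)))
    (nreverse urls)))

(my-find-image-urls "foo http://xxx.jpg <https://bar.jpg?>http://baz.jpg")
;; => ("http://xxx.jpg" "https://bar.jpg?" "http://baz.jpg")
```

With the same haystack, the three URLs now come back as separate matches, and a URL like https://example.com/foo.svg?size=800x600 keeps its query string because the suffix only stops at a delimiter.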

Thaodan commented 2 years ago

> Generally, .* patterns should be avoided whenever possible for this reason because they're only terminated by a newline.

That's what I was wondering too, but I thought it only works per line, so how can whitespace be an issue?
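Regarding this question: in Emacs regexps, . matches any character except a newline, so the match is indeed limited to one line, but within that line .* also consumes spaces and any further URLs. A small sketch using the needle from the example above (my-collect-matches is a hypothetical helper):

```elisp
(defun my-collect-matches (text regexp)
  "Return every match of group 1 of REGEXP in TEXT, in order."
  (let (matches)
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        (push (match-string-no-properties 1) matches)))
    (nreverse matches)))

;; Two URLs on the first line, one on the second.
(my-collect-matches
 "http://a.jpg http://b.png\nhttp://c.gif"
 "\\(https?://[^ ]*?\\.\\(?:png\\|jpg\\|jpeg\\|svg\\|gif\\).*\\)")
;; => ("http://a.jpg http://b.png" "http://c.gif")
```

The newline does separate the second line's URL, but the two URLs sharing the first line are merged into a single match.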

> Instead, it should be delimited by URL delimiters (whitespace, double quote, angular brackets). The existing regex could be rewritten using rx if you prefer.

That sounds like it makes sense.

Thaodan commented 2 years ago

> Instead, it should be delimited by URL delimiters (whitespace, double quote, angular brackets). The existing regex could be rewritten using rx if you prefer.

What is rx in this case?

wasamasa commented 2 years ago

S-expression notation for regular expressions. Try M-x find-library RET rx RET for more information.
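For illustration, a delimiter-bounded image URL pattern could look roughly like this in rx notation. This is a sketch, not the regex Circe ships; the names my-image-url-rx and my-find-image-urls-rx and the delimiter set are assumptions. (*? ...) is rx's non-greedy repetition and (not (any ...)) is a negated character set:

```elisp
(require 'rx)

(defconst my-image-url-rx
  (rx (group "http" (opt "s") "://"
             ;; Shortest run of non-delimiters before the extension.
             (*? (not (any " \t\n\"<>")))
             "." (or "png" "jpg" "jpeg" "svg" "gif")
             ;; Any suffix (e.g. a query string) up to the next delimiter.
             (* (not (any " \t\n\"<>")))))
  "Hypothetical image URL regex in rx notation, bounded by URL delimiters.")

(defun my-find-image-urls-rx (text)
  "Return all image URLs in TEXT matched by `my-image-url-rx'."
  (let (urls)
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      (while (re-search-forward my-image-url-rx nil t)
        (push (match-string-no-properties 1) urls)))
    (nreverse urls)))

(my-find-image-urls-rx "foo http://xxx.jpg <https://bar.jpg?>http://baz.jpg")
;; => ("http://xxx.jpg" "https://bar.jpg?" "http://baz.jpg")
```

The rx macro expands to an ordinary regexp string, so the result is used with re-search-forward exactly like the string form, without the doubled backslashes.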