This commit documents the gnarly URL_REGEX class constant. To do so, it
uses Ruby's free-spacing mode for regex literals (the x modifier), which
allows single-line comments inside the pattern.
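As a rough illustration of free-spacing mode, here is a minimal sketch. The
SIMPLE_URL_REGEX constant and its pattern are hypothetical and much simpler
than the real URL_REGEX; they only show how the x modifier lets each part of
a pattern carry its own comment.

```ruby
# Hypothetical, much-simplified pattern; not the project's actual URL_REGEX.
SIMPLE_URL_REGEX = %r{
  https?://      # scheme: http or https
  [^\s/"]+       # host: runs up to whitespace, a slash, or a double quote
  (?:/[^\s"]*)?  # optional path: stops at whitespace or a double quote
}x

text = "See https://example.com/docs for details"
puts text[SIMPLE_URL_REGEX]
```

Whitespace and anything after a # are ignored outside character classes, so
the commented form compiles to the same pattern as its one-line equivalent.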
The only change to the regular expression itself is in the final part,
where a double quote (") character is added to the character class that
causes the regular expression to stop matching. This is done because I
have observed that many Google Calendar descriptions contain HTML.
When an HTML link is defined in an iCalendar description property (as is
common for Google Calendar event authors, given GCal's rich text editing
mode), the description will contain a URL such as:
<a href="https://example.com/">Something.</a>
When a description like the above exists, AND no RFC 5545 URL property
exists in the input feed, then the description_urls method of this
file will incorrectly determine that the first URL is:
https://example.com/">Something.</a>
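The difference can be reproduced with two deliberately simplified patterns.
Both constants below are hypothetical stand-ins, not the real URL_REGEX; they
differ only in whether a double quote ends the match.

```ruby
# Hypothetical simplified patterns; the real URL_REGEX is far more thorough.
WITHOUT_QUOTE = %r{https?://[^\s]+}    # match stops only at whitespace
WITH_QUOTE    = %r{https?://[^\s"]+}   # match also stops at a double quote

description = '<a href="https://example.com/">Something.</a>'

puts description[WITHOUT_QUOTE]  # https://example.com/">Something.</a>
puts description[WITH_QUOTE]     # https://example.com/
```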
If that incorrectly extracted URL is then used in a Liquid template, for example:
<a href="{{ event.url }}">{{ event.summary }}</a>
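the resulting HTML will actually be the unexpected value (with the event
summary shown as SUMMARY):
<a href="https://example.com/">Something.</a>">SUMMARY</a>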
A better solution than the one presented here would be more explicit URI
parsing, but given that the most popular iCalendar generators always use
a double quote character for enclosing HTML attributes, this should be
safe and is simple enough for now.
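For reference, one possible shape of the more explicit approach is sketched
below. It is an assumption, not something this commit implements: the
valid_http_url helper is hypothetical, and it relies only on Ruby's standard
uri library to reject candidates that are not well-formed HTTP(S) URLs.

```ruby
require "uri"

# Hypothetical helper, not part of this project: returns the candidate only
# if Ruby's URI parser accepts it as an absolute HTTP(S) URL with a host.
def valid_http_url(candidate)
  uri = URI.parse(candidate)
  uri.is_a?(URI::HTTP) && uri.host ? candidate : nil
rescue URI::InvalidURIError
  nil
end

valid_http_url('https://example.com/')                  # accepted
valid_http_url('https://example.com/">Something.</a>')  # rejected (nil)
```

A check like this could filter regex matches after the fact, at the cost of
discarding a malformed candidate instead of trimming it to a usable URL.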