Rakefire / jekyll-ical-tag

Pull ICS feed and provide a for-like loop of calendar events in jekyll's liquid tag
7 stars 6 forks source link

Clarify `URL_REGEX` using free-spacing mode, stop on double quote. #21

Closed fabacab closed 4 years ago

fabacab commented 4 years ago

This commit documents the gnarly URL_REGEX class constant. Ruby's regex literal free-spacing mode (using the x modifier) is used to accomplish this along with single-line comments.

The one and only change to the regular expression itself is in the final part, where a double quote (") character is added to the character class that causes the regular expression to stop matching. This is done because I have observed that many Google Calendar descriptions contain HTML descriptions.

When an HTML link is defined in an iCalendar description property (as is common for Google Calendar event authors, given GCal's rich text editing mode), the description will contain a URL such as:

<a href="https://example.com/">Something.</a>

When a description like the above exists, AND no RFC 5545 URL property exists in the input feed, then the description_urls method of this file will incorrectly determine that the first URL is:

https://example.com/">Something.</a>

If this variable is then used in a Liquid template as, for example:

<a href="{{ event.url }}">{{ event.summary }}</a>

the resulting HTML will actually be the unexpected value:

<a href="https://example.com/">Something.</a>"&gt;Example Event Summary&lt;/a&gt;

A better solution to what is presented here would be more explicit URI parsing, but given that the most popular iCalendar generators always use a double quote character for enclosing HTML attributes, this should be safe and is simple enough for now.