documentcloud / wordpress-documentcloud

Embed DocumentCloud documents that won't be eaten by the visual editor
https://wordpress.org/plugins/documentcloud/
GNU General Public License v2.0

Slugs with uppercase letters throw off URL-cleaner #36

Closed: reefdog closed this issue 8 years ago

reefdog commented 8 years ago

The pattern that recognizes a URL as an oEmbeddable DocumentCloud URL is very permissive.

Since many of our resources (pages, notes) have multiple URL patterns, including ones with page anchors, we have a clean_dc_url() function that recomposes them into a single canonical (and oEmbed-safe) version. E.g., https://www.documentcloud.org/documents/282753-lefler-thesis.html#document/p57/a42282 is recomposed to https://www.documentcloud.org/documents/282753-lefler-thesis/annotations/42282.html.
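For illustration, here is a minimal Python sketch of that kind of recomposition. It is not the plugin's code (the plugin is PHP), and the regex, the `/pages/N.html` form used for page URLs, and the pass-through behavior for unrecognized URLs are all assumptions made for this example; only the annotation example above comes from the issue.

```python
import re

# Hypothetical pattern: "ID-slug.html" plus an optional "#document/pN" anchor,
# optionally followed by "/aN" for an annotation (note). Not the plugin's regex.
DC_URL = re.compile(
    r"^https?://(?:www\.)?documentcloud\.org/documents/"
    r"(?P<id>\d+)-(?P<slug>[A-Za-z0-9-]+)\.html"
    r"(?:#document/p(?P<page>\d+)(?:/a(?P<note>\d+))?)?$"
)

def clean_dc_url(url: str) -> str:
    """Recompose an anchored DocumentCloud URL into one canonical form (sketch)."""
    m = DC_URL.match(url)
    if not m:
        return url  # not recognized: pass through unchanged (assumed behavior)
    base = f"https://www.documentcloud.org/documents/{m['id']}-{m['slug']}"
    if m["note"]:
        return f"{base}/annotations/{m['note']}.html"  # note embed
    if m["page"]:
        return f"{base}/pages/{m['page']}.html"  # page embed (assumed URL form)
    return f"{base}.html"  # whole-document embed

print(clean_dc_url(
    "https://www.documentcloud.org/documents/282753-lefler-thesis.html#document/p57/a42282"
))
```

Because the note anchor is checked before the page anchor, a URL carrying both resolves to the more specific annotation URL, which matches the example in the issue.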

Our base document slug pattern, however, has a bug: it only recognizes lowercase alphanumeric slugs, not uppercase ones. Because the recognition pattern described above is so permissive, those URLs still get passed to the oEmbed endpoint, but they don't get cleaned and recomposed, so the anchored variants of page and note URLs return the full document viewer instead.
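A small Python sketch can reproduce the failure mode. Both patterns below are hypothetical stand-ins for the plugin's PHP regex, and `Lefler-Thesis` is an invented mixed-case slug. The lowercase-only character class fails on the uppercase slug, while widening the class (or using a case-insensitive flag) lets it match.

```python
import re

# Hypothetical "/documents/ID-slug.html" patterns, for illustration only.
BUGGY = re.compile(r"/documents/(\d+)-([a-z0-9-]+)\.html")     # lowercase-only slug
FIXED = re.compile(r"/documents/(\d+)-([A-Za-z0-9-]+)\.html")  # any-case slug

def match_doc(pattern, url):
    """Return (document_id, slug) if the pattern finds them in url, else None."""
    m = pattern.search(url)
    return (m.group(1), m.group(2)) if m else None

lower = "https://www.documentcloud.org/documents/282753-lefler-thesis.html#document/p57"
upper = "https://www.documentcloud.org/documents/282753-Lefler-Thesis.html#document/p57"

for url in (lower, upper):
    print(match_doc(BUGGY, url), match_doc(FIXED, url))
```

With the buggy pattern, the uppercase URL yields no match, so a cleaner built on it would leave the anchored URL untouched, and the oEmbed request would then describe the whole document rather than the page.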

reefdog commented 8 years ago

The impact of this was that pages and notes from documents with an uppercase letter in the slug weren't embeddable via the plugin; you'd always get the full document embed instead.