Slugs with uppercase letters throw off URL-cleaner

The pattern that decides whether a URL is a DocumentCloud oEmbed-able URL is very permissive.

Since many of our resources (pages, notes) have multiple URL patterns, including variants with page anchors, we have a clean_dc_url() function that recomposes them into a single canonical (and oEmbed-safe) version. E.g., https://www.documentcloud.org/documents/282753-lefler-thesis.html#document/p57/a42282 is recomposed to https://www.documentcloud.org/documents/282753-lefler-thesis/annotations/42282.html.
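To make the recomposition concrete, here is a minimal Python sketch of the note case only. It is not the plugin's actual code: the function name clean_note_url, the regex, and the assumption that a note anchor always looks like #document/pN/aM are all illustrative guesses based on the example URL above.

```python
import re

# Hypothetical pattern: numeric ID, slug, then a "#document/pN/aM" note anchor.
# The slug character class here is an assumption, not the plugin's real one.
NOTE_ANCHOR = re.compile(
    r"^(?P<base>https://www\.documentcloud\.org/documents/\d+-[A-Za-z0-9-]+)"
    r"\.html#document/p\d+/a(?P<note>\d+)$"
)


def clean_note_url(url: str) -> str:
    """Recompose an anchored note URL into canonical form; return other URLs unchanged."""
    match = NOTE_ANCHOR.match(url)
    if match is None:
        return url
    return f"{match.group('base')}/annotations/{match.group('note')}.html"


anchored = "https://www.documentcloud.org/documents/282753-lefler-thesis.html#document/p57/a42282"
print(clean_note_url(anchored))
```

With the example from this issue, the sketch prints https://www.documentcloud.org/documents/282753-lefler-thesis/annotations/42282.html.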

Our base document slug pattern, however, has a bug: it only recognizes lowercase alphanumeric slugs, not slugs containing uppercase letters. Because of the permissive pattern mentioned above, URLs with uppercase slugs still get passed to the oEmbed endpoint, but they don't get cleaned and recomposed, so anchored page/note URLs get the full document viewer returned instead of the specific page or note.
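The failure mode can be reproduced with a small Python sketch. Everything here is an assumption for illustration: the PERMISSIVE regex, the two slug character classes, the make_cleaner helper, and the uppercase URL (a capitalized variant of the example above) are hypothetical and not taken from the plugin.

```python
import re

# Assumed shapes, for illustration only.
PERMISSIVE = re.compile(r"^https://www\.documentcloud\.org/documents/.+")
SLUG_LOWER = r"\d+-[a-z0-9-]+"     # mimics the bug: rejects uppercase letters
SLUG_FIXED = r"\d+-[A-Za-z0-9-]+"  # accepts both cases


def make_cleaner(slug_pattern: str):
    """Build a cleaner for anchored note URLs using the given slug pattern."""
    anchored = re.compile(
        r"^(https://www\.documentcloud\.org/documents/" + slug_pattern + r")"
        r"\.html#document/p\d+/a(\d+)$"
    )

    def clean(url: str) -> str:
        match = anchored.match(url)
        if match is None:
            return url  # not recognized: passed through uncleaned
        return f"{match.group(1)}/annotations/{match.group(2)}.html"

    return clean


# Hypothetical uppercase-slug URL.
url = "https://www.documentcloud.org/documents/282753-Lefler-Thesis.html#document/p57/a42282"

print(PERMISSIVE.match(url) is not None)  # the permissive check still accepts it
print(make_cleaner(SLUG_LOWER)(url))      # unchanged, so the anchor is never recomposed
print(make_cleaner(SLUG_FIXED)(url))      # recomposed to the annotations URL
```

In this sketch, widening the slug character class is enough for the uppercase URL to be recomposed; whether the real fix is a wider class or a case-insensitive flag is left open.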

documentcloud / wordpress-documentcloud, issue #36