DoSomething / bertly

🔗 A serverless link shortener.
https://dosome.click/wq544
MIT License

Incorrect decoding of `@` in URLs #24

Closed: mshmsh5000 closed this issue 6 years ago

mshmsh5000 commented 6 years ago

Bertly's redirect percent-encodes `@` in target URLs, which breaks some of them. I haven't figured out whether other characters are affected the same way. Take this target URL:

https://medium.com/@susanbordo

Fully encoded:

https%3A%2F%2Fmedium.com%2F%40susanbordo
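For reference, the fully encoded form above can be reproduced with Python's standard library. This is only an illustration of the encoding, not code taken from Bertly; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

target = "https://medium.com/@susanbordo"

# safe="" percent-encodes every reserved character, including ":", "/" and "@".
encoded = quote(target, safe="")
print(encoded)  # https%3A%2F%2Fmedium.com%2F%40susanbordo

# Decoding restores the original target, "@" included.
print(unquote(encoded) == target)  # True
```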

We can pass either of these to Bertly:

curl -X POST -d "url=https%3A%2F%2Fmedium.com%2F%40susanbordo" --header "X-BERTLY-API-KEY: asd123" https://bert.ly/

curl -X POST -d "url=https://medium.com/@susanbordo" --header "X-BERTLY-API-KEY: asd123" https://bert.ly/
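Both requests should reach the application as the same value, assuming the server applies standard `application/x-www-form-urlencoded` decoding to the body (an assumption about Bertly, not something confirmed here). A rough sketch with the standard library:

```python
from urllib.parse import parse_qs

# The two request bodies sent by the curl commands above.
encoded_body = "url=https%3A%2F%2Fmedium.com%2F%40susanbordo"
raw_body = "url=https://medium.com/@susanbordo"

# Form decoding splits on the first "=" and then percent-decodes the value.
from_encoded = parse_qs(encoded_body)["url"][0]
from_raw = parse_qs(raw_body)["url"][0]

print(from_encoded)              # https://medium.com/@susanbordo
print(from_encoded == from_raw)  # True
```

If that holds, the stored target is identical in both cases, so the `%40` would have to be introduced later, on the way out.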

In either case, the short link redirects to a URL with the `@` percent-encoded:

$ curl -I https://bert.ly/3qv
HTTP/1.1 302 Found
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
Date: Thu, 31 May 2018 18:46:01 GMT
Location: https://medium.com/%40susanbordo

Following this leads to a 404, on the command line as well as in multiple browsers:

$ curl -I "https://medium.com/%40susanbordo"
HTTP/1.1 404 Not Found

DFurnes commented 6 years ago

I took a quick look at this. It seems to be caused by the `iri_to_uri` filter we run before redirecting, which percent-encodes the `@`. Do you think this would be safe to remove now that we normalize URLs before storing them? Or should we run `uri_to_iri` before storing in Redis, and keep `iri_to_uri` on the way out?
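One possible middle ground, sketched below with the standard library only: keep an encoding step on the way out, but treat every character RFC 3986 allows in a path (including `@`) as safe. `to_redirect_uri` and `PATH_SAFE` are hypothetical names for this example, not Bertly's actual code, and the sketch ignores non-ASCII hostnames.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters RFC 3986 permits unescaped in a path (sub-delims, ":" and "@"),
# plus "/" as the segment separator and "%" so existing escapes are not doubled.
PATH_SAFE = "/%:@!$&'()*+,;="


def to_redirect_uri(url: str) -> str:
    """Percent-encode only what is not allowed in a URI path or query."""
    parts = urlsplit(url)
    path = quote(parts.path, safe=PATH_SAFE)
    query = quote(parts.query, safe=PATH_SAFE + "?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(to_redirect_uri("https://medium.com/@susanbordo"))
# https://medium.com/@susanbordo
print(to_redirect_uri("https://example.com/caf\u00e9"))
# https://example.com/caf%C3%A9
```

With something like this, non-ASCII characters would still be escaped for the `Location` header, while `@` passes through unchanged.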