D61-IA / stellar-gnosis

Gnosis paper management and collaboration tool
Apache License 2.0

&'s in download URLs are escaped to &amp; #18

Open · huonw opened this issue 5 years ago


When adding a paper from dl.acm.org, such as https://dl.acm.org/citation.cfm?id=1454225


the download URL is automatically filled in to point to https://dl.acm.org/ft_gateway.cfm?id=1454225&type=pdf. However, the URL then undergoes HTML escaping (that is, escaping for inclusion in an HTML document, not even escaping for inclusion in a URL), so the & is converted to the HTML entity for the ampersand, &amp;, giving: http://dl.acm.org/ft_gateway.cfm?id=1454225&amp;type=pdf
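To make the difference between the two kinds of escaping concrete, here is a small sketch using Python's standard library. It is only an illustration: the issue doesn't say which code in Gnosis performs the escaping, so `html.escape` stands in for whatever HTML escaping is applied, and `urllib.parse.quote` stands in for URL (percent) escaping.

```python
import html
from urllib.parse import quote

# The download URL as it should be stored (taken from the issue).
raw_url = "https://dl.acm.org/ft_gateway.cfm?id=1454225&type=pdf"

# HTML escaping: meant for putting text inside an HTML document.
# It replaces "&" with the entity "&amp;" (and <, >, quotes with entities).
html_escaped = html.escape(raw_url)
print(html_escaped)  # https://dl.acm.org/ft_gateway.cfm?id=1454225&amp;type=pdf

# URL escaping: meant for putting text inside a URL component.
# With safe="" it percent-encodes ":", "/", "?", "=" and turns "&" into "%26".
url_escaped = quote(raw_url, safe="")
print(url_escaped)
```

Neither transformation should be applied to a URL before storing it as a URL; the HTML-escaped form only belongs inside the rendered page.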


By lucky coincidence, these URLs still work: the & at the start of the entity still separates the id parameter from the rest of the query string, and the dl.acm.org endpoint must ignore unrecognized parameters (in this case, amp;type=pdf). Indeed, even http://dl.acm.org/ft_gateway.cfm?id=1454225, with no type parameter at all, downloads the paper fine.
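How dl.acm.org actually parses its query strings isn't known here, but a typical parser shows why the corrupted URL degrades gracefully. This sketch uses Python's `urllib.parse.parse_qs` as a stand-in; on Python 3.10+ (and the patched 3.6 to 3.9 releases) it splits only on `&` by default, whereas older versions also split on `;` and would give a different result.

```python
from urllib.parse import parse_qs, urlsplit

# The corrupted URL as stored in the database (taken from the issue).
corrupted = "http://dl.acm.org/ft_gateway.cfm?id=1454225&amp;type=pdf"

# Extract the query string and parse it into a dict of lists.
query = urlsplit(corrupted).query  # "id=1454225&amp;type=pdf"
params = parse_qs(query)

print(params)            # {'id': ['1454225'], 'amp;type': ['pdf']}
print("type" in params)  # False: the intended "type" parameter is lost
```

The id survives intact, while type=pdf turns into an unrecognized amp;type parameter, which matches the behaviour described above.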

The database likely has many of these corrupt download links, e.g. http://gnosis.stellargraph.xyz/catalog/paper/1241/ is the stored copy of the above paper, and the download button is:

<a href="http://dl.acm.org/ft_gateway.cfm?id=1454225&amp;amp;type=pdf" class="btn btn-primary" role="button" target="_blank">Download</a>

(Note the second &amp; HTML escape, too.)
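The second escape is what a template normally does when it writes a value into an href attribute, and that escape is correct on its own: a browser decodes `&amp;` in an attribute back to `&`. The bug is the extra escape before storage. The sketch below reproduces the double escape and shows one possible cleanup for already-stored links. It assumes the only corruption is repeated `&amp;` escaping; the helper name `strip_amp_escapes` is hypothetical and not part of Gnosis. It deliberately avoids `html.unescape`, which would also decode entity-like fragments such as `&copy` in an ordinary query string like `?a=1&copy=2`.

```python
import html

# The value stored in the database (already HTML-escaped once).
stored = "http://dl.acm.org/ft_gateway.cfm?id=1454225&amp;type=pdf"

# Rendering it into an href escapes it again, producing "&amp;amp;".
rendered_href = html.escape(stored)
print(rendered_href)  # http://dl.acm.org/ft_gateway.cfm?id=1454225&amp;amp;type=pdf


def strip_amp_escapes(url: str) -> str:
    """Hypothetical cleanup helper: undo any number of "&" -> "&amp;" escapes.

    Only "&amp;" is touched, so other "&name" sequences are left alone.
    Each pass shortens the string, so the loop always terminates.
    """
    while "&amp;" in url:
        url = url.replace("&amp;", "&")
    return url


print(strip_amp_escapes(stored))         # http://dl.acm.org/ft_gateway.cfm?id=1454225&type=pdf
print(strip_amp_escapes(rendered_href))  # http://dl.acm.org/ft_gateway.cfm?id=1454225&type=pdf
```

Running something like this once over the stored download URLs, and storing new URLs unescaped, would leave the template's single escape as the only one.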