borodean / postcss-assets

An asset manager for PostCSS
MIT License

Improve SVG encoding in data URIs #46

Open steffenweber opened 8 years ago

steffenweber commented 8 years ago

First of all, thank you for PostCSS Assets!

I think the encoding of SVGs in data URIs can be improved as described in the blog post Optimizing SVGs in data URIs. The method "Optimized URL-encoded" yields a smaller result than "Fully URL-encoded".

postcss-svgo uses the optimized URL-encoding. (But in my specific use case the SVG files have already been processed by SVGO when I pass them to PostCSS Assets. So ideally I do not want to run them through SVGO a second time.)

borodean commented 8 years ago

@steffenweber thanks, this is a really nice one. I've started to implement the advanced optimization based on that article and the postcss-svgo plugin code: https://github.com/assetsjs/assets/tree/feature/svg-data-optimization

However, for now I would like to omit the quote optimization, because it requires some tricky logic: quotes in attributes should be forced to be single, while the rest should be kept as they were (so we don't break possible textual content inside the SVG). postcss-svgo is pretty naive in this case: it just converts every quote, so that textual content could be spoiled.
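A rough sketch of the attribute-only quote conversion described above, for anyone who wants to pick it up. The function name `singleQuoteAttributes` is hypothetical and not part of postcss-assets or postcss-svgo. It assumes reasonably tidy SVG markup: comments, CDATA sections, and attribute values containing `>` are not handled.

```javascript
// Hypothetical helper, not an existing API of postcss-assets or postcss-svgo.
function singleQuoteAttributes(svg) {
  // Only look inside start tags (and self-closing tags), so text nodes keep their quotes.
  return svg.replace(/<[^<>!?\/][^<>]*>/g, function (tag) {
    // Swap ="value" for ='value', but leave values that contain a single quote alone.
    return tag.replace(/=\s*"([^"']*)"/g, "='$1'");
  });
}

const sample = '<svg width="8"><text x="1">say "hi"</text></svg>';
console.log(singleQuoteAttributes(sample));
```

Values that already contain a single quote are left double-quoted on purpose, since rewriting them would require escaping.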

I think I'll release a new version without quote optimization and add it later. I would also appreciate any help in implementing it.

steffenweber commented 8 years ago

Thank you for starting the implementation! There is a small change required to make it work because currently the generated data URIs have a syntax error (Chrome reports net::ERR_INVALID_URL in the Developer Tools console).

Bad: data:image/svg+xml;…
Good: data:image/svg+xml,…
Alternative: data:image/svg+xml;charset=utf-8,…

The alternative is easier to implement: just prepend 'charset=utf-8,' to the result of optimizedEncodeUri in encodeBuffer.js. The more complete fix would be to change this module's code such that not all data URIs require a charset/encoding (I have not tried to implement that).
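To illustrate the two valid forms, here is a minimal sketch. `toSvgDataUri` is a hypothetical standalone helper; the actual structure of encodeBuffer.js may differ, so treat this only as a picture of where the comma has to go.

```javascript
// Hypothetical helper, not the module's real code.
function toSvgDataUri(encodedSvg, withCharset) {
  const mediaType = 'data:image/svg+xml';
  // The media type (plus any ;parameter) must be separated from the data by a comma.
  const params = withCharset ? ';charset=utf-8' : '';
  return mediaType + params + ',' + encodedSvg;
}

console.log(toSvgDataUri('%3Csvg/%3E', false));
console.log(toSvgDataUri('%3Csvg/%3E', true));
```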

Quote optimization: I thought that quotes appearing unencoded as " instead of as &quot; in textual contents were a syntax error in SVG. But the W3 validator has no problem with them. :confused:

General approach: For safety reasons, I think it would be better to apply encodeURIComponent and then undo those substitutions that are known to be safe (whitelist instead of blacklist approach). Like this:

module.exports = function (string) {
  return encodeURIComponent(string.trim())
    .replace(/%20/g, ' ')
    .replace(/%2F/g, '/')
    .replace(/%3A/g, ':')
    .replace(/%3D/g, '=');
};
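
A quick way to try the whitelist approach above, as a sketch only: the function is repeated under the hypothetical name `optimizedEncode` so the snippet runs on its own, and the sample SVG is made up for the demo.

```javascript
// Same substitutions as in the snippet above; the name is hypothetical.
function optimizedEncode(string) {
  return encodeURIComponent(string.trim())
    .replace(/%20/g, ' ')
    .replace(/%2F/g, '/')
    .replace(/%3A/g, ':')
    .replace(/%3D/g, '=');
}

// Made-up sample input for illustration.
const svg = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 8 8"><path d="M0 0h8v8H0z"/></svg>';
const full = encodeURIComponent(svg);
const optimized = optimizedEncode(svg);

// Compare the encoded lengths and check that the result still decodes to the input.
console.log(full.length, optimized.length);
console.log(decodeURIComponent(optimized) === svg);
```

Because everything starts out fully encoded, characters such as `<`, `>`, `"` and `#` stay percent-encoded unless they are explicitly added to the list.
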
steffenweber commented 8 years ago

Hmm, maybe the proposed optimization is not such a good idea after all.

Permitted characters within a data URI are the ASCII characters for the lowercase and uppercase letters of the modern English alphabet, and the Arabic numerals. Octets represented by any other character must be percent-encoded, as in %26 for an ampersand (&). https://en.wikipedia.org/wiki/Data_URI_scheme

See also: https://perishablepress.com/stop-using-unsafe-characters-in-urls/

efender commented 7 years ago

What about an option for a custom function to encode SVGs, like here?