cgiffard / Downsize

Tag safe text truncation for HTML and XML!
BSD 3-Clause "New" or "Revised" License

Option to remove scripts and embeds #21

Open ErisDS opened 10 years ago

ErisDS commented 10 years ago

Downsize is all about text truncation: we want to take some HTML or XML and get a truncated version.

An issue was recently raised in Ghost suggesting that it doesn't make sense to include the content of <script> tags in our {{excerpt}}. The content that appears between <script> tags isn't really text, so it doesn't make much sense in the text-only version.

This is true for a whole set of tags. Going through https://developer.mozilla.org/en/docs/Web/Guide/HTML/HTML5/HTML5_element_list, I'd suggest:

<script>, <style>, <template> and all the embedded content tags: <iframe>, <embed>, <object>, <param>, <video>, <audio>, <source>, <track>, <canvas>, <map>, <area>, <svg> and <math>.

However, I'm not sure what other use cases there are for Downsize beyond Ghost, so I definitely suggest that this should be optional. I do think it makes sense to add the ability to remove these tags entirely from the truncated version.
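
As a rough illustration of the behaviour being asked for, here is a minimal sketch of a pre-processing step that drops the listed tags before truncation. The function name `stripNonTextTags` and the two tag arrays are hypothetical and are not part of Downsize's API. The sketch is regex-based rather than parser-based, so it assumes reasonably well-formed markup and does not handle a listed tag nested inside itself or a `>` inside an attribute value.

```javascript
// Hypothetical helper for illustration only; not part of Downsize's API.

// Tags from the suggested list that wrap content and have a closing tag.
const PAIRED_TAGS = [
  "script", "style", "template", "iframe", "object", "video",
  "audio", "canvas", "map", "svg", "math",
];

// Tags from the suggested list that are void elements (no closing tag).
const VOID_TAGS = ["embed", "param", "source", "track", "area"];

function stripNonTextTags(html) {
  // Remove each paired tag together with everything up to its closing tag.
  const paired = new RegExp(
    "<(" + PAIRED_TAGS.join("|") + ")\\b[^>]*>[\\s\\S]*?<\\/\\1\\s*>",
    "gi"
  );
  // Remove any remaining void tags that were not inside a removed parent.
  const voids = new RegExp("<(?:" + VOID_TAGS.join("|") + ")\\b[^>]*>", "gi");
  return html.replace(paired, "").replace(voids, "");
}

const input =
  '<p>Hello <script>alert("x")</script>world</p>' +
  '<style>p{}</style><embed src="a.swf">';
console.log(stripNonTextTags(input));
// prints: <p>Hello world</p>
```

The cleaned string could then be handed to Downsize for truncation, with the stripping step only running when the caller opts in.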

cgiffard commented 10 years ago

Thanks so much for curating this list. I'll look into building something in.

koriroys commented 10 years ago

Should the <code> tag be part of this list?

cgiffard commented 10 years ago

No — contrary to its name, <code> is definitely text.