cgiffard / Downsize

Tag safe text truncation for HTML and XML!
BSD 3-Clause "New" or "Revised" License

downsize doesn't handle invalid html #20

Closed discordianfish closed 10 years ago

discordianfish commented 10 years ago

Hi,

It's probably up for discussion whether it should, but I ran into this issue on my Ghost blog:

downsize returns more than the requested number of words if there is an img tag whose src attribute ends with a doubled closing quote (""):

downsize('<p>some <img src="foo.jpg""> here</p><p>and some more here</p>', {"words": 2})

adam-zethraeus commented 10 years ago

You were writing HTML directly into the post? That is a valid use case for Downsize in Ghost. Hmm.

discordianfish commented 10 years ago

@zethraeus right, you can use raw HTML in Ghost/Markdown.

cgiffard commented 10 years ago

I guess the question is — how much engineering effort do we put into making downsize handle invalid markup? :-/

We'd need a parser with lookahead to know the string wasn't terminated, and that'd prevent practical streaming. @zethraeus thoughts?
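
To make the lookahead point concrete, here is a minimal sketch of a check that catches the stray quote from the report by peeking one character past each closing quote. `findSuspectQuote` is a hypothetical helper, not part of downsize, and it ignores comments and bare `<` characters in text. One character of lookahead is enough for this pattern only; deciding in general that a quoted value never terminates means reading to the end of the input, which is where the streaming concern comes from.

```javascript
// Hypothetical helper for illustration; not part of downsize's API.
// Returns the index of the first character that makes the markup suspect,
// or -1 if every tag and quoted attribute value closes cleanly.
function findSuspectQuote(html) {
  let inTag = false;
  let quote = null; // quote character that opened the current attribute value

  for (let i = 0; i < html.length; i++) {
    const ch = html[i];

    if (!inTag) {
      if (ch === "<") inTag = true;
      continue;
    }

    if (quote) {
      if (ch === quote) {
        quote = null;
        // One character of lookahead: after a closing quote only
        // whitespace, "/" or ">" may follow.
        const next = html[i + 1];
        if (next !== undefined && !/[\s\/>]/.test(next)) return i + 1;
      }
      continue;
    }

    if (ch === '"' || ch === "'") quote = ch;
    else if (ch === ">") inTag = false;
  }

  // Ending inside a tag (or inside a quoted value) means something
  // was never terminated; this is only known once all input is read.
  return inTag ? html.length : -1;
}
```

On the markup from the report, the check stops at the second `"` after `foo.jpg`, before the rest of the document is swallowed as an attribute value.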

adam-zethraeus commented 10 years ago

I don't really understand how streaming fits into downsize's mission. Could you elaborate?

(In the context of Ghost, I can't really see how streaming is important. If it's a priority for downsize, and if I'm not misunderstanding, maybe it warrants a fork?)

cgiffard commented 10 years ago

I don't think it warrants a fork. Streaming fits into the node context — I want a small library that can be used on static strings, or streams of data. But I think I threw in a bit of a distraction there, sorry — let's forget about streaming for the moment. :)

The real question is — how much effort do we want to put into making downsize robust against invalid markup? Maybe if we hit something unexpected, we drop out of tag-safe mode, and into hard-word-break mode?
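
The fallback idea could be sketched roughly as follows. All names here (`hardWordBreak`, `truncateWithFallback`, the injected `tagSafeTruncate`) are hypothetical, and the sketch assumes the tag-safe truncator signals unexpected markup by throwing, which is an assumption rather than documented downsize behaviour.

```javascript
// Hypothetical sketch; none of these names come from downsize.

// Crude hard-word-break mode: drop anything that looks like a tag,
// then keep the first `wordLimit` whitespace-separated words.
function hardWordBreak(text, wordLimit) {
  const plain = text.replace(/<[^>]*>/g, " ");
  return plain.split(/\s+/).filter(Boolean).slice(0, wordLimit).join(" ");
}

// Try tag-safe truncation first; if the injected truncator throws on
// markup it cannot handle, fall back to plain word truncation.
function truncateWithFallback(html, wordLimit, tagSafeTruncate) {
  try {
    return tagSafeTruncate(html, wordLimit);
  } catch (err) {
    return hardWordBreak(html, wordLimit);
  }
}
```

The trade-off is that the fallback output loses all markup, so a caller would get plain text for posts containing invalid HTML instead of an over-long excerpt.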

cgiffard commented 10 years ago

I'm closing this for now — but should we decide we need to focus on handling invalid markup, we can reopen the issue. :)