cgiffard / Downsize

Tag safe text truncation for HTML and XML!
BSD 3-Clause "New" or "Revised" License

Add support for rounding to end of sentence #22

Closed remy closed 10 years ago

remy commented 10 years ago

Use:

downsize("<p>My super long sentence that I don't want chomped.</p><p>But I would like around 75 characters on the page, but I don't want to cut halfway through a sentence.</p>", {
  characters: 75, 
  round: true
}); // result: both original <p> elements, returned untruncated

This also includes initial tests, all passing, though it could probably do with more.

If options.round is set, then when the parser hits the character limit while walking the text, it also checks whether it is at a closing tag that ends a sentence. If not, it continues to scoop up text until it reaches one.

If the parser reaches a PARSER_TAG_COMMENCED character (<), then it peeks at the upcoming HTML, and checks if that tag would suggest the end of a sentence, i.e. a </p>, </div>, </li>, etc.
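The peek described above could look roughly like the following. This is a minimal sketch and not the code from this PR: the helper name `closesSentence` and the exact contents of `SENTENCE_TERMINATORS` are assumptions made for illustration (only `</p>`, `</div>` and `</li>` are named above).

```javascript
// Hypothetical helper for illustration only; not Downsize's actual implementation.
// Assumed list of elements whose closing tag marks the end of a sentence.
const SENTENCE_TERMINATORS = ["p", "div", "li"];

// Returns true when the markup starting at `index` (expected to point at "<")
// is a closing tag for one of the sentence-terminating elements.
function closesSentence(html, index) {
  const match = /^<\/([a-zA-Z][a-zA-Z0-9]*)\s*>/.exec(html.slice(index));
  return match !== null && SENTENCE_TERMINATORS.includes(match[1].toLowerCase());
}
```

With this helper, `closesSentence("foo</p>", 3)` is true, while an opening tag or an inline closing tag such as `</span>` is not treated as the end of a sentence.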

Note: I also tested that it works correctly with unclosed <p> tags, which it does. However, the current version of Downsize tries to close every open tag it has tracked, so if there is more than one implicitly closed <p> tag, the closing tags are all slapped on the end of the string, resulting in: <p>foo<p>bar</p></p>. This is a completely separate bug which may or may not be worth dealing with.
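To make the unclosed-tag problem concrete, here is a simplified model of the behaviour described in the note. It is a sketch under assumptions, not Downsize's code: `closeOpenTags` and its `openTags` stack are hypothetical names.

```javascript
// Simplified, hypothetical model of "close every tracked open tag".
// `openTags` is the stack of tag names seen without a matching close.
function closeOpenTags(truncated, openTags) {
  const closers = openTags
    .slice()      // copy so the caller's stack is left untouched
    .reverse()    // innermost tag is closed first
    .map(function (tag) { return "</" + tag + ">"; })
    .join("");
  return truncated + closers;
}

// Both <p> tags were tracked as open, so both get a closing tag appended.
const result = closeOpenTags("<p>foo<p>bar", ["p", "p"]);
// result === "<p>foo<p>bar</p></p>"
```

The model never accounts for a second `<p>` implicitly closing the first, which is why the closing tags pile up at the end.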

cgiffard commented 10 years ago

Aaah, this is super helpful, and greatly appreciated! Thanks for submitting a PR.

I agree that the tag closing behaviour needs to be fixed, but the question that stands out in my mind is whether a simple whitelist of tags that don't need to be closed would suffice — I think it won't, and there'll need to be some real thought as to how best to balance the document. But I'll get to that in another issue. :)

adam-zethraeus commented 10 years ago

@remy @cgiffard

Hey guys!

From a cursory look, this seems to duplicate the functionality of 'contextualTags'. (When I wrote contextualTags, it was for the purpose of getting this functionality into Ghost.)

Could you take a look at it and confirm/deny this?

i.e. this same effect is available by passing in all of your sentenceTerminatorElements as contextualTags.

it("should await the end of the containing paragraph", function () {
    downsize("<p>there are more than seven words in this paragraph</p><p>this is unrelated</p>", {words: 7, contextualTags: ["p", "ul", "ol", "pre", "blockquote"]})
        .should.equal("<p>there are more than seven words in this paragraph</p>");
});

if so:

if i've misunderstood, please let me know!