closeio / quotequail

a library that identifies quoted text in email messages
MIT License

Add img to replaced tags which get preserved in HTML from slicing. #26

Open andreip opened 6 years ago

andreip commented 6 years ago

fixes #22

I couldn't think of a different approach. An img isn't really a block element, so it will never contain any text, and there is no point in generating different HTML in the get_line_info functions. What was missing was treating it as a special case. We don't want to slice a line out of the HTML by looking only at the plain-text lines, because that could cut off an img, so we also need to look at the start/end refs for replaced tags.
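Below is a minimal sketch of that idea. It uses Python's standard-library `html.parser` rather than the lxml-based code in quotequail, and the names `REPLACED_TAGS`, `ReplacedTagLocator` and `split_replaced_tags` are hypothetical, as is the rule for tags that sit exactly on a cut. It records the plain-text offset of each replaced element. Because such an element contributes no text, its start and end refs are the same point, so a slicer that works from text offsets has to decide explicitly which side of a cut the element belongs to.

```python
from html.parser import HTMLParser

# Hypothetical tag set: the PR adds "img", and the discussion suggests
# "video" and "embed" as further candidates.
REPLACED_TAGS = frozenset({"img", "video", "embed"})


class ReplacedTagLocator(HTMLParser):
    """Record the plain-text offset at which each replaced element occurs.

    Replaced elements contain no text, so their start and end offsets are
    the same point in the extracted text.
    """

    def __init__(self, replaced=REPLACED_TAGS):
        super().__init__(convert_charrefs=True)
        self.replaced = replaced
        self.text_len = 0  # characters of text seen so far
        self.found = []    # (tag, text_offset) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img/>, because the
        # default handle_startendtag() calls handle_starttag().
        if tag in self.replaced:
            self.found.append((tag, self.text_len))

    def handle_data(self, data):
        self.text_len += len(data)


def split_replaced_tags(html, cut):
    """Assign replaced tags to the part before or after text offset `cut`.

    A tag exactly at `cut` has zero width, so the text alone cannot place
    it. This sketch assumes (hypothetically) that it stays with the part
    before the cut, so it is not lost.
    """
    locator = ReplacedTagLocator()
    locator.feed(html)
    locator.close()
    before = [t for t in locator.found if t[1] <= cut]
    after = [t for t in locator.found if t[1] > cut]
    return before, after
```

For example, in `<p>Hello</p><img src='a.png'><p>World</p>` the text is `HelloWorld` and the image sits at offset 5, between the two lines. Cutting after "Hello" places it in the first part, whereas a slicer that only compared plain-text lines would have no record of it at all.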

See more about replaced elements (https://developer.mozilla.org/en-US/docs/Web/CSS/Replaced_element). It might be worth adding a few more tags to the list, e.g. video and embed. I'm not sure about iframe and how lxml parsing would treat it, but I suppose you could have an iframe with just an image in it, in which case you'd still want to keep it?

The full list would be 9 replaced elements in total (or 10 if we also count input, although I'm not sure in which cases input would render something, given that it apparently contains no text).

afzalIbnSH commented 3 years ago

Can this please be merged?

afzalIbnSH commented 3 years ago

I tested this PR in my project and it works nicely. Would be great if this can be merged. @wojcikstefan