arendjr / text-clipper

Fast and correct clip functions for HTML and plain text
MIT License
133 stars 13 forks source link

Clipping HTML table has unexpected results #12

Closed CExJTian closed 4 years ago

CExJTian commented 4 years ago

Hi Arend,

We have table

image

<table border="1" cellpadding="1" cellspacing="1" style="width: 500px">
    <tbody>
        <tr>
            <td>
            <ul>
                <li>fb</li>
            </ul>
            </td>
            <td>
            <ul>
                <li>fbfbfb</li>
            </ul>
            </td>
        </tr>
        <tr>
            <td>
            <ul>
                <li>google</li>
            </ul>
            </td>
            <td>
            <ul>
                <li>twitter</li>
            </ul>
            </td>
        </tr>
        <tr>
            <td>
            <ul>
                <li>intel</li>
            </ul>
            </td>
            <td>
            <ul>
                <li>amazon</li>
            </ul>
            </td>
        </tr>
    </tbody>
</table>

If we do

clip(theTableContent, 150, {
    html: true,
    breakWords: true,
});

we get the truncated table

<table border="1" cellpadding="1" cellspacing="1" style="width: 500px">
    <tbody>
        <tr>
            <td>
            <ul>
                <li>fb</li>
            </ul>
            </td>
            <td>
            <ul>
                <li>fbfbfb</li>
            </ul>
            </td>
        </tr>
        <tr>
            <td>
            <ul>
                <li>google</li>
            </ul>
            </td>
            <td>
            <ul>
                <li>twitter</li>
            </ul>
            </td>
        </tr>
        <tr>
            <td>
            <ul>
                <li>intel</li>
            </ul>
            </td>
…</tr></tbody></table>

which displays as image

Notice the indicator ... is in incorrect position.

Could you please let me know if this is a bug?

Thanks!

arendjr commented 4 years ago

Nice catch!

Of course, I could say text-clipper is doing its best because it properly closes the HTML syntax... But from a user's perspective, I fully understand you had hoped for more intelligent behavior with tables :)

It does beg the question, what do you think would have been the desired behavior here? I think I'm leaning towards either displaying the table in its entirety or hiding the table in its entirety, which is what we also do for SVG and MathML content. Would that help?

Thanks!

CExJTian commented 4 years ago

Hi Arend,

Thanks for the quick response!

I was wondering if we can at least moving the ellipses after the table. That would help a lot.

Thanks

arendjr commented 4 years ago

I see... I'll have a look!

arendjr commented 4 years ago

Alright, so I've improved the table handling, but it wasn't exactly trivial, so I'll motivate my changes...

When I looked into the example you posted and tried to make a test case out of it, something struck me as odd: Why is the table being clipped at all, when you specified 150 characters as your maximum length? The text in the table certainly doesn't reach those limits...

The answer is in the whitespace. The whitespace between the HTML tags is being counted towards your character limit. This is partly by design, because whitespace is relevant if <pre> formatting is used, but here it is having unintended side-effects. Whitespace inside a table (but not inside a <td> element) is never rendered, so we shouldn't count it towards our character limit.

So I made the above improvement and made sure whitespace in such places is no longer counted. This automatically results in the clipping no longer happening at the whitespace itself, thus the clipping indicator will also no longer appear in such odd positions (unless your input snippet actually contains non-whitespace characters outside of the table cells).

Note: A similar improvement has been made for whitespace inside <ol>/<ul> tags (but not inside <li> tags), but the exact combination in your example above, of having an <ul> inside a <td> still causes some of the unrendered whitespace to be counted towards the character limit, but I have decided to ignore this for now as it wouldn't result in incorrect formatting.

For the full changes, please see: https://github.com/arendjr/text-clipper/commit/84847f9d98621f5753abc3819ac1164e24195446

Let me know if this solution satisfies you and feel free to reopen the issue if it doesn't. Also, if you would like to have release with this fix, just let me know.

CExJTian commented 4 years ago

Thank you Arend! The fix works for us. 👍

It would be great to have release with the fix.

arendjr commented 4 years ago

Version 2.1.0 has been published!

CExJTian commented 4 years ago

thanks Arend!