ampproject / amphtml

The AMP web component framework.
https://amp.dev
Apache License 2.0
14.89k stars 3.89k forks source link

Floating point rounding errors on cached version of page from HTML whitespace sanitization #13806

Closed jasonsturges closed 6 years ago

jasonsturges commented 6 years ago

What's the issue?

AMP cache version of our page suffers from math floating point rounding errors.

HTML rendered from our AMP page lists product prices as: $29.99 and $20.00

However, Google's cached version of our AMP page seems to suffer from floating point rounding errors, changing prices to: $27.990000000000002, and $20.0

Our AMP Page Google's AMP Page Cache
www magazine store_the-magnolia-journal_amp_ iphone 7 www google com_amp_s_www magazine store_the-magnolia-journal_amp_ iphone 7

The only thing I can think of is AMP's HTML sanitization process, where perhaps a parser is interpreting these values as floating point numbers in an attempt to remove whitespace?

From Google AMP Cache documentation on HTML sanitization:

Whitespace inside tags is stripped.

If you look at our AMP markup, there's whitespace in the element:

<span itemprop="price" class="selectionPrice">
                                            27.99</span>

What's also interesting is that Google's AMP page cache still has the whitespace:

<span class="selectionPrice" itemprop="price">
                                            27.990000000000002</span>

How do we reproduce the issue?

Our Non-AMP page, with amphtml meta reference: https://www.magazine.store/the-magnolia-journal/

Our AMP page version: https://www.magazine.store/the-magnolia-journal/amp/

Compare that to Google's cached version of the page: https://www.google.com/amp/s/www.magazine.store/the-magnolia-journal/amp/

Otherewise, the only steps I can think of to reproduce this is to place a numerical value inside an element with whitespace, and view the rendered result after Google's HTML sanitization process:

    <span>
        1.99</span>

What browsers are affected?

All browsers

Which AMP version is affected?

Pages affected have just been released to AMP

aghassemi commented 6 years ago

@honeybadgerdontcare, we don't sanitize textNodes, specially parse numbers, do we?

This may be an issue with origin servers responding differently to cache UA than user or maybe at the time of cache the value of wrong.

jasonsturges commented 6 years ago

@aghassemi This site doesn't have any special logic per user agent, and the front-end code explicitly formats for prices:

JSPF fragment:

<span itemprop="price" class="selectionPrice">
    <fmt:formatNumber type="currency"
                      pattern="##0.00;"
                      value="${selectionItem.listPriceAmount}"/>
</span>

I realize it's a stretch, but I do not believe there is any possible way for the page to render this value unless JSTL format number is in error.

The only thing I could think of is the document object parser you're using to rewrite HTML might be interpreting a value in the DOM as numeric.

honeybadgerdontcare commented 6 years ago

I am unable to reproduce this issue. We don't do anything special with whitespace within text nodes of the body and we don't do anything with regard to numerics. If an example could be made the consistently reproduced this, then please reopen.

My example show 27.99 appearing just fine on both origin and cache: http://astute-city-159523.appspot.com/files/floating.html https://astute--city--159523-appspot-com.cdn.ampproject.org/c/astute-city-159523.appspot.com/files/floating.html

jasonsturges commented 6 years ago

@honeybadgerdontcare, @jridgewell - for completeness, it turns out that JSTL's fmt:formatNumber bypasses type and pattern attributes when it cannot determine a locale.

If it cannot determine a formatting locale, it uses Number.toString() as the output format.

AMP (and GoogleBot) crawlers presumably do not specify an accept-language, which makes sense.

Sorry to burden you with this request, and thanks for thoroughly attempting to replicate the issue using an isolated example.