ampproject / amp-wp

Enable AMP on your WordPress site, the WordPress way.
https://wordpress.org/plugins/amp/
GNU General Public License v2.0
1.79k stars 383 forks source link

Incorporate AMP validator into post authorship flow, to alert if content within the post editor is not compatible with AMP #843

Closed MackenzieHartung closed 6 years ago

MackenzieHartung commented 6 years ago

Acceptance Criteria

AC1: On saving a post on /wp-admin/post.php, if there’s post content that’s not compatible with AMP, display an error at the top of the page. AC2: The error will not display specifically what is incompatible. AC3: The error will display the following text: “Notice: your post fails AMP validation”.

kienstra commented 6 years ago

Request To Work On This

Hi @ThierryA and @westonruter, Could I please work on this? Though this is in Sprint 3, It looks like we're doing well on Sprint 2. But maybe you have other priorities in mind.

If so, I'll write a simple solution design here so we're on the same page.

Update: I'm now working on a sub-issue of #839.

Thanks!

westonruter commented 6 years ago

@kienstra Please do. I see this highly related to #842, but probably a different mechanism. For content, I suppose it would involve hooking into the whitelist sanitizer to find elements that are dropped due to being illegal? If this could identify the specific paragraph in the content that has invalid content in it, that would be ideal. Naturally this would straightforward to accomplish with Gutenberg, and perhaps Gutenberg should be the focus for this here then.

For #842 my thought was do do something at a higher level, when hooks are run, to run the validator on the entire page output rather than on just the content.

I don't see these two as being mutually exclusive. One is focused on the content authorship—telling users what they can't author—and the other is focused on the site administration—telling users what plugins are doing illegally.

kienstra commented 6 years ago

In Development

Hi @westonruter, Thanks for your details here. If it's alright, I'll post an update soon.

westonruter commented 6 years ago

Let's chat about the approach to take here.

kienstra commented 6 years ago

Possible Approach

Hi @westonruter, Thanks for waiting here. One approach that I've prototyped is front-end validation in Gutenberg (screencast).

Unfortunately, this only applies on editing a block. Not on saving it, as you mentioned.

Here's a very rough prototype that you could paste in the console on a Gutenberg post.php page:

var isValidAMP,
    previousWp = wp,
    el = wp.element.createElement,
    Notice = wp.components.Notice,
    blockType = 'core/html',
    htmlBlock = wp.blocks.unregisterBlockType( blockType ),
    OriginalBlockEdit = htmlBlock.edit;

/**
 * Whether markup is valid AMP.
 *
 * Use the AMP validator from the AMP CDN.
 * And place the passed markup inside the <body> tag of a basic valid AMP page.
 *
 * @param string markup
 * @returns {boolean} $valid Whether the passed markup is valid AMP.
 */
isValidAMP = function( markup ) {
    var ampDocument = `<!doctype html><html ⚡><head><meta charset="utf-8"><link rel="canonical" href="./regular-html-version.html"><meta name="viewport" content="width=device-width,minimum-scale=1"><style amp-boilerplate>body{-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-moz-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-ms-animation:-amp-start 8s steps(1,end) 0s 1 normal both;animation:-amp-start 8s steps(1,end) 0s 1 normal both}@-webkit-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-moz-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-ms-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-o-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}</style><noscript><style amp-boilerplate>body{-webkit-animation:none;-moz-animation:none;-ms-animation:none;animation:none}</style></noscript><script async src="https://cdn.ampproject.org/v0.js"></script></head><body>${markup}</body></html>`,
        validated = amp.validator.validateString( ampDocument );
    return ( validated.hasOwnProperty( 'status' ) && 'PASS' === validated.status );
};

jQuery.getScript( 'https://cdn.ampproject.org/v0/validator.js', function() {
    /*
     * Workaround, not recommended in actual plugin.
     * Replaces the wp value that the script above overwrote.
     */
    wp = previousWp;
} );

/**
 * Overwrites the 'Custom HTML' block's edit() function.
 *
 * Outputs the original component, in OriginalBlockEdit.
 * If it's not valid AMP, it also outputs a Notice.
 */
htmlBlock.edit = function( props ) {
    var content = props.attributes.hasOwnProperty( 'content' ) ? props.attributes.content : '';
            result = [ el( OriginalBlockEdit, Object.assign( props, { key: 'original' } ) ) ];
    if ( ! isValidAMP( content ) ) {
        result.unshift( el(
            Notice,
            {
                key: 'notice',
                title: 'Notice',
                status: 'warning',
                content: 'This is not valid AMP',
            }
        ) );
    }
    return result;
};
wp.blocks.registerBlockType( blockType, htmlBlock );

Issues:

This is based on the 'Adding a caption' section in this post.

westonruter commented 6 years ago

@kienstra that looks amazing! Having that client-side JS-based validation is perfect as it is realtime and it is block-based. You should absolutely keep going with this. I don't see this in any way conflicting with the validation of the frontend as a whole. When saving, we can send out a request to get the frontend HTML and then run it through the validator as you're doing at the block level. This will need a bit more

The AMP CDN validator.js script overwrites the wp object with a function.

Why would it do this? That seems like a big flaw in the validator script. Apparently it's not encapsulating the minified function names which is causing wp to be defined be accident. There is also a wo and many other variables that are polluting the global namespace. This should get fixed in the AMP project as a prereq.

amedina commented 6 years ago

@pbakaus @Gregable take a look at this issue with the validator...

Gregable commented 6 years ago

Would you mind filing an issue on the AMP github tracker: https://github.com/ampproject/amphtml/issues/new

kienstra commented 6 years ago

AMP Validation In 'Classic' Editor

Here's a screencast of validating a post in the 'classic' editor. On saving the post, the JavaScript file makes an AJAX request to the permalink. Then, it validates that entire HTML document for AMP compatibility, using amp.validator.validateString().

But the back-end sanitizers work really well, and it's hard to enter content that won't be AMP-compatible by the time the page is rendered.

class-editor-validation

Next steps include applying similar front-page validation to Gutenberg.

kienstra commented 6 years ago

Moving Into QA

Hi @westonruter, If it's alright, I'm moving this into QA.

Steps To Test

  1. Create a new post on the test site
  2. Paste the following content in the editor:
    
    <button onclick="alert('this')">This</button>
    <button onclick="alert('this')">This</button>
3. Click "Save Draft"
4. Expected: a warning with
* A message that the post isn't valid
* The invalid elements
* The invalid attributes

<img width="1062" alt="validation-failure-message" src="https://user-images.githubusercontent.com/4063887/36436698-a32aa08a-162a-11e8-8ca2-a77462cd4f44.png">

5. Remove the content, and paste:

https://www.youtube.com/watch?v=GGS-tKTXw4Y Example content


6. Click "Save Draft"
7. Expected: there's no warning 

**Known Issue**
The sanitizer seems to also report removal of `<p>`, when also removing another element. 
westonruter commented 6 years ago

The sanitizer seems to also report removal of <p>, when also removing another element.

This is due to the whitelist stanitizer removing empty parent elements when an invalid element is removed. I believe it is this logic here:

https://github.com/Automattic/amp-wp/blob/388f79f369997938224aa489fa86cf003a3eb9cd/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L1479-L1485

At the very least this can be changed from remove_invalid_child to just removeChild since it is not invalid. But I'm not 100% sure this routine is needed.

ThierryA commented 6 years ago

@westonruter @kienstra what is the reason for the preprocessor remove the parent if the parent is valid, is the intention not leave empty HTML Tag?

kienstra commented 6 years ago

Removed Parent Element

Hi @ThierryA, It looks like the intent is for the preprocessor to remove the parent if its only child was the removed element.

In that case, there's probably no reason to report its removal, as we don't know if it's invalid. @westonruter's suggestion works locally:

$parent = $parent->parentNode;
if ( $parent ) {
-         $this->remove_invalid_child( $node );
+         $parent->removeChild( $node );
}

There's also a slightly-relate edge case where this doesn't report the removal of a parent if the child is removed. For example:

<nonexistent><notatag></notatag></nonexistent>

This only reports:

Invalid elements: notatag

Making this change to replace_node_with_children() seems to address this:

        // Replace node with fragment.
        $node->parentNode->replaceChild( $fragment, $node );
+       if ( isset( $this->args['remove_invalid_callback'] ) ) {
+           call_user_func( $this->args['remove_invalid_callback'], $node );
+       }
csossi commented 6 years ago

@kienstra , not seeing any warning here image

kienstra commented 6 years ago

Dev Site

Hi @csossi, Thanks for testing this. Could you please use the dev site? It has more up-to-date code. It looks to display the errors, as expected.

csossi commented 6 years ago

verified in QA

kienstra commented 6 years ago

Request For Regression Testing

Hi @csossi, Thanks for testing this. Could you please retest, to ensure there wasn't a regression?

Please create a post using the classic editor. Here are the test steps.