Lullabot / amp-library

Convert HTML to AMP HTML and report HTML compliance with the AMP HTML specification
Other
382 stars 181 forks source link

Validator Generated (validator-generated.php file) needs update #231

Open sarath49 opened 6 years ago

sarath49 commented 6 years ago

validator-generated.php file has not been updated for 2 years and there are several implementation like the issue

https://github.com/Lullabot/amp-library/issues/222

which removed second or more implementation of side bar.

mxhCodes commented 6 years ago

Another thing which does not work because the validator is old like my gandpa: amp-list when using further attributes "items" and "single-item".

Please update the validator.

mxhCodes commented 6 years ago

Seems that the fork needs a synchronization with the original AMPHTML repo first to get a new generated validation file: "This branch is 12 commits ahead, 5647 commits behind ampproject:master."

timonweb commented 6 years ago

Any plans on updating validator-generated.php? AMP has got many new tags which isn't possible to use due to outdated validator-generated.php.

mxhCodes commented 6 years ago

Currently seems to be not easy, see also https://www.drupal.org/project/amp/issues/2962305

marcelovani commented 6 years ago

See another discussion here #181

westonruter commented 6 years ago

I suggest using the generated PHP file in the AMP WordPress plugin: https://github.com/Automattic/amp-wp/blob/develop/includes/sanitizers/class-amp-allowed-tags-generated.php

We'd like to work on extracting this into a Composer package as well to make it easier to incorporate.

Here's the Python script that generates the file from the latest amphtml release: https://github.com/Automattic/amp-wp/blob/develop/bin/amphtml-update.sh

marcelovani commented 5 years ago

Hi @westonruter

I was very curious to see this working, and I did some experiments as a proof of concept. I actually did the opposite first, I moved some code from AMP WP into the AMP Library to prove that WP can work with it. The next step would be to start porting the code from WP Plugin into the library or perhaps updating the library, whatever makes more sense. Please have a look at this and comment below.

This experiment is an attempt to merge the functionalites of Wordpress AMP plugin and Lullabot AMP Library by making an assessment of what could be abstracted and put into a PHP package that could be used by different platforms.

I work with AMP for Drupal and I am one of the contributors to the Drupal AMP project. The Drupal module uses the AMP Library to validate and convert the markup. The Library comes with many interesting functionalities including:

The AMP Library was developed in 2016 by Lullabot in partnership with Google. Some of the key people involved:

Here is the Full list of contributors

Maintainers of the Drupal AMP Module:

Here is the Full list of contributors

The problem

As Google AMP team keep pushing new functionalities into the AMP Html Project on a daily basis, we have a situation where the AMP Library no longer works with the latest AMP Tags and attributes, which are stripped out by the AMP Library. Here are some of the issues for AMP Library and Drupal AMP module:

AMP Library uses an auto-generated class to validate and strip invalid AMP tags, see more details here This file is generated by a the Lullabot fork of the ampproject/amphtml.

The problem is that the generated file no longer works with the latest AMP HTML code. There is another problem, the Validators on this folder are basically a port from validate.js to PHP classes. The original validate.js has changed a lot and has been moved into a folder called engine.

So, the fix requires a total re-write of the PHP classes and the Python script that creates the generated validator class.

The experiment

1. WordPress

I was curious about the Wordpress plugin

The Wordpress plugin is mainly maintained by:

Here is the Full list of contributors

I decided to have a look into the code to see how it works and what bits of functionalities are common with AMP Library. I can see that the module has some Sanitizers which are called Passes in the AMP library. This is a good sign that we can probably join some of the code.

Since the AMP Library is already a standalone package, I started by moving some of the code from the Wordpress plugin into the AMP Library. I also removed the git submodule. See all comments on this Pull request Abstracting AMP validators and sanitizers into AMP Library .

So, now we have WordPress plugin using AMP Library!

It does the job and also prints some debugging information, see screenshots: Screenshots On the left, you see the Desktop version on a WordPress site. In the middle, you see the AMP version of the article using the original AMP WP plugin. On the right, it's the Forked WP Plugin using the Lullabot Library with some extra debugging information (optional).

This is the HTML markup of the article

AMP Converter Test
<p style="text-align: center;">Image
  <img class="wp-image-13 size-medium aligncenter" src="https://cdn.shopify.com/s/files/1/2222/0805/products/Stereo-845-PP-e1441807241782_480x480.jpg" alt="" width="300" height="198" />
</p>
<p style="text-align: center;">YouTube Video
  <iframe src="https://www.youtube.com/embed/cKgIsOA-G-Q" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>

This is the code that replaces the class: see https://github.com/marcelovani/amp-wp/pull/4/files#diff-35292ad861a5ed6690412c3ccfe5b0adR318

private function build_post_content() {
    $amp = new \Lullabot\AMP\AMP();
     $content = apply_filters('the_content', $this->post->post_content);
     $options = array(
      'scope' => 'body',
      'add_stats_html_comment' => false
    );
    $amp->loadHtml($content, $options);
    $this->add_data_by_key('post_amp_content', $amp->convertToAmpHtml());
    $this->merge_data_for_key('amp_component_scripts', $amp->getComponentJs());
    $this->add_data_by_key('post_amp_debug', $amp->warningsHumanHtml());
    return;

I wanted to change the scope to apply the AMP markup on the full HTML that is rendered by WordPress, but I don't know how to do that. But for the proof of concept, this should be a good start.

How to test this

2. Lullabot AMP Library

I moved some code from the WP plugin and from the Lullabot fork of AMPHTML into This fork of AMP Library See this pull request Amphtml generator update

How to regenerate the validator scripts

The AMP Library currently has both validators (class-amp-allowed-tags-generated.php and validator-generated.php) side by side, until we do the actual work to update the classes and then get rid of one of them.

The build script will check out the latest tag of AMPHtml, copy the Python scripts inside vendor/amphtml/validator folder, generate the PHP files and copy them into src/Spec folder. These files can be committed and the AMP Library can be tagged and released.

The command line tool

Pull requests:

Challenges and ideas

We can either discuss this here, or use

westonruter commented 5 years ago

I've filed https://github.com/ampproject/amp-wp/issues/2315 to begin an exploration into extracting the logic from the AMP WordPress plugin into something which can be re-used by PHP-based CMSes.