Automattic / vip-go-wpcom-compat

VIP Go / WordPress.com Compatibility
https://wpvip.com
2 stars 6 forks source link

Protected Embeds: Add CLI command to render all `protected-iframe` shortcodes #3

Closed vaurdan closed 5 years ago

vaurdan commented 6 years ago

This command will parse and render every single protected-iframe shortcode present on the site.

It requires a CSV file, that can be generated on the original site, with all the Protected Embeds.

vaurdan commented 6 years ago

Thanks for the review! I'm going to start addressing these issues.

vaurdan commented 6 years ago

@mjangda I have addressed most of the issues on your feedback.

Regarding the SQL pagination, after some testing, I noticed that it is slower and more unreliable than fetching all the posts at once. Sometimes queries would fail silently, fooling the script to think that the replacement is done.

Also, given that I need to get the total number of posts (for progress report), instead of using SQL_CALC_FOUND_ROWS and multiple queries for every 100 posts, I believe that one single query returning all the posts will be better. So far, on every occasion this script was used, it was fast and flawless, despite the limitless query. If we start seeing issues with this query, we can obviously iterate and add some pagination. What are your thoughts on that?

emrikol commented 6 years ago

@vaurdan I think this makes sense. It's possible that we could run out of memory or hit a killed query, but at least this command should be able to restart where it left off when restarted.

Should we add a dry run, so that we can see if the primary SQL query will die? Maybe we can test a dry run on a giant db?

joshbetz commented 6 years ago

In order to avoid running duplicative migration steps many times (every time we refresh content), I'm thinking we might be able to use the shortcode when we're pre-launch and run this as cleanup either during the launch or even post-launch.

In that case, we'd have the protected embeds table local to the site. So we could pull embeds directly from that table instead of generating a CSV.

We would probably want an additional script to remove the protected embeds table after we've verified that the script finished successfully.

Thoughts?

Also: What happens if someone edits a post with an iframe/script/etc who doesn't have proper permissions? I would have thought any "unsafe" embeds would get removed.

emrikol commented 6 years ago

We would probably want an additional script to remove the protected embeds table after we've verified that the script finished successfully.

Maybe we could add this as a flag to this CLI? --remove-table-if-empty?

What happens if someone edits a post with an iframe/script/etc who doesn't have proper permissions? I would have thought any "unsafe" embeds would get removed.

I think that's a valid concern. One idea I've had splashing around in my head was to convert these to a Gutenberg embed block, and offer proper editing that way? I'm not sure of another good option that would save embeds for users without the proper capability.

joshbetz commented 6 years ago

Maybe we could add this as a flag to this CLI?

The main thing I like about having two commands is that we can run the first one, then do some kind of verification (diff the generated HTML maybe?), and then run the table cleanup.

joshbetz commented 6 years ago

I think that's a valid concern. One idea I've had splashing around in my head was to convert these to a Gutenberg embed block, and offer proper editing that way? I'm not sure of another good option that would save embeds for users without the proper capability.

I think before we spend too much more time on converting these, we need to come up with a way to ensure old posts aren't going to break if they're updated by someone who doesn't have the unfiltered_html cap.

emrikol commented 6 years ago

I think before we spend too much more time on converting these, we need to come up with a way to ensure old posts aren't going to break if they're updated by someone who doesn't have the unfiltered_html cap.

Could we wrap the reversed embeds in a HTML comment or something, and capture it via the pre_kses filter? Just thinking out loud without testing yet.

joshbetz commented 5 years ago

Closing this per our call in favor of #23 and #24.