Automattic / wp-calypso

The JavaScript and API powered WordPress.com
https://developer.wordpress.com
GNU General Public License v2.0
12.38k stars 1.97k forks source link

Migration Instructions: Site preview #91964

Open renatho opened 2 weeks ago

renatho commented 2 weeks ago

Details

The site preview is already implemented as part of https://github.com/Automattic/wp-calypso/issues/91776, but I noticed that some sites might not allow being rendered inside an iframe. One example is https://automattic.com/.

Was it already solved somewhere? Do we need a new solution? Would the WebPreview component help with that?

Checklist

Related

https://github.com/Automattic/dotcom-forge/issues/7703

donnapep commented 2 weeks ago

I suspect we'll need to do something like add a placeholder image if the iframe doesn't work. We'd need to work with design on that. We should avoid implementing any sort of complex solution for this at this point, especially given that we don't know how commonplace this scenario will be.

Imran92 commented 1 week ago

I looked into the issue and found a workaround that's not too hard to implement. Sites prevent loading them in iFrames by using headers. So we can just create a small proxy endpoint in between to call the site and rewrite/delete the header.

Here's a demo with a small proxy I created with Express locally in 5 minutes, you can see I've loaded https://automattic.com in an iFrame-

Image

There are two main problems with this-

  1. The solution is semi-small, probably not as small as we are looking for. The subset of the site we are dealing with can be too small to justify implementing something like this. Though we probably can't say for sure at this point, we can try adding a log each time we detect sites like these and decide a few weeks/months later.
  2. A possible big problem is, that if a site explicitly does not want to be loaded in an iFrame, there can probably be legal issues if we do so by some workarounds. Like the Terms and Conditions violation, CFAA (not sure) etc. This issue could be less severe if the end user only entered their own site's URL, but the user can be malicious and put any site's URL there and we can't verify the ownership.

Because of the two problems mentioned above, especially the second one which makes me think we should avoid trying to load such sites with any workarounds, I think we'd be better off finding a design solution that avoids loading such a site in an iFrame.

As for the potential solution, I see two ways to go about it.

  1. Adding a fallback in the iframe and showing something else (depending on what the Design team comes up with) or removing the section altogether when site loading fails in iFrame
  2. Try to detect early if the site has any anti-iFrame header set (maybe in the Site-Identification step, when we check if the site is eligible to migrate, we can also check the headers it returns) and set a flag and load the iFrame section in Migration Instructions accordingly based on the flag.

I prefer the second solution because we can avoid showing a failure message or a content-jumping after a few seconds (assuming the iFrame will take a couple of seconds trying to load the URL before failing).

WDYT? cc @donnapep @renatho

donnapep commented 1 week ago

As discussed in the Tycoon Leads call, we can use the mshots library to generate a screenshot and use that for all sites. This has the added bonus of the user not being able to click and navigate around inside of an iframe.