Automattic / wp-calypso

The JavaScript and API powered WordPress.com
https://developer.wordpress.com
GNU General Public License v2.0

Squarespace importer: images and galleries duplicated; link dropped from video embeds #78787

Open Nic-Sevic opened 1 year ago

Nic-Sevic commented 1 year ago

Quick summary

This relates to a specific user's case, but based on a test export I did and these two related issues, it seems to be a general formatting issue: https://github.com/Automattic/wp-calypso/issues/50158 https://github.com/Automattic/wp-calypso/issues/50745

Summary: After importing the content, the user had duplicates of individual images and of images in galleries. Additionally, the src links from video embeds were not brought in, leaving blank spaces.

The image duplication appears to be caused by Squarespace's use of a noscript element followed by a thumbnail image (which is perhaps hidden by their platform styling?). After import, the noscript tags remain. From what I understand, that should keep their content hidden unless JavaScript is turned off, but this is not happening.

<noscript><img src="{src url here}" alt="" /></noscript><img class="thumb-image" src="{src url here}" data-image="{src url here}" data-image-dimensions="1500x1875" data-image-focal-point="0.5,0.5" alt="" data-load="false" data-image-id="{id}" data-type="image" />

Removing the noscript portion resolves this duplication without breaking the blocks.
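Removing the noscript portion can be automated. The following is a minimal JavaScript sketch, not the importer's actual code: the function name `stripNoscriptImages` and the sample URLs are hypothetical, and it assumes every noscript fallback wraps exactly one `<img>` with no `>` inside attribute values, as in the markup above.

```javascript
// Hypothetical helper (not Calypso importer code): delete <noscript> elements
// whose only content is a single <img>, so only the Squarespace ".thumb-image"
// <img> that follows them survives.
// Assumptions: no nested <noscript>, and no ">" inside attribute values.
function stripNoscriptImages(html) {
  // <noscript>, optional whitespace, one <img ...> (self-closing or not),
  // optional whitespace, </noscript>. Case-insensitive, all occurrences.
  const noscriptImg = /<noscript>\s*<img\b[^>]*>\s*<\/noscript>/gi;
  return html.replace(noscriptImg, "");
}

// Sample shaped like the export markup above (URLs are placeholders).
const sample =
  '<noscript><img src="https://example.com/a.jpg" alt="" /></noscript>' +
  '<img class="thumb-image" src="https://example.com/a.jpg" data-image="https://example.com/a.jpg" alt="" data-load="false" />';

console.log(stripNoscriptImages(sample));
// <img class="thumb-image" src="https://example.com/a.jpg" data-image="https://example.com/a.jpg" alt="" data-load="false" />
```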

For the videos, the import ends up with something like this:

<div class="intrinsic" style="max-width:100%">
<div class="embed-block-wrapper " style="padding-bottom:56.20609%;">
<div class="sqs-video-wrapper" data-provider-name="YouTube" data-html="

&lt;iframe src=&quot;//www.youtube.com/embed/3KorVOxjdt8?wmode=opaque&quot; height=&quot;480&quot; width=&quot;854&quot; scrolling=&quot;no&quot; frameborder=&quot;0&quot; allowfullscreen&gt;&lt;/iframe&gt;

">

</div>
</div>
<div  class="video-caption-wrapper">
<div class="video-caption">

This matches the source but is broken:

<div class="intrinsic" style="max-width:100%"><div class="embed-block-wrapper " ><div class="sqs-video-wrapper" data-provider-name="YouTube" data-html="<br/><br/><br/><br/><br/><br/>  <br/>  &lt;iframe src=&quot;//www.youtube.com/embed/3KorVOxjdt8?wmode=opaque&quot; height=&quot;480&quot; width=&quot;854&quot; scrolling=&quot;no&quot; frameborder=&quot;0&quot; allowfullscreen&gt;&lt;/iframe&gt;<br/><br/>"></div></div><div  class="video-caption-wrapper"><div class="video-caption">
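In the second snippet, the newlines inside `data-html` appear to have been turned into `<br/>` tags, while the iframe itself is still entity-encoded. Since WordPress auto-embeds a YouTube URL that sits on its own line, one possible workaround is to recover just the video URL. The helpers below (`decodeEntities`, `youtubeUrlFromSqsVideo`) are hypothetical and assume the attribute holds a single entity-encoded YouTube iframe, as in this example.

```javascript
// Hypothetical helpers (not Calypso importer code).
// Decode the few entities seen in Squarespace's data-html attribute.
// A single pass avoids double-decoding sequences like "&amp;lt;".
function decodeEntities(s) {
  const map = { "&lt;": "<", "&gt;": ">", "&quot;": '"', "&#39;": "'", "&amp;": "&" };
  return s.replace(/&(?:lt|gt|quot|#39|amp);/g, (m) => map[m]);
}

// Return a https://www.youtube.com/watch?v=ID URL for the first
// sqs-video-wrapper in `html`, or null if none is found.
function youtubeUrlFromSqsVideo(html) {
  // Capture the (entity-encoded) value of the data-html attribute.
  const attr = /class="sqs-video-wrapper"[^>]*?\bdata-html="([^"]*)"/.exec(html);
  if (!attr) return null;
  // Drop <br/> tags added during import, then decode the entities.
  const inner = decodeEntities(attr[1].replace(/<br\s*\/?>/gi, ""));
  // Video ID from a (possibly protocol-relative) youtube.com/embed/ src.
  const id = /src="(?:https?:)?\/\/(?:www\.)?youtube\.com\/embed\/([\w-]+)/.exec(inner);
  return id ? `https://www.youtube.com/watch?v=${id[1]}` : null;
}

// The imported markup from the issue, trimmed to the wrapper div.
const imported =
  '<div class="sqs-video-wrapper" data-provider-name="YouTube" data-html="<br/><br/>  <br/>  ' +
  '&lt;iframe src=&quot;//www.youtube.com/embed/3KorVOxjdt8?wmode=opaque&quot; height=&quot;480&quot; ' +
  'width=&quot;854&quot; scrolling=&quot;no&quot; frameborder=&quot;0&quot; allowfullscreen&gt;&lt;/iframe&gt;<br/><br/>"></div>';

console.log(youtubeUrlFromSqsVideo(imported));
// https://www.youtube.com/watch?v=3KorVOxjdt8
```

Emitting a bare URL instead of the iframe keeps the post free of raw `<iframe>` markup that the site may filter, at the cost of dropping the original width and height.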

Steps to reproduce

  1. Create an export from a Squarespace site with images, galleries, and video embeds (or use this onebin: [#200269])
  2. Import it to a Simple site
  3. Check for image/gallery duplication
  4. See that the video embed is missing most of its data

What you expected to happen

It should import the content and respect element settings, or perhaps clean up incompatible elements while leaving links to the assets for default embedding?

What actually happened

All items come in as Classic blocks, images are duplicated, and videos are either stripped out or imported in an unworkable format.

While trying to clean up the file for a specific user, I found these regexes useful. Combined, they remove the unneeded markup around regular image blocks. They don't work for galleries, though.

/<div\sclass="[^"]*"\sdata-test="image-block-inline-outer-wrapper"[\s\S]*?<\/noscript>/g

/
\s{11}
\s{55}\s{13}/g
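The first expression appears to have lost its `*` quantifiers in extraction: with `[^"]` and `[\s\S]?`, it would match only a one-character class name and at most one character before `</noscript>`. Assuming the intended pattern is the one below (a reconstruction, not the author's verified regex), this is how it behaves on a hypothetical single image block; the sample markup and variable names are illustrative only.

```javascript
// Assumed reconstruction of the first regex: "*" restored after [^"] and
// before the lazy "?" on [\s\S].
const outerWrapperToNoscript =
  /<div\sclass="[^"]*"\sdata-test="image-block-inline-outer-wrapper"[\s\S]*?<\/noscript>/g;

// Hypothetical markup for one image block (class names are illustrative).
const block =
  '<div class="image-block-outer-wrapper" data-test="image-block-inline-outer-wrapper">' +
  '<figure><noscript><img src="a.jpg" alt="" /></noscript>' +
  '<img class="thumb-image" src="a.jpg" alt="" /></figure></div>';

// Removes everything from the wrapper's opening tag through </noscript>.
console.log(block.replace(outerWrapperToNoscript, ""));
// <img class="thumb-image" src="a.jpg" alt="" /></figure></div>
```

The lazy `[\s\S]*?` stops at the first `</noscript>`, so each match covers a single image block. The leftover closing tags are presumably what the second expression cleans up; it is too garbled above to reproduce here.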

Impact

Some (< 50%)

Available workarounds?

Yes, difficult to implement

Platform (Simple and/or Atomic)

Simple

Logs or notes

Specific user case is discussed here: p9F6qB-cCf-p2

zaerl commented 1 year ago

See p9F6qB-cCf-p2#comment-54018.

zaerl commented 1 year ago

Regarding the sqs-video-wrapper, it seems Squarespace takes the value of the data-html attribute and injects it into the DOM. In this case, it's the default HTML fragment obtained from YouTube > Share.

Doing this kind of fine-tuning can be pretty challenging (in this case, time-consuming), and we don't know whether Squarespace will keep using this approach in the future.

@vishnugopal, what do you think of my last two comments?

vishnugopal commented 1 year ago

I think we'll have to have special-case handling here @zaerl. I remember working on similar cases for the Blogger importer. I can't see any other way out.
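Special-case handling for videos could mirror what Squarespace appears to do client-side, as described earlier in the thread: decode the data-html attribute and inject the result in place of the wrapper. The following is a minimal sketch, assuming the wrapper div is empty and its attribute is entity-encoded as in the issue. `inlineSqsVideoHtml` is a hypothetical name, rewriting the protocol-relative `src` to `https:` is an added assumption, and whether the resulting `<iframe>` survives depends on the site's HTML filtering.

```javascript
// Hypothetical special-case transform (not Calypso importer code).
function decodeEntities(s) {
  const map = { "&lt;": "<", "&gt;": ">", "&quot;": '"', "&#39;": "'", "&amp;": "&" };
  return s.replace(/&(?:lt|gt|quot|#39|amp);/g, (m) => map[m]);
}

// Replace each empty <div class="sqs-video-wrapper" ... data-html="..."></div>
// with the decoded HTML fragment stored in its data-html attribute.
function inlineSqsVideoHtml(html) {
  const wrapper =
    /<div class="sqs-video-wrapper"[^>]*?\sdata-html="([^"]*)"[^>]*>\s*<\/div>/g;
  return html.replace(wrapper, (_match, encoded) =>
    decodeEntities(encoded.replace(/<br\s*\/?>/gi, "")) // drop import-added <br/>
      .trim()
      .replace(/src="\/\//g, 'src="https://') // assumed: force https on // URLs
  );
}

// The exported wrapper from the issue (newlines kept inside data-html).
const exported =
  '<div class="sqs-video-wrapper" data-provider-name="YouTube" data-html="\n\n' +
  '&lt;iframe src=&quot;//www.youtube.com/embed/3KorVOxjdt8?wmode=opaque&quot; height=&quot;480&quot; ' +
  'width=&quot;854&quot; scrolling=&quot;no&quot; frameborder=&quot;0&quot; allowfullscreen&gt;&lt;/iframe&gt;\n\n">\n\n</div>';

console.log(inlineSqsVideoHtml(exported));
// <iframe src="https://www.youtube.com/embed/3KorVOxjdt8?wmode=opaque" height="480" width="854" scrolling="no" frameborder="0" allowfullscreen></iframe>
```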