Automattic / newspack-custom-content-migrator

Custom migration tasks for launching and migrating Newspack sites on Atomic

OTW: Convert Documentcloud HTML embeds to Shortcode #58

Closed: philipjohn closed this issue 3 years ago

philipjohn commented 4 years ago

There are a bunch of posts on OTW where DocumentCloud embeds have been inserted as raw HTML. Here's an example (pre-block conversion):

```html
<!-- wp:html -->
<div id="DV-viewer-6563583-50129896-1" class="DC-embed DC-embed-document DV-container"></div>
<script src="//assets.documentcloud.org/viewer/loader.js"></script>
<script>
  DV.load("https://www.documentcloud.org/documents/6563583-50129896-1.js", {
    width: 400,
    height: 600,
    sidebar: false,
    text: false,
    container: "#DV-viewer-6563583-50129896-1"
  });
</script>
<noscript>
  <a href="https://assets.documentcloud.org/documents/6563583/50129896-1.pdf">HTP Apprenticeship College Ofsted Report Nov 2019 (PDF)</a>
  <br />
  <a href="https://assets.documentcloud.org/documents/6563583/50129896-1.txt">HTP Apprenticeship College Ofsted Report Nov 2019 (Text)</a>
</noscript>
<!-- /wp:html -->
```

The DocumentCloud plugin provides a simple shortcode that works well for embedding these instead, and it also works with AMP enabled. The HTML block above becomes:

```html
<!-- wp:shortcode -->
[documentcloud url="https://www.documentcloud.org/documents/6563583-50129896-1.html"]
<!-- /wp:shortcode -->
```

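Note how the document ID (6563583) and file name (50129896-1) are combined differently across the two methods: the embed's DV.load URL uses 6563583-50129896-1.js and its noscript links use 6563583/50129896-1.pdf, while the shortcode expects 6563583-50129896-1.html. The migration tool will need to take this into account.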
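A minimal JavaScript sketch of that mapping, assuming the shortcode always wants the https viewer URL `https://www.documentcloud.org/documents/<id>-<file name>.html`. `DOC_PARTS_RE` and `documentCloudShortcode` are hypothetical names, not part of the migrator or the DocumentCloud plugin. The helper captures the ID and file name from either URL form found in the embed and rebuilds the shortcode from them:

```javascript
// Hypothetical helper (not part of the migrator or the DocumentCloud plugin).
// One HTML embed references the same document in two ways:
//   DV.load / viewer URL:  https://www.documentcloud.org/documents/6563583-50129896-1.js
//   noscript asset URL:    https://assets.documentcloud.org/documents/6563583/50129896-1.pdf
// i.e. "<id>-<file name>" vs "<id>/<file name>". Capture both parts from either form.
const DOC_PARTS_RE = /documentcloud\.org\/documents\/(\d+)[-\/]([^\/."'?#\s]+)/;

function documentCloudShortcode(anyUrl) {
  const match = anyUrl.match(DOC_PARTS_RE);
  if (!match) return null; // not a DocumentCloud document URL
  const [, id, slug] = match;
  // ASSUMPTION: the shortcode always takes the https viewer page, <id>-<file name>.html.
  return `[documentcloud url="https://www.documentcloud.org/documents/${id}-${slug}.html"]`;
}

console.log(documentCloudShortcode("https://www.documentcloud.org/documents/6563583-50129896-1.js"));
// → [documentcloud url="https://www.documentcloud.org/documents/6563583-50129896-1.html"]
console.log(documentCloudShortcode("https://assets.documentcloud.org/documents/6563583/50129896-1.pdf"));
// → [documentcloud url="https://www.documentcloud.org/documents/6563583-50129896-1.html"]
```

Because the DV.load URL and the noscript asset URLs of one embed should yield the same shortcode, comparing them could serve as a cheap sanity check during migration.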

philipjohn commented 3 years ago

Some of the embeds don't have the noscript portion and look like this:

```html
<div id="DV-viewer-288459-isle-of-wight-festival-council-10-year-deal-2009" class="DV-container"></div>
<script src="http://s3.documentcloud.org/viewer/loader.js"></script>
<script>
  DV.load('http://www.documentcloud.org/documents/288459-isle-of-wight-festival-council-10-year-deal-2009.js', {
    width: 425,
    height: 600,
    sidebar: false,
    text: false,
    pdf: false,
    container: "#DV-viewer-288459-isle-of-wight-festival-council-10-year-deal-2009"
  });
</script>
```
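Building on the two examples so far, a block-level conversion could be sketched as below. It assumes each embed is the only content of its own `<!-- wp:html -->` block (true for the first example, assumed for this one), treats the noscript fallback as optional, and accepts either loader host (`assets.` or `s3.`), `http` or `https`, and single- or double-quoted `DV.load` URLs. `convertDvEmbeds` is a hypothetical name, and the regex approach is a shortcut chosen for brevity; an HTML parser would be more robust against markup variations not seen here.

```javascript
// Hypothetical sketch: turn wp:html blocks that contain nothing but a
// DV.load-style DocumentCloud embed into wp:shortcode blocks.
// ASSUMPTION: each embed sits alone in its own wp:html block.
const HTML_BLOCK_RE = /<!-- wp:html -->\s*([\s\S]*?)\s*<!-- \/wp:html -->/g;

// Matches the whole block body. Tolerates: loader from assets.* or s3.*,
// http or https, single- or double-quoted DV.load URL, optional <noscript>.
const DV_EMBED_RE = new RegExp(
  String.raw`^<div id="DV-viewer-[^"]*"[^>]*><\/div>\s*` +
    String.raw`<script src="[^"]*documentcloud\.org\/viewer\/loader\.js"><\/script>\s*` +
    String.raw`<script>\s*DV\.load\(\s*['"]https?:\/\/www\.documentcloud\.org\/documents\/(\d+)-([^'"]+?)\.js['"][\s\S]*?<\/script>` +
    String.raw`(?:\s*<noscript>[\s\S]*?<\/noscript>)?$`
);

function convertDvEmbeds(postContent) {
  return postContent.replace(HTML_BLOCK_RE, (block, inner) => {
    const match = inner.match(DV_EMBED_RE);
    if (!match) return block; // anything else stays as-is for manual review
    const [, id, slug] = match;
    // ASSUMPTION: normalize to the https://www. viewer URL the shortcode example uses.
    const url = `https://www.documentcloud.org/documents/${id}-${slug}.html`;
    return `<!-- wp:shortcode -->\n[documentcloud url="${url}"]\n<!-- /wp:shortcode -->`;
  });
}
```

Blocks that contain anything other than a recognized embed are returned unchanged, so they can be reviewed by hand rather than silently mangled. Run over the issue's first example, the sketch produces the same wp:shortcode block shown in the description.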

jeffersonrabb commented 3 years ago

Washington City Paper has a similar DocumentCloud problem in posts like this one: https://washingtoncitypaper.com/article/177716/fava-pot-brings-healthy-hearty-egyptian-street-food-to-union-market-on-friday/

This code is in an HTML block:

```html
<div class="DC-embed DC-embed-page DC-embed-enhanced" data-version="1.1" data-resource-type="page">
<div id="DC-6541418-Fava-Pot-Egyptian-Street-Food-p1-i1" class="DC-embed-view DC-embed-inline">
<div class="DC-page-embed DC-mode-image">
<div class="DC-meta">
<div class="DC-title">
<p>      <a class="DC-embed-resource" href="https://www.documentcloud.org/documents/6541418-Fava-Pot-Egyptian-Street-Food.html#document/p1" title="View entire Fava Pot Egyptian Street Food with DocumentCloud in new window or tab" target="_blank" rel="noopener noreferrer">Fava Pot Egyptian Street Food</a>
    </p></div>
<p>    <a class="DC-resource-logomark-link" href="https://www.documentcloud.org/documents/6541418-Fava-Pot-Egyptian-Street-Food.html#document/p1" title="View entire Fava Pot Egyptian Street Food with DocumentCloud in new window or tab" target="_blank" rel="noopener noreferrer"><br>
      <span class="DC-resource-logomark">DocumentCloud</span><br>
    </a>
  </p></div>
<div class="DC-page"></div>
<div class="DC-credit">
    Contributed to<br>
<a href="https://www.documentcloud.org/" title="Go to DocumentCloud in new window or tab" target="_blank" class="DC-logotype-link" rel="noopener noreferrer">DocumentCloud</a> by<p></p>
<p>  <a href="https://www.documentcloud.org/public/search/Account:16328-laura-hayes" title="View documents contributed to DocumentCloud by Laura Hayes in a new window or tab" target="_blank" rel="noopener noreferrer">Laura Hayes</a> of</p>
<p>  <a href="https://www.documentcloud.org/public/search/Group:washingtoncitypaper" title="View documents contributed to DocumentCloud by Washington City Paper in a new window or tab" target="_blank" rel="noopener noreferrer">Washington City Paper</a></p>
<p>•<br>
    <a href="https://www.documentcloud.org/documents/6541418-Fava-Pot-Egyptian-Street-Food.html#document/p1" title="View entire Fava Pot Egyptian Street Food in new window or tab" target="_blank" rel="noopener noreferrer">View document</a></p></div>
</div>
</div>
</div>
```
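This page embed is nested several `<div>`s deep and carries no `DV.load` call, so the pattern used for the OTW embeds does not apply. One possible approach, sketched below in JavaScript with hypothetical function names (`findMatchingDivEnd`, `convertPageEmbeds`), is to find the opening `DC-embed-page` div, walk `<div>`/`</div>` tags to its matching close, and replace that whole span with a shortcode built from the `DC-embed-resource` title link. It assumes the DocumentCloud plugin treats a URL ending in `#document/p1` as a single-page embed, which should be confirmed against the plugin's documentation first.

```javascript
// Hypothetical sketch: replace a DocumentCloud page embed (the
// "DC-embed DC-embed-page" markup) with a [documentcloud] shortcode.
// ASSUMPTION: the plugin renders a URL ending in #document/pN as a
// single-page embed; confirm before running this on real content.

// Returns the index just past the </div> that closes the <div> at `start`,
// or -1 if the markup is unbalanced.
function findMatchingDivEnd(html, start) {
  const tagRe = /<div\b[^>]*>|<\/div>/g;
  tagRe.lastIndex = start;
  let depth = 0;
  let tag;
  while ((tag = tagRe.exec(html)) !== null) {
    depth += tag[0].startsWith("</") ? -1 : 1;
    if (depth === 0) return tagRe.lastIndex;
  }
  return -1;
}

function convertPageEmbeds(html) {
  const openRe = /<div class="DC-embed DC-embed-page\b[^"]*"[^>]*>/g;
  let out = "";
  let cursor = 0; // everything before `cursor` has already been copied to `out`
  let open;
  while ((open = openRe.exec(html)) !== null) {
    const end = findMatchingDivEnd(html, open.index);
    if (end === -1) break; // unbalanced: leave the rest untouched
    // The title link carries the page URL, e.g. ...Street-Food.html#document/p1
    const link = html.slice(open.index, end).match(/class="DC-embed-resource" href="([^"]+)"/);
    if (!link) continue; // no usable URL: keep this embed for manual review
    out += html.slice(cursor, open.index) + `[documentcloud url="${link[1]}"]`;
    cursor = end;
    openRe.lastIndex = end; // resume scanning after the replaced embed
  }
  return out + html.slice(cursor);
}
```

Tracking nesting depth, rather than stopping at the first `</div>`, is what keeps the inner DC-title, DC-meta and DC-credit divs from cutting the replacement short.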

Some discussion here: https://newspack-pub.slack.com/archives/G014MS8N5A5/p1603477809051500

philipjohn commented 3 years ago

Good to know. DocumentCloud is embedded in at least a couple of different ways on OTW, so taking more variants into account should be fine.