Automattic / newspack-custom-content-migrator

Custom migration tasks for launching and migrating Newspack sites on Atomic

The Emancipator #320

Closed: iuravic closed this 1 year ago

iuravic commented 1 year ago

The data for this migration is already in a WP database. Everything this code does reads from the large serialized array stored in post_meta under the key api_content_element.

Here is what goes on:

I also added a few functions to the logic in this repo. Most notably, I added a generator function, get_all_wp_posts, that should be a little easier on memory than get_posts.
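The PR's get_all_wp_posts is PHP and its signature isn't shown in this thread, so the following is only a language-neutral sketch of the general idea: fetch posts in fixed-size pages and yield them one at a time, so at most one page is held in memory instead of every post at once. `fetch_page` is a hypothetical callback standing in for a paged query (in WordPress, something like a WP_Query with `posts_per_page` and `paged`); `get_all_posts` and `fake_fetch` are illustrative names, not the migrator's code.

```python
from typing import Callable, Iterator, List


def get_all_posts(
    fetch_page: Callable[[int, int], List[int]], batch_size: int = 100
) -> Iterator[int]:
    """Yield post IDs one at a time, holding at most one batch in memory.

    fetch_page(page, per_page) is a hypothetical callback that returns the
    IDs for a 1-based page number, similar to a paged WP_Query.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    page = 1
    while True:
        batch = fetch_page(page, batch_size)
        if not batch:
            return  # empty page: nothing left to fetch
        yield from batch
        if len(batch) < batch_size:
            return  # a short page means this was the last one
        page += 1


# Usage with an in-memory stand-in for the database (hypothetical data).
ALL_IDS = list(range(1, 251))
pages_requested: List[int] = []


def fake_fetch(page: int, per_page: int) -> List[int]:
    pages_requested.append(page)
    start = (page - 1) * per_page
    return ALL_IDS[start:start + per_page]


total = 0
last_id = None
for post_id in get_all_posts(fake_fetch, batch_size=100):
    total += 1  # per-post migration work would go here
    last_id = post_id
```

Because the generator is lazy, the loop only ever sees one batch at a time. One caveat with offset-style paging: if the loop changes which posts match the query (for example by trashing them), later pages shift and some posts can be skipped; iterating over a fixed list of IDs avoids that.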

How to test

(I've only been running this locally.)

# Empty the trash (permanently deletes trashed posts)
wp post delete $( wp post list --post_status=trash --post_type=post --format=ids ) --force

wp newspack-content-migrator emancipator-taxonomy
# wp newspack-content-migrator emancipator-authors
wp newspack-content-migrator emancipator-bylines
wp newspack-content-migrator emancipator-post-subtitles
# wp newspack-content-migrator emancipator-redirects

Two commands are commented out; I still need some input on those.

I'm not sure how much (or how little) to log, so the logging is a bit all over the place :)

naxoc commented 1 year ago

OK, I've been playing around with the co-authors and have a function that massages them into place. It still needs some work, so I'll continue on Monday.

naxoc commented 1 year ago

I've been playing around with getting more data. We have three commands so far for this migration, and they all use data from the big serialized api_content_element array in the post metadata.

wp newspack-content-migrator emancipator-authors
wp newspack-content-migrator emancipator-redirects
wp newspack-content-migrator emancipator-post-subtitles
  1. The first one gets the author(s) and adds them as co-authors. I'm not sure where the post authors in the dataset come from, but they have emails like AmberPayneandDeborahDouglas@bostonglobe.com. I found a field that seems to hold the original owner/poster of the article, with an email along the lines of deborah.douglas@globe.com. Could we use those as the user? They are in $api_content_element['revision']['user_id'].
  2. A bunch of articles on the live sites are redirects. I've migrated them to entries in John Godley's Redirection plugin. Not sure if that's what we want, but it's worth a shot :)
  3. The post subtitles are added from the $api_content_element['subheadlines']['basic'] field.
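A minimal sketch of the two field lookups described in items 1 and 3, assuming api_content_element has already been unserialized into nested dicts (in the real command this is PHP working on the unserialized meta value). The `dig` helper and the display-name rule (splitting the email's local part on dots and capitalizing each piece) are hypothetical illustrations, not the migrator's code; lowercasing the email for matching is also an assumption.

```python
from typing import Any, Dict, Mapping, Optional


def dig(data: Any, *keys: str) -> Optional[Any]:
    """Follow nested keys, returning None as soon as one is missing."""
    current = data
    for key in keys:
        if not isinstance(current, Mapping) or key not in current:
            return None
        current = current[key]
    return current


def post_subtitle(content: Mapping[str, Any]) -> Optional[str]:
    """Subtitle from ['subheadlines']['basic'], or None if empty or missing."""
    value = dig(content, "subheadlines", "basic")
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def original_poster(content: Mapping[str, Any]) -> Optional[Dict[str, str]]:
    """Candidate user from ['revision']['user_id'], which holds an email.

    Assumption: derive a display name from the local part, e.g.
    deborah.douglas@globe.com -> "Deborah Douglas".
    """
    email = dig(content, "revision", "user_id")
    if not isinstance(email, str) or "@" not in email:
        return None
    email = email.strip()
    local_part = email.split("@", 1)[0]
    display_name = " ".join(p.capitalize() for p in local_part.split(".") if p)
    # Lowercased so the same person matches across posts (assumption).
    return {"email": email.lower(), "display_name": display_name}
```

Returning None for missing or malformed fields lets the calling command skip a post (and log it) rather than crash partway through a long run.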

There are still TODOs in the code.

naxoc commented 1 year ago

@iuravic I can't add you as the reviewer since you originally posted this one, but it's ready for review :)

naxoc commented 1 year ago

This ran on a staging clone. I'll merge so future changes are easier to keep track of.