Automattic / newspack-custom-content-migrator

Custom migration tasks for launching and migrating Newspack sites on Atomic
5 stars 5 forks source link

feat(ContentDiff): migrate all taxonomies #516

Closed kariae closed 1 month ago

kariae commented 3 months ago

How to test


eddiesshop commented 3 months ago

This is super great Zak! Thanks for getting the ball rolling on this!

My only concern is that I think the author taxonomy is too much of a special case to migrate using this mechanism, because certain author taxonomy rows correspond directly to a wp_posts record (usually found via wp_term_relationships, although sometimes, somehow, that data gets lost for sites). I'm afraid that if we migrate these author taxonomy rows as such, without ensuring that the old corresponding wp_posts record exists first, and without updating the wp_term_taxonomy.description field after migrated, that we'll lose some important context around that data, and which would create misleading confusion if we ever needed to update/fix those records should any issue arise.

kariae commented 3 months ago

without ensuring that the old corresponding wp_posts record exists first

The command that the Knife tool ran before this one is content-diff-search-new-content-on-live which accepts post-types to migrate as a parameter, so there we need to set guest-author as a post type we need to migrate. Maybe we can add some logic in the knife tool to ensure that if we're migrating the author custom taxonomy, we've set guest-author on the previous command, but that's out of scope for this PR.

without updating the wp_term_taxonomy.description field after migrated

Mhmm, actually we don't migrate the wp_term_taxonomy.description but I can add that.

kariae commented 3 months ago

@eddiesshop 4396f96 migrate the term description when it's added, and d002704 update the description and the count when the term already exists. I did some tests and it looks fine, can you give it a shot, please?

eddiesshop commented 1 month ago

I just finished testing this, and I think perhaps we should prevent the author taxonomy (and guest-author post-type) from being migrated. It is just too much of a special case.

From my testing, the updates in this PR will migrate the CPT guest-author and the custom taxonomy author, however it also does the following:

  1. when the author taxonomies are migrated, for some reason their slugs are altered (cap-eddie-carrasco => eddie-carrasco)
  2. None of the associations to stories the guest author is tied to are recreated.
  3. The description column isn't updated during the migration, and this contains some useful metadata about which record the guest author is ultimately linked to. If left alone, as these GA's are used to assign to stories, they will get their description updated by the CAP plugin (see here and here). Then over time, what will happen is this data will have some info that is updated, and some that isn't. This might cause headaches if we need to debug something with this data should any problems arise, because we wouldn't be able to trust it completely (and I think no log record like we do for posts [old id => new id relationship]).
eddiesshop commented 1 month ago

While the previous test I did on this PR was focused on CAP, I re-did the testing focused on a custom taxonomy.

I created a custom taxonomy on the existing using register_taxonomy('custom_taxonomy_for_testing', 'post'). Then I inserted a new term using wp_insert_term('test', 'custom_taxonomy_for_testing'). From there, I pulled ~15 post ID randomly from the DB and inserted new relationship data for those posts, tying them to this newly created taxonomy.

Finally, I ran the content diff, making sure to specify the flag: --custom-taxonomies-csv=custom_taxonomy_for_testing.

I got the following error message:

Error occurred while inserting custom_taxonomy_for_testing 'test' live_term_id=44815 at live_post_ID=1148717 :Invalid taxonomy.

I think perhaps there's a step missing where we need to use the register_taxonomy function on the DB where we're migrating this data to, if that taxonomy doesn't already exist?

Something else I noticed was that this error only appeared for 2 posts that were caught as being new (i.e. having to be content-diffed to the DB). I think this means that any posts that were already migrated, but which may or may not have had new taxonomies associated to them, aren't being picked up.

iuravic commented 1 month ago

Hi @eddiesshop, this now ready for a test run. I'm thinking since taxonomies are an important part of data that it would be great give it another actual run with test data samples, please 🙏

Also sharing that I have not yet improved the fact that a post which has had a change in taxonomy is not recognized and picked up as a modified post. Thanks for detecting that, it's an important find, and I'd like to improve that in a separate PR right after we finish this one.

Changes I made:

iuravic commented 1 month ago

Created https://github.com/Automattic/newspack-custom-content-migrator/pull/542 on top of this branch. Therefore, this branch needs to be merged first, and only then the branch from #542 feat/migrate-all-wpusers.

iuravic commented 1 month ago

Thanks for the great catch, @eddiesshop , added prevention of guest-author CPT in 3fb1ade. Now merging!