Automattic / newspack-custom-content-migrator

Custom migration tasks for launching and migrating Newspack sites on Atomic
5 stars 5 forks source link

Feat/split terms #267

Closed eddiesshop closed 1 year ago

eddiesshop commented 1 year ago

This command will help identify duplicate slugs on any WordPress site. It then provides an interactive experience for handling those duplicate slugs, allowing for the user to selectively choose which slugs should be handled by the command.

The way the command "handles" the duplicate slugs is in 2 ways:

  1. There are the same number of Term and Taxonomy IDs, so the first Term ID remains as the original slug, and subsequent Term slugs are suffixed with -1, -2, -3, etc.
  2. There are more than one Taxonomy IDs for the same Term ID. In this case, new Term IDs are created with uniquely numbered slugs (suffixed -1, -2, -3, etc) and then the Taxonomy ID is updated with the newly created Term ID.
iuravic commented 1 year ago

subsequent Term slugs are suffixed with -1, -2, -3, etc and new Term IDs are created with uniquely numbered slugs (suffixed -1, -2, -3, etc)

@eddiesshop , I wonder is there a need to change slugs for newly created terms?

I just did a test on a fresh blank WP page, created a new tag called "blue", and then a new category called "blue", image image

Both these "blue"s terms have been created as separate terms (thanks to new WP logic). And they use the same slugs "blue" and this seems to work fine for a "blue" tag and a "blue" cat. So should we be renaming slugs when splitting them, or can we leave them be the same, just like in this example?

iuravic commented 1 year ago

@eddiesshop I created a blank test WP site with these terms and taxonomies

wp_terms table

term_id name slug
2 blue blue
3 dog dog
4 vegetable vegetable
5 notsharedcat notsharedcat
6 blue blue
7 dog dog
8 vegetable vegetable
9 notsharedcustomtax notsharedcustomtax

wp_term_taxonomy table

term_taxonomy_id term_id taxonomy
1 1 category
2 2 category
3 3 category
4 4 category
5 5 category
6 6 post_tag
7 7 post_tag
8 8 post_tag
9 9 testcustomtax
10 8 testcustomtax
11 7 testcustomtax

And here's what the command is reporting: image

Could you please check if this is correct?

Seems to me that it should be reporting these duplicates:

I believe it shouldn't be reporting dog because these are two separate term_ids 3, 7 and each only used once in wp_term_taxonomy with term_taxonomy_ids 3, 11.

Other than this, I think it would be super useful if the table above would list actual term_ids and term_taxonomy_ids that are going to be split, because that would make for much easier debugging -- we could see straight from this table which terms and term_taxonomy_ids are going to be affected.

iuravic commented 1 year ago

@eddiesshop here's my test site's full DB dump with these terms, taxonomies and 5 different posts I assigned these to differently.

eddiesshop commented 1 year ago

@iuravic thanks for the detailed analysis you performed with this branch!

I think your comments make total sense, especially given the default behavior WordPress exhibits when creating these taxonomies.

I modified the query so that all taxonomies are now shown, including custom ones (previously it was limited to just categories and post tags). I also added another virtual column which displays the term_id -> term_taxonomy_id -> taxonomy relationship. I did it in the query so as to not have to do any PHP processing on the result set.

Let me know what you think about these changes! image

eddiesshop commented 1 year ago

I apologize for the DB errors! I didn't test this branch with custom taxonomy, and so that was causing errors which weren't expected and weren't handled properly. I believe the branch is now ready for further testing!

eddiesshop commented 1 year ago

Hey Ivan! Thanks for taking another look at the PR. Please see below for my responses to your concerns.

I'm finding one issue,

termmeta seems to be lost for newly split terms

Yes that would make sense since we are creating a brand new term with a unique slug. How would you like me to handle this? Should I just duplicate whatever wp_termmeta rows exist for the original term/slug to the new term/slug?

Still wondering about this question asked above,

Even though default WordPress behavior is to allow duplicate term-slugs for categories and post tags, it is still essential to de-dupe the slugs when they are linked across a category or post tag. Please see the following example for the reason why:

All taxonomies linked to slug = dog

SELECT 
    t.term_id, 
    t.name, 
    t.slug, 
    tt.term_taxonomy_id, 
    tt.taxonomy 
FROM wp_zyv6o2ibm4_terms t 
    INNER JOIN wp_zyv6o2ibm4_term_taxonomy tt ON t.term_id = tt.term_id 
WHERE t.slug = 'dog';

image

Selecting dog as a tag for post

image

WordPress selects the wrong taxonomical record when assigning post tag

SELECT 
       p.ID, 
       p.post_name, 
       tr.term_taxonomy_id, 
       tt.taxonomy, 
       t.term_id, 
       t.name, 
       t.slug 
FROM wp_zyv6o2ibm4_term_relationships tr
    LEFT JOIN wp_zyv6o2ibm4_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    INNER JOIN wp_zyv6o2ibm4_terms t ON t.term_id = tt.term_id
    INNER JOIN wp_zyv6o2ibm4_posts p ON tr.object_id = p.ID
WHERE p.ID = 20
ORDER BY p.ID, t.slug;

image

Because of this behavior, it is necessary to ensure that a 1-to-1 relationship is maintained between Term -> Slug -> and Taxonomy. This is where this PR comes into play.

After splitting the term-slug

image

image

SELECT 
       t.term_id, 
       t.name, 
       t.slug, 
       tt.term_taxonomy_id, 
       tt.taxonomy
FROM wp_zyv6o2ibm4_terms t
    INNER JOIN wp_zyv6o2ibm4_term_taxonomy tt ON t.term_id = tt.term_id
WHERE t.slug LIKE 'dog%';

image

SELECT
       p.ID,
       p.post_name,
       tr.term_taxonomy_id,
       tt.taxonomy,
       t.term_id,
       t.name,
       t.slug
FROM wp_zyv6o2ibm4_term_relationships tr
    LEFT JOIN wp_zyv6o2ibm4_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
    INNER JOIN wp_zyv6o2ibm4_terms t ON t.term_id = tt.term_id
    INNER JOIN wp_zyv6o2ibm4_posts p ON tr.object_id = p.ID
WHERE p.ID = 23
ORDER BY p.ID, t.slug;

image

Regarding term_id_count,

I'm not sure I understand term_id_counts meaning -- can you please share in words what term_id_count means? it seems to be always showing value 1. Is that supposed to be so?

We can remove this column now actually. Because we are now grouping slugs across term IDs and not taxonomies, the value for that column will always be 1.

I'd recommend that we now removed the second column because the third column now contains that same info,

I disagree with this because this column would provide a quick glance at the taxonomies involved with the operation. However, I will make a change that omits that column by default, but can be added with a flag.

I would also recommend that we changed the wording from "Splitting duplicate term slug..." to "Splitting term..." since the major change done is that terms are being split, while whether their slugs are preserved or changed seems secondary to me

Sure thing 👍

iuravic commented 1 year ago

Yes that would make sense since we are creating a brand new term with a unique slug. How would you like me to handle this? Should I just duplicate whatever wp_termmeta rows exist for the original term/slug to the new term/slug?

Yeah, I think so. Just duplicating all the termmetas for newly created terms should do the trick 👍

Even though default WordPress behavior is to allow duplicate term-slugs for categories and post tags, it is still essential to de-dupe the slugs when they are linked across a category or post tag. Please see the following example for the reason why ...

Thanks so much Super clear to me now! 🙇

I'd recommend that we now removed the second column because the third column now contains that same info, I disagree with this because this column would provide a quick glance at the taxonomies involved with the operation. However, I will make a change that omits that column by default, but can be added with a flag.

That's fine, if you find it useful, let's leave it.

eddiesshop commented 1 year ago

Hey @iuravic ! I committed a change to duplicate termmeta when a term is split (i.e. when a new term record is created). Please review the change when you get a chance!

iuravic commented 1 year ago

@eddiesshop , please run PHPCS as well since this is a reusable migrator, it's presently throwing errors. Make sure those are cleared, especially SQL param sanitization using $wpdb->prepare() for all statements.

You'll also find errors from previous commits. We should aim to clean the code we've previously worked on, plus if it's easy and quick to fix remaining errors, that would be appreciated too.