INN / umbrella-workdayminn

Umbrella repository for Workday Minnesota.
GNU General Public License v2.0

The problem of the missing authors #77

Open benlk opened 4 years ago

benlk commented 4 years ago

The db-based approach:


benlk commented 4 years ago

try the export thing that GS made

authorship_fixer.php runs against the database and, using lookup tables in the .php file, does the following:

  1. for posts with brenda@gravityswitch.com or team@advantagelabs.com as author:
    • if those posts existed in Drupal as recorded by the "_fgd2wp_old_node_id" post meta key value for the migrated node ID: find the earliest Drupal node revision corresponding to that post's node ID, find the Drupal author UID of that node revision.
    • if there is a WP User corresponding by email address to the Drupal author, update the WP post_author to that WP User
    • if no WP User is found, print an error message
    • if no Drupal author UID is found in the stored data tables, print an error message
    • if no Drupal post is found in the lookup tables with a node ID that matches the "_fgd2wp_old_node_id" post meta key value for the node ID, the post is assigned to the WP user for "wdm@workdayminnesota.org"
  2. for all posts in the database, if the post has "_fgd2wp_old_node_id" post meta:
    • if the post has corresponding data from the lookup table of Drupal author entity ID to author name, set that author name as the "largo_custom_byline" meta
    • if not, check a different lookup table to see if there's an author profile associated with the post, and if it exists, set the "largo_custom_byline" meta to the name of the author
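
Step 1's branching can be sketched as a pure function. This is an illustration only, not the code in authorship_fixer.php: the function name, the shapes of the three lookup arrays, and the use of null to mean "leave the post unchanged" are all assumptions.

```php
<?php
// Sketch only: the table shapes and names below are assumptions for illustration.
// $firstRevisionUid: Drupal node ID => author UID of the earliest revision (or null)
// $emailByUid:       Drupal UID => author email
// $wpUserIdByEmail:  email => WP user ID

function resolve_post_author(
    int $nodeId,
    array $firstRevisionUid,
    array $emailByUid,
    array $wpUserIdByEmail,
    ?int $genericUserId
): array {
    // No Drupal node in the lookup tables: the "generic user" branch.
    if (!array_key_exists($nodeId, $firstRevisionUid)) {
        return ['author' => $genericUserId, 'log' => 'no Drupal node found, assigning to generic user'];
    }

    // Node is known, but no author UID (or no email for that UID) is recorded.
    $uid = $firstRevisionUid[$nodeId];
    if ($uid === null || !isset($emailByUid[$uid])) {
        return ['author' => null, 'log' => 'ERROR: no Drupal author UID found'];
    }

    // Drupal author is known, but no WP User shares the email address.
    $email = $emailByUid[$uid];
    if (!isset($wpUserIdByEmail[$email])) {
        return ['author' => null, 'log' => "ERROR: no WP User for $email"];
    }

    return ['author' => $wpUserIdByEmail[$email], 'log' => 'matched'];
}
```

Passing the generic user ID in as a nullable argument makes the generic-user branch explicit: if that user does not exist, the caller sees null rather than silently writing an invalid author.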
benlk commented 4 years ago

One point that seems risky to me:

if no Drupal post is found in the lookup tables with a node ID that matches the "_fgd2wp_old_node_id" post meta key value for the node ID, the post is assigned to the WP user for "wdm@workdayminnesota.org"

However, in the test run of this script on their site, the printed log contained no lines matching the message that the code outputs when a post is assigned to the generic user.

benlk commented 4 years ago

This script does not chunk by posts; instead it runs against all posts (a limit of -1) in the same page load, and times out on my computer somewhere around post 1964, having started from ~5600.

However, subsequent page loads process newly-updated records faster, resulting in the script eventually completing.
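
One way to chunk the work, sketched under assumptions: each page load handles one fixed-size batch and reports the offset at which the next load should resume. The function name, the offset mechanism, and the `$fixOne` callback are hypothetical; the actual script does none of this.

```php
<?php
// Sketch only: process one batch per request and return where to resume.
// $fixOne is a hypothetical callback that fixes a single post by ID.
function process_batch(array $postIds, int $offset, int $batchSize, callable $fixOne): ?int
{
    $batch = array_slice($postIds, $offset, $batchSize);

    foreach ($batch as $postId) {
        $fixOne($postId);
    }

    $next = $offset + count($batch);

    // null signals that every post has been processed.
    return $next < count($postIds) ? $next : null;
}
```

The returned offset could be passed back in as a query parameter on the next page load, so that no single request has to outlast the timeout.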

There's a large amount of this:

Notice: Trying to get property 'ID' of non-object in /Users/blk/sites/workdayminn/authorship_fixer.php on line 6326

5638 WELL! no valid drupal post revisions found... assigning to generic user

This is the message output with the "risky" case described at https://github.com/INN/umbrella-workdayminn/issues/77#issuecomment-645583290

The non-object notice appears to occur because that line reads a property of $generic = get_user_by("email", "wdm@workdayminnesota.org"); and that call returns no user on my local db
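
A guard along these lines would avoid the notice. It is a minimal sketch: get_user_by() returns a WP_User object on success and false on failure, and the helper name here is hypothetical. The helper takes whatever the lookup returned, so it can be read and run without WordPress loaded.

```php
<?php
// Sketch only: turn the result of a user lookup into an ID or null.
// In the script's context, $user would be the return value of
// get_user_by('email', ...), which is false when no user matches.
function fallback_author_id($user): ?int
{
    if (!is_object($user) || !isset($user->ID)) {
        return null; // no such user; the caller should skip the reassignment
    }

    return (int) $user->ID;
}
```

A caller would then only update post_author when the result is not null, and log the post ID otherwise.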

benlk commented 4 years ago

Yes, the "risky" case happens on the prod db, and yes, the wdm@workdayminnesota.org user does not exist in prod. (In general, good that it doesn't exist. For this script, bad.)

benlk commented 4 years ago

Posts triggering the "risky" case: 6300, 6298, 5716, 5694, 5689, 5690, 5687, 5688, and more than 1000 others according to Firefox's search of the log output.

Post 6300 has user 2 as author, the dread "tsuperadmin".

This "risky" case is appearing in Step 1 of https://github.com/INN/umbrella-workdayminn/issues/77#issuecomment-645582450, where the script is still trying to match Drupal authors to WP Users.

After running the script, post 6300 has user 0 as author. This is bad: it results in a byline reading "By "

When the script finishes running, post 6300 still has author 0, but interestingly for 6300, it also has no largo_custom_byline metadata set. The script log reads:

6300 NID:30946 ERROR! no byline data found. cause: no $authref[0]["field_author_reference_target_id"]

$ wp post meta list 6300
| post_id | meta_key            | meta_value                                                                      |
+---------+---------------------+---------------------------------------------------------------------------------+
| 6300    | _fgd2wp_old_node_id | 30946                                                                           |
| 6300    | _thumbnail_id       | 985                                                                             |
| 6300    | wpcf-media          | https://wdmdev.gravityswitch.com/wp-content/uploads/2015/05/labor_education.gif |
| 6300    | _encloseme          | 1                                                                               |

So, for some posts, this script will result in posts changing authorship from tsuperadmin to no author at all!

Next steps:

benlk commented 4 years ago

The missing-author-info case is known and accepted.

The fallback author will be tsuperadmin, with the user display name changed to "Workday Minnesota".

To do:

benlk commented 4 years ago

Sorry for the delay here! It looks like the script is looking for the wp-load.php file to be in the root. It is not in the root on our Flywheel Cloud Platform. However, there is a workaround for this. I added a wp-load.php file in the root which then links to the correct wp-load.php file location.

This should do the trick, but if you continue to have any issues, please let us know!

The support tech added this file on prod, not staging, but it's easy to copy from one to the other:

wp-load.php:

<?php require_once('.wordpress/wp-load.php'); ?>

However, Flywheel Staging does not print the raw output of the script as it works. It instead outputs a 504 Gateway Timeout error. We don't know how much progress is being made.
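
One possible workaround, sketched here as an assumption rather than something the scripts do: append each log line to a file as the script goes, so that progress survives a 504. The helper name and the choice of log path are hypothetical.

```php
<?php
// Sketch only: append a timestamped line to a log file on disk.
// FILE_APPEND keeps earlier lines; LOCK_EX guards against overlapping requests.
function append_log(string $logFile, string $message): void
{
    $line = date('c') . ' ' . $message . PHP_EOL;
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}
```

Each place the script currently prints a message could call this helper as well, and the file could be downloaded after the run.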

benlk commented 4 years ago

private repo for the revised authorship fixer scripts: https://github.com/INN/workday-author-fixer

benlk commented 4 years ago

a downside of timeout-based processing with Flywheel's Gateway Timeout approach is that there's no way to get a log that covers all changes made.

In the final run of bad_author_fixer.php, which tried to find authors for posts by team@advantagelabs.com, the "admin" user:

In the final run of authorship_fixer.php:

In conclusion:

benlk commented 4 years ago

Production run notes:

In the final run of bad_author_fixer.php, which tried to find authors for posts by team@advantagelabs.com, the "admin" user:

In the final run of authorship_fixer.php:

In conclusion:

MirandaEcho commented 4 years ago

Thanks @benlk! How did this compare to the staging run?

benlk commented 4 years ago

Slightly fewer posts with a known Drupal author but no corresponding WordPress author; I think they might've set some WP authors manually on prod.

Same number of posts by the "Admin" user.

MirandaEcho commented 4 years ago

Thanks! Let's rename the author as you suggested, as we did with SFPP.

benlk commented 4 years ago

Author display name updated to "Workday Minnesota Staff"