bcgov / MFIN-Data-Catalogue

The Finance Data Catalogue enables users to discover data holdings at the BC Ministry of Finance and offers information and functionality that benefits consumers of data for business purposes. The product is built using Drupal and adheres to the Government of BC's Core Administrative and Descriptive Metadata Standard.

Empty pages in MR revision table/grid pagination #547

Open david-fong-bc opened 1 month ago

david-fong-bc commented 1 month ago

Describe the bug

Many MR revisions pages have multiple leading empty table/grid pages. (Typically the MRs have few remaining revisions due to draft revision auto-pruning, so usually all but the last page are empty.) The number of pages seems to correlate with the time since the MR was created: the more time elapsed, the more pages. This happens in all environments, for both unpublished and published MRs.

To reproduce

Here are some examples:

| DEV node ID | date draft published | total revision page count |
| --- | --- | --- |
| 045 | 2024-02-08 | 5 |
| 047 | 2024-02-12 | 5 |
| 049 | 2024-02-15 | 5 |
| 058 | 2024-03-04 | 5 |
| 065 | 2024-03-04 | 5 |
| 066 | 2024-03-05 | 5 |
| 073 | 2024-04-19 | 4 |
| 109 | 2024-05-29 | 3 |

| TEST node ID | date draft published | total revision page count |
| --- | --- | --- |
| 135 | 2024-05-24 | 3 |
| 136 | 2024-07-12 | 2 |

| PROD node ID | date draft published | total revision page count |
| --- | --- | --- |
| 75 | 2024-02-08 | 5 |

Expected behaviour

There should not be any empty table/grid pages.

Additional context

I suspect this is related to draft revision auto-pruning: the page size seems to be ~50, and the trend I see is that the number of pages is roughly the number of days elapsed since creation divided by 50. But that could just be a spurious correlation.
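The days-divided-by-50 hypothesis can be checked with a little arithmetic. The following is only a sketch: it assumes one revision row accumulates per day, treats the ~50 page size as exact, and relies on GNU `date` (the `-d` option). The function name `expected_pages` and the as-of date 2024-10-09 are illustrative choices, not values taken from the site configuration.

```shell
#!/bin/sh
# Assumed page size of the revisions table (the "~50" guessed above, not a confirmed setting).
PAGE_SIZE=50

# expected_pages CREATED_DATE AS_OF_DATE
# Prints ceil(days_elapsed / PAGE_SIZE), i.e. the page count the hypothesis predicts.
# Requires GNU date for the -d option.
expected_pages() {
  ep_created=$(date -u -d "$1" +%s)
  ep_as_of=$(date -u -d "$2" +%s)
  ep_days=$(( (ep_as_of - ep_created) / 86400 ))
  # Integer ceiling division.
  echo $(( (ep_days + PAGE_SIZE - 1) / PAGE_SIZE ))
}

# Node 109 on DEV: draft published 2024-05-29, checked on an assumed date of 2024-10-09.
expected_pages 2024-05-29 2024-10-09
```

With these inputs the elapsed time is 133 days, so the sketch prints 3. If the prediction drifts away from the observed page count as more days pass, the correlation is probably coincidental.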

danhgov commented 3 weeks ago

This one is puzzling. It might not be the cm_revision_delete module -- in the logs, you can see messages from it, and it appears to have been behaving correctly: https://dev.cat.data.fin.gov.bc.ca/admin/reports/dblog?type%5B%5D=cm_revision_delete

I'm querying the node_revision table, and am finding a list of 134 rows for node 109. Interestingly, the revision IDs that show up there are the same ones that are listed at https://dev.cat.data.fin.gov.bc.ca/admin/config/content/cm_revision_delete/devel if you set it to node 109.

There are rows in node_revision for all of these non-existent revisions, but no matching rows in the corresponding node_revision__FIELDNAME tables. It looks as though data has been added or removed incorrectly -- basically, there is some data corruption going on with our revisions.
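One way to list such orphaned revisions is a LEFT JOIN from node_revision to a field revision table. This is a hedged sketch, not the query actually run here: the helper name `orphan_revision_sql` is hypothetical, and `node_revision__field_review_status` is assumed to exist because data_set nodes carry a field_review_status field. The `revision_id` column is the standard one on Drupal's dedicated field revision tables. A revision whose field is simply empty also has no row in that table, so it is worth checking more than one field table before concluding a revision is orphaned.

```shell
#!/bin/sh
# orphan_revision_sql NID FIELD_TABLE
# Prints SQL that lists revision IDs present in node_revision for node NID
# but absent from FIELD_TABLE (a node_revision__FIELDNAME table).
orphan_revision_sql() {
  printf 'SELECT r.vid FROM node_revision r LEFT JOIN %s f ON f.revision_id = r.vid WHERE r.nid = %d AND f.revision_id IS NULL ORDER BY r.vid;\n' "$2" "$1"
}

# Print the query for node 109 against an assumed field table.
orphan_revision_sql 109 node_revision__field_review_status
```

To run it, the output could be piped into `drush sql:cli` from the directory where drush is normally invoked.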

Cron is running every 15 minutes, but the cm_revision_delete module is set to run "Everyday". So I switched that to "every time cron runs" and ran cron manually. Afterwards, there were still 134 rows for node 109 in the node_revision table -- with exactly the same data. So, it seems that cm_revision_delete is NOT creating a new revision per day on cron run.

The 134 revision rows all show the same timestamp -- and it is the time the draft was published.

I have run queries for most of the nodes that @david-fong-bc listed, and saved the results. My plan is to check in again in a day or two, to see if new rows have appeared, and if so, how many, and with what revision IDs.

Interesting note: The two maintainers of cm_revision_delete work for (and one is the CTO of) Openplus -- the vendor who built this site for us.

danhgov commented 3 weeks ago

The only thing listed at https://dev.cat.data.fin.gov.bc.ca/admin/config/system/cron/jobs that runs once a day is the BC Data Catalogue Module's own cron-handler. In bc_dc_cron(), we see this:

  // Update every instance of field_review_status.
  \Drupal::service('bc_dc.update_review_status')->updateAll();

updateAll() does this: "Update field_review_status for all data_set nodes." For each data_set node, it determines whether a review is needed, sets the field_review_status field accordingly, and then saves the node.

This still doesn't explain why there are all those blank revisions, but it could explain why there is one revision per day...

But now I'm not sure this is it, either. I set that job to run every 5 minutes, and then ran cron manually. No extra empty revisions appeared -- at least for node 109, which I was monitoring.

I'll leave this for a day or two, and then come back to check. Here is the script I'm using on the Dev server:

# Directory where the query results are saved.
TDIR=sites/default/files/dan.test.delete_after_oct_2024
# Node whose revisions are being monitored.
NID=109
# Edit the timestamp in the filename at the end of the next line before hitting 'enter'.
echo "select * from node_revision where nid=$NID order by vid;" | ../vendor/bin/drush sql:cli > "$TDIR/n$NID.10-09.16h02"
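When checking back later, two saved snapshots can be compared to see which rows are new. The helpers below are a sketch under assumptions: `snapshot_name` and `new_rows` are hypothetical names, and the timestamp format simply mirrors the `10-09.16h02` suffix used above so the filename no longer needs editing by hand.

```shell
#!/bin/sh
# snapshot_name NID
# Prints a filename like n109.10-09.16h02 using the current month-day and time.
snapshot_name() {
  printf 'n%s.%s\n' "$1" "$(date +%m-%d.%Hh%M)"
}

# new_rows OLD_SNAPSHOT NEW_SNAPSHOT
# Prints lines that appear in NEW_SNAPSHOT but not in OLD_SNAPSHOT.
# grep exits 1 when nothing is new; that case is treated as success.
new_rows() {
  grep -Fxv -f "$1" "$2" || [ $? -eq 1 ]
}

# Example usage (paths are placeholders):
#   new_rows "$TDIR/n109.10-09.16h02" "$TDIR/$(snapshot_name 109)"
```

An empty result means no revision rows were added for that node between the two snapshots.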