dotCMS / core

Headless/Hybrid Content Management System for Enterprises
http://dotcms.com
Other
846 stars 465 forks source link

Running Integrity Checker "fix" can break content #29059

Closed wezell closed 1 month ago

wezell commented 2 months ago

Parent Issue

No response

Problem Statement

If there is a content type ID mismatch between two environments and you run the integrity checker, we update the structure.inode and the contentlet.structure_inode in the structure and contentlet tables but we do not update the contentlet.contentlet_as_json['contentType']

This causes an error when dotCMS tries to load the content type from the information stored in the json - that content type id no longer exists. See here where we are trying to load the content type from the information in the json - kaboom! :

https://github.com/dotCMS/core/blob/master/dotCMS/src/main/java/com/dotcms/content/business/json/ContentletJsonAPIImpl.java#L289

We should alway be loading the content type from the column in the db - which should be seen as the source of truth. That said, there are a few options and maybe we need to do a few of them

  1. Pass in the preloaded content type (or the expected field list) into the to the ContentletJsonAPI method and basically ignoring the contentType value in the JSON
  2. Instead of storing the contentType Id in the json, we store the content type variable. Regardless if we fix #1, I think we should do this as our default as it is way less fragile and works out of the box by the way. If you run this on a piece of FileAsset content, the content will still load.
UPDATE contentlet SET contentlet_as_json['contentType'] = '"FileAsset"' WHERE inode = '6e51f36c-8555-497a-9a0d-51923be9a269';

Steps to Reproduce

  1. Set up a push publishing environment that has conflicts - empty starter vs the new starter for example
  2. Run the integrity checker. Make note of the content types that have conflicts and fix them at one of the ends.
  3. On the "fixed" instance, try to load one of the content of the "fixed" types.

To reproduce it quickly, just update a piece of content's json with a content type that does not exist, e.g.

UPDATE contentlet SET contentlet_as_json['contentType'] = '"NOT_FOUND_NOPE"' WHERE inode = '6e51f36c-8555-497a-9a0d-51923be9a269';

Acceptance Criteria

The content type in the database field should be primary - we should be able to update that column in the content type and not blow up the content. I would like to avoid a "fix" that makes the integrity checker run an update on all the contentlet_as_json values as on large datasets that could take a very long time.

If we do choose to store the contentType.variable in the json rather than the id, then I would say that could happen lazily

dotCMS Version

23.10

Maybe 23.01 - wherever we added support for the contentlet as json

Proposed Objective

Reliability

Proposed Priority

Priority 1 - Show Stopper

External Links... Slack Conversations, Support Tickets, Figma Designs, etc.

No response

Assumptions & Initiation Needs

No response

Quality Assurance Notes & Workarounds

No response

Sub-Tasks & Estimates

No response

fabrizzio-dotCMS commented 2 months ago

As suggested in the PR

  1. I tested hooking a local instance with demo site for Push-publish
  2. I created a CT named it abc123 and had the velocity_var_name renamed to Testing
  3. Created multiple pieces of content with several versions
  4. Created a CT on the receiver and named it Testing
  5. Then checked for integrity conflicts and an issue showed up
  6. Fix the issue locally and then verify as follows: 6.1. Content type ids were updated including the structure_inode field and the internal contentType on the contentlet_as_json field as well 6.2. All pieces of content and their versions were updated 6.3. All pieces of content can be edited saved and updated

Therefore I'm passing the issue

bryanboza commented 1 month ago

Fixed, tested following the provided steps in the last comment and I'm unable to reproduce the problem..

Tested on the latest trunk // Docker // FF