acquia / df

Demo Framework - mirrored at https://git.drupal.org/project/df.git
https://www.drupal.org/project/df
18 stars 19 forks source link

Prevent exporting of entities that shouldn't be in the CDF #169

Closed saltednut closed 4 years ago

saltednut commented 4 years ago

Although the old "whitelist" aka 'safe list' aka 'allowed' provided in #168 will prevent some things, we do wish, in some rare cases, to remove an entity. We should be able to add config that will let us cherry-pick out an entity from the CDF. This may lead to repercussions during installation, however.

For example, currently in DFS ONE, the entities are all created by user/1, we need to fix that by making a NEW content user that is unique to the scenario the author. We could create a super-user for "Ash" from the People page and make her the author of everything. Not only is this more realistic, but it allows us to separate the demo content packs more efficiently. In theory, this should be all that's needed to prevent user/1 from getting into the CDF.

That said, we NEVER want user/1 to go into the CDF, even if it means providing a potentially unstable export. This is for security reasons, of course.

So... long story short, we both have a practice change needed here AND we need a super special edge case "blacklist" or 'unsafe list' of specific entities. We wont always know the UUID's in some cases, but we do know the bundle and ID in certain cases, like with user/1.

Configuration to be added to scenarios.settings

disallowed:
  user:
    1: admin
saltednut commented 4 years ago

A few changes went in related to this architecturally a) We shouldn't expose an exclude list to config as its a very very limited list, so far just user/0 and user/1 we don't want to export. b) We do still see some instances of the admin showing up in dfs_one, however, any content entity that has an "owner" property has been updated to the new scenario specific user "AshRydell"

I'm realizing now that additional logging is needed when running the EventSubscriber so that we can see every time ownership is changed on an entity. This should help us to determine if there are still entities in the system that are considered linked to the admin. Further exploration into depcalc and its services might help.

saltednut commented 4 years ago

I am finding that user/1 comes through (account created 3 months ago) along with the menu links. This may not be the only case where there is no "owner" key in the entity's annotation schema, but there's still a depcalc dependency found.

I sorted this much by installing dfs_one, then changing settings to only export menu_link_content - I then exported just the menus and reinstalled the profile with only menu_link_content. The links show up and so does admin/1 from 3 months ago.

When I install with no CDF at all, the admin is from 1 minute ago (or today-ish/whatever, which is what it should be).

So it looks like we'll have to do more field handling in the EventSubscriber beyond swapping owners.

@EclipseGc sorry for the ping, but DYK if there's a way to look at an entity type like menu_link_item and see, using depcalc, why the admin is associated to it? It'd be nice to just throw a uid at a service and return everything that depends on that user, for example.

saltednut commented 4 years ago

block_content is another one like menu_link_content that does not have an "owner" key nor does it extend Drupal\user\EntityOwnerTrait the way that node "Content" and media "Media" entities do.

An export with node, block_content, and media showed the admin/1 user come through.

An export with just node and media also seems to attach admin/1 from 3 months ago, so it appears there's still something missing.

Given that even in a node/media only export (where user author and file entities are also added) we have to assume there's more going on re: depcalc and associations than what we're seeing so far.

The "owner" swap Kris and I came up with yesterday does work to change the author of the nodes and media though. Any entity type that extends the EntityOwnerTrait works. I can confirm that in both the node_field_data as well as the media_field_data the uid column is correctly set to 4 (AshRydell user) and not 1

I also looked in the database under file_managed and noticed the uid column is actually set to 4 there too, for every file thats added via the CDF import. I can confirm that "owner" is a key on the File entity too - so that should mean files are "safe" here.

Under the block_content_field_data there is no uid column, nor does block_content extend the EntityOwnerTrait - HOWEVER, there is a suspicious column called content_translation_uid where we do see 1 attached to blocks. This may or may not be affecting depcalc.

These are just notes to refer back to. There's no solution here yet, only various entities and the differences. The next step here is to get more granular and do further testing, including debugging the export event to determine step by step what is actually getting added to the CDF when we download it.

EclipseGc commented 4 years ago

ok so what we did yesterday is clearly not good enough. Looking at menu_link_content entities they have a revision_user field because they use the RevisionLogEntityTrait. There's likely to be a bunch of stuff like this. You've found one on blocks related to translations. All of these are entity references to users though, and we can use the same technique we came up with yesterday however, instead of checking for 'owner', we inspect the field type directly, check if it's an entity_reference field type referencing users. If it is, we just see if it's referencing user 1 and reference Ash instead.

saltednut commented 4 years ago

That seems to have done the trick, updated in scenarios module here: https://git.drupalcode.org/project/scenarios/commit/a6aa7c1

I still need to do some checking there to prevent wanted user entity references from being overwritten. I was thinking, now that we have the logger, we can create a safe-list of fields that can be changed out. So far, we see uid revision_user and content_translation_uid showing up in the logs. It might only need to be those three that get swapped by scenarios.

We do have to assume that in the future a content type that references user entities may exist where rewriting the reference to the demo user (Ash) is not the desired result upon export.