OpenSourcePolitics / decidim-module-cleaner

GNU Affero General Public License v3.0

Versioned personal data stored by PaperTrail left behind #11

Open ahukkanen opened 1 year ago

ahukkanen commented 1 year ago

Decidim stores versions of the user data through PaperTrail as part of the Decidim::Traceable module as defined here: https://github.com/decidim/decidim/blob/bfc862f2308c3215c52b324e16de8680ed64fe16/decidim-core/lib/decidim/traceable.rb#L19

The data is stored in the versions table of the database, in the object_changes column, in YAML format, e.g. as follows:

id:                                                                   
-                                                                     
- 221                                                                 
email:                                                                
- ''                                                                  
- testuser@example.org                   
encrypted_password:                                                   
- ''                                                                  
- "$2a$11$R4LAq...L690yl8Sy7VUw6vB.6"      
created_at:                                                           
-                                                                     
- !ruby/object:ActiveSupport::TimeWithZone                            
  utc: &1 2023-03-01 09:59:33.930443106 Z                             
  zone: &2 !ruby/object:ActiveSupport::TimeZone                       
    name: Etc/UTC
  time: 2023-03-01 09:59:33.930443106 Z
updated_at:
- 
- !ruby/object:ActiveSupport::TimeWithZone
  utc: *1
  zone: *2
  time: 2023-03-01 09:59:33.930443106 Z
decidim_organization_id:
- 
- 1
confirmed_at:
- 
- !ruby/object:ActiveSupport::TimeWithZone
  utc: 2023-03-01 09:59:33.830071364 Z
  zone: *2
  time: 2023-03-01 09:59:33.830071364 Z
name:
- 
- Lisabeth Schiller 4 4 endr4
nickname:
- ''
- coleman_gleichner
type:
- 
- Decidim::User

You can find this data, e.g., with the following command in the Rails console:

Decidim::User.all.sample.versions[0].object_changes
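
Outside Rails, the same YAML can be inspected with Ruby's standard yaml library. The sketch below is only an illustration. It decodes an excerpt of the value above and picks out the attributes that carry personal data. The timestamp entries are left out because YAML.safe_load rejects !ruby/object tags unless those classes are explicitly permitted. PERSONAL_ATTRIBUTES and personal_changes are hypothetical names, not part of Decidim or PaperTrail. Each decoded value is a two-element [before, after] array, as in the example above.

```ruby
require "yaml"

# Excerpt of the object_changes value shown above, without the timestamp
# entries (their !ruby/object tags would be rejected by YAML.safe_load).
OBJECT_CHANGES = <<~'YAML'
  id:
  -
  - 221
  email:
  - ''
  - testuser@example.org
  name:
  -
  - Lisabeth Schiller 4 4 endr4
  nickname:
  - ''
  - coleman_gleichner
  type:
  -
  - Decidim::User
YAML

# Hypothetical list of attributes treated as personal data (not a Decidim constant).
PERSONAL_ATTRIBUTES = %w[email encrypted_password name nickname].freeze

# Decodes the YAML into { "attribute" => [before, after] } and keeps only
# the personal attributes that are present in this version.
def personal_changes(yaml)
  YAML.safe_load(yaml).slice(*PERSONAL_ATTRIBUTES)
end

personal_changes(OBJECT_CHANGES).each do |attribute, (before, after)|
  puts "#{attribute}: #{before.inspect} -> #{after.inspect}"
end
```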

I think this data should also be cleared for deleted users after a certain period of time, as it contains personal details, especially for the user-related models.

Note that this data can sometimes be useful for tracing changes to the user model, e.g. if an account is deleted by accident or if we need to investigate an issue with an account.

I would suggest a defined (preferably configurable) "cutoff" period, after which the versioned user data of deleted accounts would also be deleted.
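
A minimal sketch of the proposed cutoff check in plain Ruby, assuming the retention period is counted from the account's deleted_at timestamp. purge_due?, DEFAULT_RETENTION_DAYS, and the 30-day default are hypothetical. The default only approximates the "1 month" suggested later in the thread.

```ruby
# Hypothetical default: 30 days, roughly the "1 month" suggested in the thread.
DEFAULT_RETENTION_DAYS = 30
SECONDS_PER_DAY = 24 * 60 * 60

# Returns true once a deleted account's versioned data has outlived the
# retention ("cutoff") period and may be purged. Accounts that are not
# deleted (deleted_at is nil) are never purged by this rule.
def purge_due?(deleted_at, now: Time.now, retention_days: DEFAULT_RETENTION_DAYS)
  return false if deleted_at.nil?

  deleted_at + retention_days * SECONDS_PER_DAY <= now
end

now = Time.utc(2023, 4, 1)
puts purge_due?(Time.utc(2023, 3, 1), now: now)  # deleted 31 days earlier: prints true
puts purge_due?(Time.utc(2023, 3, 15), now: now) # deleted 17 days earlier: prints false
```

Passing retention_days as a parameter keeps the check independent of wherever the configured value ends up living.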

Note that the same issue also applies to the Decidim::Authorization model, which also holds personal data. Admins can already delete those records from the admin panel, but the versions table is currently not cleaned up after the removal. A similar "cutoff" period should also apply to the versioned authorization data.

To fetch the version data for deleted user accounts:

PaperTrail::Version.joins(
  <<~SQL.squish
    INNER JOIN decidim_users ON decidim_users.id = versions.item_id
      AND versions.item_type IN ('Decidim::User', 'Decidim::UserBaseEntity')
  SQL
).where.not(decidim_users: { deleted_at: nil })
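
To make the suggested cutoff concrete, the plain-Ruby sketch below mirrors the query above over in-memory rows and adds the cutoff on top. A version is selected only if it belongs to a user whose deleted_at is set and earlier than cutoff. VersionRow, UserRow, and purgeable_user_versions are hypothetical. In the ActiveRecord query, the equivalent extra condition would be something like .where("decidim_users.deleted_at < ?", cutoff).

```ruby
# Hypothetical in-memory stand-ins for rows of the versions and decidim_users tables.
VersionRow = Struct.new(:id, :item_type, :item_id, keyword_init: true)
UserRow = Struct.new(:id, :deleted_at, keyword_init: true)

USER_ITEM_TYPES = ["Decidim::User", "Decidim::UserBaseEntity"].freeze

# Mirrors the INNER JOIN + deleted_at filter, plus a cutoff:
# only versions of users deleted before `cutoff` are selected.
def purgeable_user_versions(versions, users, cutoff)
  users_by_id = users.to_h { |user| [user.id, user] }

  versions.select do |version|
    next false unless USER_ITEM_TYPES.include?(version.item_type)

    user = users_by_id[version.item_id] # INNER JOIN: no matching user, no row
    !user.nil? && !user.deleted_at.nil? && user.deleted_at < cutoff
  end
end

users = [
  UserRow.new(id: 1, deleted_at: nil),                   # active account
  UserRow.new(id: 2, deleted_at: Time.utc(2023, 1, 15)), # deleted before the cutoff
  UserRow.new(id: 3, deleted_at: Time.utc(2023, 3, 20))  # deleted after the cutoff
]
versions = [
  VersionRow.new(id: 10, item_type: "Decidim::User", item_id: 1),
  VersionRow.new(id: 11, item_type: "Decidim::User", item_id: 2),
  VersionRow.new(id: 12, item_type: "Decidim::UserBaseEntity", item_id: 2),
  VersionRow.new(id: 13, item_type: "Decidim::User", item_id: 3),
  VersionRow.new(id: 14, item_type: "Decidim::Authorization", item_id: 2)
]

p purgeable_user_versions(versions, users, Time.utc(2023, 3, 1)).map(&:id) # prints [11, 12]
```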

To fetch the version data for deleted authorizations:

PaperTrail::Version.joins(
  <<~SQL.squish
    LEFT JOIN decidim_authorizations ON decidim_authorizations.id = versions.item_id
      AND versions.item_type = 'Decidim::Authorization'
  SQL
).where(item_type: "Decidim::Authorization", decidim_authorizations: { id: nil })
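
The LEFT JOIN above works as an anti-join. It keeps authorization versions whose item_id no longer matches any row in decidim_authorizations. The plain-Ruby sketch below mirrors that logic over in-memory rows, purely to make the semantics explicit. VersionRow and orphaned_authorization_versions are hypothetical names. Because the authorization row is gone, a cutoff cannot be measured from it. One option, assuming PaperTrail recorded a version with event "destroy" for the deletion, would be to measure the cutoff from that version's created_at.

```ruby
require "set"

# Hypothetical in-memory stand-in for rows of the versions table.
VersionRow = Struct.new(:id, :item_type, :item_id, keyword_init: true)

# Mirrors LEFT JOIN ... WHERE decidim_authorizations.id IS NULL:
# keeps authorization versions whose authorization no longer exists.
def orphaned_authorization_versions(versions, existing_authorization_ids)
  existing = existing_authorization_ids.to_set

  versions.select do |version|
    version.item_type == "Decidim::Authorization" && !existing.include?(version.item_id)
  end
end

versions = [
  VersionRow.new(id: 1, item_type: "Decidim::Authorization", item_id: 5), # authorization 5 still exists
  VersionRow.new(id: 2, item_type: "Decidim::Authorization", item_id: 6), # authorization 6 was deleted
  VersionRow.new(id: 3, item_type: "Decidim::User", item_id: 6)           # not an authorization version
]

p orphaned_authorization_versions(versions, [5]).map(&:id) # prints [2]
```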

Quentinchampenois commented 1 year ago

Thank you for raising this point, and for the resources and examples!

The task should include versioned user data and authorizations as described, and should also check whether OmniAuth identities are cleared after a predefined (and also configurable) period.

Quentinchampenois commented 1 year ago

Hello @ahukkanen,

Account deletion is based on Decidim::DestroyAccount.

We can implement the clearing of versioned user data and authorizations in this module. I wonder whether it would be interesting for the community to have it directly in decidim-core, in Decidim::DestroyAccount. If so, we could contribute it. What do you think?

ahukkanen commented 1 year ago

@Quentinchampenois IMO we should have some kind of retention period (i.e. the "cutoff" period I mentioned) before the versioned data is wiped out.

I would suggest that this would be configurable and the defaults could be e.g.

I would suggest making this period (1 month) configurable through the module's config_accessors.

So I would not wipe out the data immediately when the user account is deleted.

Eventually, I think this whole module should be in core, but core is currently in a feature freeze, so it won't happen straight away.

Quentinchampenois commented 1 year ago

Thanks for your answer, that works for me. We will implement it in this module.