ether / etherpad-lite

Etherpad: A modern really-real-time collaborative document editor.
http://docs.etherpad.org/
Apache License 2.0

Limit number of versions of a pad or delete them in Pad Settings #6194

Open FredMa01 opened 4 months ago

FredMa01 commented 4 months ago

I don't see any way to limit the number of versions of a pad or to clean up old versions. The database grows without limit over time, and exporting pads with numerous revisions becomes slow and heavy. Would it be possible to add a revision deletion option to Pad Settings?

Fred

peter-rs commented 4 months ago

To add to this, it should be possible to disable the revisions feature entirely (at least for read-only pads), either via a plugin or through the Etherpad settings. For documents that have been published using a read-only link, anyone can go back and view the author's writing process. At best this exposes a few embarrassing spelling mistakes; at worst it exposes personally identifying information or sensitive content.

q2apro commented 3 months ago

I absolutely need this feature too ... or at least "delete revisions older than 30 days" or the like.

I have been using Etherpad (via Docker) for two months now, with about a hundred users.

The PostgreSQL database is now 400 MB zipped (!). This is insane.

  1. We require an option to clean up revisions older than 30 days.
  2. Optionally, an option to keep only the last x revisions.
  3. Definitely an option to disable revisioning.

Point 3 would actually help the most to save storage.


Usage of my Etherpad: Statistics: Pads Text Size: 611.475 - Total Pads: 971

611 MB / 971 pads = 0.63 MB per pad

At least half of them are empty, so it's probably 1 - 2 MB per pad on average.

And those pads contain only 100 lines of text.


BTW, Size: 611.475 ... I guess it means KB.
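
The per-pad estimate above can be reproduced as a quick calculation. This is only a sketch: it assumes the reported figure 611.475 is in megabytes, as the division above does (the comment itself is unsure of the unit), and that exactly half of the pads are empty.

```python
# Figures taken from the statistics quoted above; the unit is an assumption.
total_size_mb = 611.475  # "Pads Text Size", assumed here to be MB
total_pads = 971         # "Total Pads"

# Average over all pads, including the empty ones.
avg_per_pad = total_size_mb / total_pads

# Assumption: half of the pads are empty, so only the other half hold data.
non_empty_pads = total_pads / 2
avg_per_non_empty_pad = total_size_mb / non_empty_pads

print(f"average per pad: {avg_per_pad:.2f} MB")                      # 0.63 MB
print(f"average per non-empty pad: {avg_per_non_empty_pad:.2f} MB")  # 1.26 MB
```

Under these assumptions the non-empty pads average about 1.26 MB each, which is consistent with the "1 - 2 MB per pad" guess.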

q2apro commented 3 months ago

In each revision there is an option:

image

Does this disable the revisions for this pad?

If so, we would need this option globally: disable all revisions for all pads.

SamTV12345 commented 3 months ago

The next version of Etherpad will feature a built-in way to manage pads and also delete them. You can also sort by revision number and then delete pads that have a large number of revisions.

image

q2apro commented 3 months ago

That is nice to see; however, admins do not want to spend 20 - 30 min a day manually going through all pads... Also, considering data privacy/protection, this would not even be allowed.

Again, we need:

  1. option to remove revisions older than 30 days
  2. option to keep the last x revisions
  3. option to disable revisioning

Disabling revisioning is the most important one.
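
To make the first two requested options concrete, here is a minimal sketch of how such a retention policy could select revisions. It is not Etherpad code: the `Revision` type, the function name, and both parameters are hypothetical, and it only picks revision numbers. Whether those revisions can then be removed safely from a real pad is not addressed here.

```python
from dataclasses import dataclass
from typing import List, Optional

DAY_MS = 24 * 60 * 60 * 1000  # milliseconds per day


@dataclass
class Revision:
    number: int        # revision number within the pad
    timestamp_ms: int  # creation time in milliseconds since the epoch


def revisions_to_prune(
    revisions: List[Revision],
    now_ms: int,
    max_age_days: Optional[int] = None,  # option 1: drop revisions older than this
    keep_last: Optional[int] = None,     # option 2: keep only the newest N revisions
) -> List[int]:
    """Return the revision numbers a retention policy would drop.

    Each enabled rule marks revisions independently and the results are
    combined. The newest revision is never marked, so the pad keeps its
    current state.
    """
    ordered = sorted(revisions, key=lambda r: r.number)
    if not ordered:
        return []

    prune = set()
    if max_age_days is not None:
        cutoff = now_ms - max_age_days * DAY_MS
        prune.update(r.number for r in ordered if r.timestamp_ms < cutoff)
    if keep_last is not None:
        newest = max(keep_last, 1)
        prune.update(r.number for r in ordered[:-newest])

    prune.discard(ordered[-1].number)  # always keep the head revision
    return sorted(prune)


# Ten revisions, one per day, evaluated on day 40.
revs = [Revision(i, i * DAY_MS) for i in range(10)]
print(revisions_to_prune(revs, now_ms=40 * DAY_MS, max_age_days=35))  # [0, 1, 2, 3, 4]
print(revisions_to_prune(revs, now_ms=40 * DAY_MS, keep_last=3))      # [0, 1, 2, 3, 4, 5, 6]
```

In this sketch, `keep_last=1` is the closest approximation to option 3, since it leaves only the current state of the pad.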

Buesra24 commented 4 weeks ago

Highly agree with this issue.

With privacy laws being quite strict in Germany, our university's data privacy officer pointed out that it's actually quite problematic that we don't regularly delete our pads. Currently, author names of pads are stored indefinitely since they are all over the revisions, and without a way to delete those, we now have to set all pads to self-destruct after 2 years of not being changed (which is really annoying for the kinds of pads people keep revisiting for information but don't edit).

Even if not automated, a simple solution that lets us mass-delete revisions manually would help immensely. It might also speed up the process of getting the system up and running again after a shutdown for maintenance.

Gared commented 3 weeks ago

I played around a little with how this could be achieved: We can reuse the method/API copyPadWithoutHistory, which generates a new pad with only 1 revision and the latest pad content (without any chat messages), but this will keep the author information. Would it be helpful if we built a plugin that runs this function on pads that have not been touched in x days?
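
A rough sketch of that idea, written against the HTTP API rather than as a plugin. The API function names `listAllPads`, `getLastEdited` and `copyPadWithoutHistory` are taken from the Etherpad HTTP API as far as I know it, so check the parameter names against your version. Everything else is hypothetical: the `call_api` helper (assumed to return the `data` part of a response), the `-nohistory` suffix, and the function name.

```python
from typing import Any, Callable, Dict, List

DAY_MS = 24 * 60 * 60 * 1000  # milliseconds per day

# Hypothetical helper type: call_api(function_name, params) -> response "data".
ApiCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def copy_stale_pads_without_history(
    call_api: ApiCall,
    now_ms: int,
    max_idle_days: int,
    suffix: str = "-nohistory",  # hypothetical naming scheme for the copies
) -> List[str]:
    """Copy every pad not edited for max_idle_days to a history-free pad.

    Returns the IDs of the pads that were created.
    """
    cutoff = now_ms - max_idle_days * DAY_MS
    created = []
    for pad_id in call_api("listAllPads", {})["padIDs"]:
        if pad_id.endswith(suffix):
            continue  # do not copy our own copies again
        last_edited = call_api("getLastEdited", {"padID": pad_id})["lastEdited"]
        if last_edited < cutoff:
            destination = pad_id + suffix
            call_api(
                "copyPadWithoutHistory",
                {"sourceID": pad_id, "destinationID": destination, "force": "true"},
            )
            created.append(destination)
    return created
```

Note that this only creates the history-free copies. The source pads and their revisions are still in the database afterwards, so a complete solution would also have to remove or replace the originals.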

SamTV12345 commented 3 weeks ago

> I played around a little with how this could be achieved:
>
> We can reuse the method/API copyPadWithoutHistory, which generates a new pad with only 1 revision and the latest pad content (without any chat messages), but this will keep the author information.
>
> Would it be helpful if we built a plugin that runs this function on pads that have not been touched in x days?

That sounds great. I think that's a good idea. I don't know whether it should be a plugin or part of core. Essentially everybody has the problem that revisions aren't deleted.

Buesra24 commented 3 weeks ago

> but this will keep the author information

Does that mean ALL author information or just the last active author? If it is just the last one, I think that might be fine enough (though does this method also get rid of all author colors?), so ultimately a really helpful solution.

Gared commented 3 weeks ago

> > but this will keep the author information
>
> Does that mean ALL author information or just the last active author? If it is just the last one, I think that might be fine enough (though does this method also get rid of all author colors?), so ultimately a really helpful solution.

I think I need to clarify the storage of author information.

Every author has a global ID with their color, name, and related pads:

+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|key                            |value                                                                                                                                                                                               |
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|globalAuthor:a.laKubcKE8c8MXbgR|{"colorId":"#c7d5ff","name":"John Doe","timestamp":1717928115518,"padIDs":{"abcdefaertdfdf0.3967821042768165":1,"abcdefaertdfdf0.2496401404544648":1,"abcdef":1,"test":1,"gblub":1,"asdfasfsdaf":1}}|
+-------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

This global ID (a.laKubcKE8c8MXbgR) is referenced in the pad metadata:

+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|key     |value                                                                                                                                                                                                                                                   |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|pad:test|{"atext":{"text":"...\n","attribs":"..."},"pool":{"numToAttrib":{"0":["author","a.laKubcKE8c8MXbgR"],"1":["bold","true"],"2":["italic","true"],"3":["underline","true"]},"nextNum":4},"head":120,"chatHead":-1,"publicStatus":false,"savedRevisions":[]}|
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In addition to the pad metadata, the author is also referenced in the corresponding revision:

+----------------+----------------------------------------------------------------------------------------------------+
|key             |value                                                                                               |
+----------------+----------------------------------------------------------------------------------------------------+
|pad:test:revs:10|{"changeset":"Z:u>3|3=l=8*0+3$es ","meta":{"author":"a.laKubcKE8c8MXbgR","timestamp":1717845374599}}|
+----------------+----------------------------------------------------------------------------------------------------+
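
The three kinds of records above can be cross-referenced by author ID. The following sketch works on already-parsed values, abridged from the tables above. The function name is hypothetical, and it only covers the three places shown here, not necessarily every place Etherpad stores an author ID.

```python
from typing import Any, Dict, List

# Abridged copies of the three records shown above, with the JSON already parsed.
records: Dict[str, Dict[str, Any]] = {
    "globalAuthor:a.laKubcKE8c8MXbgR": {
        "colorId": "#c7d5ff",
        "name": "John Doe",
        "timestamp": 1717928115518,
        "padIDs": {"test": 1},
    },
    "pad:test": {
        "pool": {
            "numToAttrib": {
                "0": ["author", "a.laKubcKE8c8MXbgR"],
                "1": ["bold", "true"],
            },
        },
        "head": 120,
    },
    "pad:test:revs:10": {
        "changeset": "Z:u>3|3=l=8*0+3$es ",
        "meta": {"author": "a.laKubcKE8c8MXbgR", "timestamp": 1717845374599},
    },
}


def keys_referencing_author(db: Dict[str, Dict[str, Any]], author_id: str) -> List[str]:
    """List the keys whose record mentions the given author ID."""
    hits = []
    for key, value in db.items():
        pool_attribs = value.get("pool", {}).get("numToAttrib", {}).values()
        if key == f"globalAuthor:{author_id}":
            hits.append(key)  # the global author record itself
        elif any(attrib == ["author", author_id] for attrib in pool_attribs):
            hits.append(key)  # pad metadata: attribute pool entry
        elif value.get("meta", {}).get("author") == author_id:
            hits.append(key)  # revision record: changeset author
    return hits


print(keys_referencing_author(records, "a.laKubcKE8c8MXbgR"))
# ['globalAuthor:a.laKubcKE8c8MXbgR', 'pad:test', 'pad:test:revs:10']
```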

This means that:

and this results in this behaviour:

Buesra24 commented 3 weeks ago

> and this results in this behaviour:
>
>   • In the timeslider you will only see author information of authors that have text in the latest version

Hmm, okay, that wouldn't quite fix the issue for us... Is it possible to auto-delete global author information (or neutralize it, e.g. change the name to "inactive author" or something) after a certain amount of time has passed since their last associated contribution timestamp? Because that, in combination with the other proposed method, would resolve practically all the issues we have.

Or alternatively, something that lets you auto-change the author on any timestamp older than X to a global "dummy" author that is called "inactive author" or something like that?
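
The first variant (neutralizing the global author record) might look roughly like the sketch below. It operates on a plain dictionary shaped like the globalAuthor value shown earlier in the thread. The function name and parameters are hypothetical, and it assumes the record's `timestamp` field reflects the author's last activity, which the thread does not confirm.

```python
from typing import Any, Dict

DAY_MS = 24 * 60 * 60 * 1000  # milliseconds per day


def neutralize_if_inactive(
    author: Dict[str, Any],
    now_ms: int,
    max_idle_days: int,
    placeholder: str = "inactive author",
) -> Dict[str, Any]:
    """Return a copy of a globalAuthor-style record, renamed if it is stale.

    Assumption: author["timestamp"] is the time of the last activity.
    """
    cleaned = dict(author)
    if now_ms - author["timestamp"] > max_idle_days * DAY_MS:
        cleaned["name"] = placeholder
    return cleaned


# Abridged record from the globalAuthor table earlier in the thread.
john = {"colorId": "#c7d5ff", "name": "John Doe", "timestamp": 1717928115518}

later = john["timestamp"] + 400 * DAY_MS
print(neutralize_if_inactive(john, now_ms=later, max_idle_days=365)["name"])  # inactive author
```

This only changes the display name. The author ID would still appear in the pad metadata and revision records, so on its own it does not cover the second variant of replacing the author on old revisions.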