LemmyNet / lemmy

🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org
GNU Affero General Public License v3.0

Clear text of deleted posts/comments after 30 days #2977

Closed. arsg0etia closed this issue 1 year ago

arsg0etia commented 1 year ago

Hello Lemmy team,

I recently came across a Reddit post that raised some concerns about how Lemmy handles user data. According to the post, it seems that some users are worried that Lemmy doesn't care about their privacy and data.

I would like to know more about this issue and what steps, if any, are being taken to address these concerns. Could you please provide more information on this topic?

Thank you for your time and for all the hard work you do on this project.

Nutomic commented 1 year ago

Deleted comments remain on the server but hidden to non-admins, the username remains visible

Correct. This is mainly to allow reverting deletes. However, if you delete your account, all your posts are overwritten. Additionally, you can edit individual posts to overwrite the content, and then the previous version will be gone. So it's all quite similar to Reddit, except that account deletion is more thorough.

Deleted account usernames remain visible too

True, but is it really a privacy concern if your username is visible? Again, all posts are deleted permanently when you delete a Lemmy account.

Anything remains visible on federated servers!

This looks like a problem with kbin; maybe it doesn't support deletions. You should check with the developer. Between Lemmy instances, deletions are federated. But this being a distributed system, there is no guarantee that they can be delivered successfully (the same goes for all other actions).

When you delete your account, media does not get deleted on any server

Media is only stored on the server where you signed up; Lemmy doesn't mirror it. When you delete an account, all data is purged, including posts and media. This is actually not great, as it means that popular memes or discussions just go away.

In general, I don't entirely understand why you are so concerned about these things. You are posting on a public website, and you can expect that it will be indexed and cached by many different tools like Google or archive.org. It's the same as on Reddit: once you post something publicly on the internet, there is no way to undo that.

dessalines commented 1 year ago

Anyone could easily verify account deletion in the code also. https://github.com/LemmyNet/lemmy/blob/8cb5939f5048c3eab293884923d4c3d5fcc08e2f/crates/api_common/src/utils.rs#L716

The other concerns are common to any federated service. We federate deletes, but ultimately there is no "unsend email" button, and once data gets sent to other instances and fediverse services, we don't have control over it.

Also, the person who posted that thread is a known troll, ideologically opposed to the fediverse, and also not a developer. It appears they're using Lemmy's increase in popularity to try to drive people into their walled garden site, and away from the fediverse.

jherazob commented 1 year ago

Can you add a way to disable the undelete of messages on an account, or even server-wide? Many of us fled to the Fediverse in part because of the prevalent surveillance everywhere, and knowing that the messages you delete are not really deleted is problematic. That kind of behavior is not supposed to happen here. Even from a resources point of view, keeping a lot of deleted data permanently is not good. Plus, I don't even know if the GDPR allows this, and it wouldn't be a good idea to risk that.

Nothing4You commented 1 year ago

A reasonable approach for this is probably to purge deleted items at regular intervals, e.g. allowing restoration within 15 minutes, after which the item is permanently deleted.
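
To illustrate the restore window described here, a minimal sketch in Rust follows. It is not Lemmy code: `can_restore`, `RESTORE_WINDOW`, and the idea of a per-item `deleted_at` timestamp are assumptions made for the example, and the 15 minutes is just the figure suggested above.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical restore window; 15 minutes is the example figure from the comment.
const RESTORE_WINDOW: Duration = Duration::from_secs(15 * 60);

/// Returns true while a deleted item may still be restored.
/// `deleted_at` is an assumed per-item timestamp, not Lemmy's actual schema.
fn can_restore(deleted_at: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(deleted_at) {
        // Still inside the window (inclusive of the boundary).
        Ok(elapsed) => elapsed <= RESTORE_WINDOW,
        // `deleted_at` lies in the future (clock skew): treat as just deleted.
        Err(_) => true,
    }
}
```

Once `can_restore` returns false, a periodic job could remove the item for good.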

Brouware commented 1 year ago

Good to know that overwriting helps to permanently delete a message, but I think there should be an option to leave no trace of a message. Obviously there is no guarantee on the internet it's truly gone, but it would be better than leaving the username in my opinion. Even Reddit does that.

Kommynct commented 1 year ago

We should take the Tildes approach and delete it after 30 days, I think.

Nothing4You commented 1 year ago

just to add another example, matrix/synapse allows the server admin to set a retention period for redacted content: https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#redaction_retention_period

jherazob commented 1 year ago

I've been checking, and it seems that retaining deleted messages indefinitely DOES go against Article 17 of the GDPR: you're not supposed to keep deleted user data for an unlimited amount of time beyond what's required for "reasonable purposes". I doubt it would come to that, since Fedi is just a bunch of small servers; most likely it would be a few warnings, if that. But in the worst case this could expose every single Lemmy instance admin to GDPR liability, and for no good reason. Why deleted messages are kept for essentially unlimited time, wasting Postgres space, is honestly baffling to me. Keep them for a bit, yeah, but for unlimited time? Just put a retention period on the message and fully delete it once that period passes. The way it's done today is not good at all.

Nutomic commented 1 year ago

Does GDPR apply to Lemmy? My understanding is that it applies to companies over a certain size, but then again I'm not a lawyer.

jherazob commented 1 year ago

It applies to anything offering services to people in the EU, whether the service operates inside or outside the EU, commercial or not, as long as it processes personal data. An entity like a Lemmy instance qualifies, as at its core it deals with personal information and data. I'm not a lawyer either, just a sysadmin who has had to deal with this before, but keeping deleted personal data for an unlimited time is likely enough to get people in trouble with the EU. All it takes is a single complaint to their country's regulator, and you know it'll happen. And if the violation is confirmed, the fines can go up to 20 million euros or 4% of annual worldwide turnover, whichever is higher, if I recall (it's been a couple of years, I don't recall the particulars). So yeah, exposing every single Lemmy admin to this kind of liability for something that wastes database space in the long term? As a sysadmin it just feels plain weird. You want to keep the data for message undeletion? Do it for a bit, 7 days, 30 days, something like that, but not FOREVER. It makes less and less sense the more I've thought about it over the last couple of days. Just add a deletion timestamp and a cron job that deletes anything beyond $RetentionPeriod or something like that. Keeping deleted data too long makes no flipping sense even from an admin perspective.
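
The "deletion timestamp plus cron job" idea could look roughly like the Rust sketch below. Everything in it is hypothetical: `StoredComment`, its fields, and `purge_expired` are illustrative stand-ins rather than Lemmy's schema or API, and an in-memory `Vec` stands in for the database table.

```rust
use std::time::{Duration, SystemTime};

/// Simplified stand-in for a stored comment; not Lemmy's actual schema.
#[derive(Debug, Clone, PartialEq)]
struct StoredComment {
    id: u32,
    content: String,
    /// `None` while the comment is live, `Some(time)` once the user deletes it.
    deleted_at: Option<SystemTime>,
}

/// Drops every comment whose deletion is older than `retention` and
/// returns how many were removed. Meant to be run periodically.
fn purge_expired(
    comments: &mut Vec<StoredComment>,
    now: SystemTime,
    retention: Duration,
) -> usize {
    let before = comments.len();
    comments.retain(|c| match c.deleted_at {
        // Live comments are never purged.
        None => true,
        Some(at) => match now.duration_since(at) {
            // Keep deleted comments that are still inside the retention period.
            Ok(age) => age <= retention,
            // Deletion timestamp in the future (clock skew): keep for now.
            Err(_) => true,
        },
    });
    before - comments.len()
}
```

In a real deployment the same condition would more likely be a single SQL `DELETE` with a timestamp filter; the in-memory version only shows the rule being applied.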

httpjamesm commented 1 year ago

I understand the rationale behind retaining user content, but I do think it should be more apparent to the user. Another warning should be added describing how federation does not guarantee propagation of content deletion or modification.

Nutomic commented 1 year ago

Clearing the text of deleted posts/comments after 30 days or so sounds reasonable. You can make a pull request to handle this in scheduled_tasks.rs. You can also edit the documentation to clarify this.
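
As a rough illustration of what such a scheduled task might do, here is a self-contained Rust sketch. It does not reproduce anything from scheduled_tasks.rs: the `Comment` struct, the `PLACEHOLDER` text, the `RETENTION` constant, and `clear_old_deleted` are all assumptions for the example. Unlike removing the row, it overwrites only the text, so the thread structure stays intact.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical replacement text for cleared content.
const PLACEHOLDER: &str = "[permanently deleted]";
/// Assumed retention period: the "30 days or so" mentioned above.
const RETENTION: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// Simplified stand-in for a comment row; not Lemmy's actual schema.
struct Comment {
    content: String,
    deleted: bool,
    /// Time of the last change, assumed here to be the time of deletion.
    updated: SystemTime,
}

/// Overwrites the text of comments that were deleted more than `RETENTION` ago.
/// Returns the number of comments cleared; calling it again clears nothing new.
fn clear_old_deleted(comments: &mut [Comment], now: SystemTime) -> usize {
    let mut cleared = 0;
    for c in comments.iter_mut() {
        // A timestamp in the future yields Err, which counts as "not expired".
        let expired = now
            .duration_since(c.updated)
            .map_or(false, |age| age > RETENTION);
        if c.deleted && expired && c.content != PLACEHOLDER {
            c.content = PLACEHOLDER.to_string();
            cleared += 1;
        }
    }
    cleared
}
```

Posts could be handled the same way, with the title and URL cleared alongside the body.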

ghost commented 1 year ago

I've noticed a feature on Akkoma that I think would be beneficial for Lemmy to implement. On Akkoma, users can choose whether their content should be deleted and specify the number of days until deletion. However, I think this feature should only apply to "deleted" posts, not every post. The current issue on Lemmy is that when content is "deleted," it is hidden from everyone, including the user who posted it. Additionally, the trash can icon used for this action can mislead users into thinking they are permanently deleting their content when they are only hiding it. This issue seems to be related to problems discussed in these GitHub issues: #2624 and #2410.

To address this problem, I suggest making it clear to users that their content is hidden rather than permanently deleted. One possible solution could be implementing a bot, similar to Power Delete Suite for Reddit, that automatically edits "deleted" content older than a certain number of days. However, I don't believe this feature is necessary for the main program.

The proposed change to clear the text of deleted posts/comments after 30 days introduces another side effect to the already complex delete process. While the intention is to address privacy concerns, it's important to consider the user experience and their expectations when interacting with the platform. If an action is going to be performed behind the scenes, it would be beneficial to inform the user about it and, if possible, provide them with the option to choose whether or not the action takes place. This way, users can have more control over their data and understand the implications of their actions on the platform.
