Closed The-RedWizard closed 2 months ago
This would be a great opportunity for someone to build a proper Rust crate for this, one that could be used by many projects.
I created a lemmy post about this: https://lemmy.ml/post/18162485
I wrote something in the last couple days.
https://crates.io/crates/clearurls
Let me know if it fits your needs or if you need anything else. Issues and PRs welcome.
One issue the ClearURLs rules may not cover is links that exist in a rainbow-table and are obfuscated by default, such as reddit.com/r/sub/s/gibberish and vm|vt.tiktok.com/gibberish links. The easiest fix is to open the URL first and see where it redirects to. The question then becomes: who will be opening the URL? Is it the instance? If so, that would send a very clear signal to these companies that a network of users exists there, because one consistent IP or group of IPs would be deobfuscating every single rainbow-table link on Lemmy.
Just want to clarify that a 90%-there solution is better than no solution. It would be acceptable even if the aforementioned problem still exists.
The solution I used for the bot (from the thread this was linked from) is to just open the URLs, but the bot is hosted from the IP range of a major VPN provider so I hope that the organic traffic from the VPN users would disrupt any graph that companies would build.
@jendrikw We'd be able to use the crate, but are dependent on https://github.com/jendrikw/clearurls/issues/3 , since our comments often contain links that also need to be stripped of tracking.
Requirements
Is your proposal related to a problem?
Currently, Lemmy attempts to clean URLs based on its own rules. Instead, I think it would be great if we could adopt the crowdsourced rules created for the ClearURLs extension. Considering Lemmy has such a large user base with a vested interest in scrubbing URLs from their respective platforms, we could contribute back to the ClearURLs ruleset in a large way.
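For illustration, here is a minimal sketch of what rule-based stripping of tracking parameters could look like. It is not Lemmy's current implementation and not the ClearURLs rule format (the real rules are regex-based and organized per provider). The function name `strip_tracking_params` and the `*` prefix convention are assumptions made up for this example, and it only uses the standard library.

```rust
/// Hypothetical helper (not part of Lemmy or any crate): removes query
/// parameters whose names match an entry in `blocked`.
/// An entry ending in `*` is treated as a prefix match, e.g. `utm_*`.
fn strip_tracking_params(url: &str, blocked: &[&str]) -> String {
    // Split off the fragment first so it can be re-attached unchanged.
    let (without_fragment, fragment) = match url.split_once('#') {
        Some((head, frag)) => (head, Some(frag)),
        None => (url, None),
    };

    // No query string means there is nothing to strip.
    let Some((base, query)) = without_fragment.split_once('?') else {
        return url.to_string();
    };

    // A parameter name is blocked on an exact match or a `prefix*` match.
    let is_blocked = |name: &str| {
        blocked.iter().any(|rule| match rule.strip_suffix('*') {
            Some(prefix) => name.starts_with(prefix),
            None => name == *rule,
        })
    };

    // Keep only the `name=value` pairs whose name is not blocked.
    let kept: Vec<&str> = query
        .split('&')
        .filter(|pair| !pair.is_empty())
        .filter(|pair| {
            let name = pair.split('=').next().unwrap_or("");
            !is_blocked(name)
        })
        .collect();

    // Reassemble the URL, omitting `?` if every parameter was removed.
    let mut cleaned = base.to_string();
    if !kept.is_empty() {
        cleaned.push('?');
        cleaned.push_str(&kept.join("&"));
    }
    if let Some(frag) = fragment {
        cleaned.push('#');
        cleaned.push_str(frag);
    }
    cleaned
}
```

Calling it with a blocklist such as `&["utm_*", "fbclid"]` would drop those parameters and leave everything else in the URL untouched. A shared, crowdsourced ruleset replaces the hand-maintained blocklist in this sketch, which is the main argument for adopting it.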
Describe the solution you'd like.
Per this thread conversation: https://hexbear.net/comment/5136579 I've created a pull request that adds the Rules repo under Modules\Rules as a submodule to the Lemmy repo. This could either be implemented as a default functionality, or an optional functionality for Lemmy Admins. Ultimately, I think it makes good sense to not reinvent the wheel when it comes to URL sanitization.
Describe alternatives you've considered.
Initially, I thought of solving this issue via a bot that either DMs the OP of a post or comments within the post with the sanitized URL, but since some level of sanitization is already happening, it feels right to put this directly in the backend.
Additional context
This is my first real pull request, so please let me know if I'm not following proper procedure, or if I misunderstood the conversation within that thread.