Heyuri / Uploader

2chan uploader!
MIT License

Anti-spam and deduplication #4

Closed hachikuji-san closed 2 months ago

hachikuji-san commented 2 months ago

As far as I can tell there are no anti-spam methods like a captcha for Uploader, and Kaptcha is dead. On top of that, there doesn't seem to be any deduplication. Duplicate images aren't a problem by themselves, considering how many people use the uploader, but coupled with there being no captcha/anti-spam it means a fairly simple script could flood up.heyuri.net with a ton of the same files.

The same thing applies to user board creation, which could become an ugly mess to clean up in theory.

Ideas:

Perhaps I've missed something, but it's kind of a glaring issue... @kaguy4, has anyone exploited/spammed it before?

kaguy4 commented 2 months ago

It is a glaring issue, and I believe it's pure luck that we haven't been spammed by bots. Anti-spam measures should be easy to comment out, or better, configurable (ON if some value is 1, OFF if 0).
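
Something like this, maybe (just a sketch; `$antispam_enabled` and `$cooldown` are made-up names, not existing Uploader settings):

```php
<?php
// hypothetical config values -- the names are placeholders,
// not settings that exist in Uploader yet
$antispam_enabled = 1;   // 1 = anti-spam checks ON, 0 = OFF
$cooldown         = 30;  // e.g. seconds to wait between uploads

// somewhere in the upload handler
if ($antispam_enabled == 1) {
    // run whatever anti-spam checks end up getting added
}
```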

I think all the ideas sound good in theory, but wouldn't the last one a) add some kind of hash to the flatfile, making it bloat much faster than it otherwise would and eventually making loading times increase unnecessarily, and b) be very easy to bypass, since someone capable enough to run a spamming bot can probably also make a minor change to an image? See how someone botspammed 711chan: https://www.711chan.net/b/1.html (R8 warning)

kaguy4 commented 2 months ago

Another idea that could work alongside the cooldown: an hourly limit on the maximum number of uploadable files (should be easily configurable).

kaguy4 commented 2 months ago

For user boards too, maybe there could be something like a maximum number of creatable boards per day?

I'm not l33t, but perhaps the hourly limits I mention could work like this (rough sketch below):

1. There is a file called something like "limit.log".
2. The file starts empty. When a new file is uploaded or a board is created, the unix timestamp is recorded there.
3. When a new file is uploaded / a board is created, the script checks that file and:
    - 3a. first goes through each line and removes unix timestamps that have expired (over an hour old), removing their lines too so there won't be empty lines;
    - 3b. adds the current unix timestamp to the end of the file as a new line if the number of lines in the flatfile is smaller than the configured limit;
    - 3c. if the number of lines in the file is equal to or greater than the configured limit, shows the user an error message like "Hourly file limit has been reached".
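
Something like this, maybe (just a rough sketch of the steps above; `limit.log` and `$hourly_limit` are placeholder names, not existing Uploader settings):

```php
<?php
// rough sketch of the hourly limit idea -- limit.log and $hourly_limit
// are placeholder names, not existing Uploader code
$hourly_limit = 100;           // configurable max uploads per hour
$limit_file   = 'limit.log';   // one unix timestamp per line

$now   = time();
$lines = file_exists($limit_file)
    ? file($limit_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
    : array();

// 3a: drop timestamps older than an hour (their lines go with them)
$lines = array_filter($lines, function ($ts) use ($now) {
    return ($now - (int)$ts) < 3600;
});

if (count($lines) >= $hourly_limit) {
    // 3c: limit reached, refuse the upload
    die('Hourly file limit has been reached.');
}

// 3b: record this upload and rewrite the file without empty lines
$lines[] = $now;
file_put_contents($limit_file, implode("\n", $lines) . "\n", LOCK_EX);
```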

hachikuji-san commented 2 months ago

> I think all the ideas sound good in theory, but wouldn't the last one a) add some kind of hash to the flatfile, making it bloat much faster than it otherwise would and eventually making loading times increase unnecessarily

Instead of a flat file, the hashes could be stored in a NoSQL database/hashtable, which would be really fast for a simple comparison check. I'll look into it more later anyway.
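
Roughly what I mean, as a sketch (the `hashes.json` store and the `'file'` form field are hypothetical; a real key-value/NoSQL backend could replace the JSON file):

```php
<?php
// sketch of a simple exact-duplicate check -- hashes.json and the
// 'file' field name are made up, not actual Uploader code
$hash_store = 'hashes.json';
$hashes = file_exists($hash_store)
    ? json_decode(file_get_contents($hash_store), true)
    : array();

// hash the uploaded file; md5 could be swapped for sha1 or whatever
$hash = md5_file($_FILES['file']['tmp_name']);

if (isset($hashes[$hash])) {
    // exact duplicate of an already-uploaded file
    die('This file has already been uploaded.');
}

$hashes[$hash] = time();   // the value could be the stored filename instead
file_put_contents($hash_store, json_encode($hashes), LOCK_EX);
```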

> b) be very easy to bypass, since someone capable enough to run a spamming bot can probably also make a minor change to an image?

The goal, I guess, would be to make it too much effort to be worth it for a skiddie. There isn't a good way to completely predict how effective it will be in theory, so putting it into practice and then refining it over time is the safest approach. Ah, and there are more advanced ways of doing deduplication (I think Infinity-Next uses one for image banning), but that should be a last resort.

kaguy4 commented 2 months ago

I still think image hashes should be our last priority, and mixing databases with this software may not be a good idea. It could make things unnecessarily complicated; flatfiles work well enough in this day and age for small stuff.

For comparison, souko.log of Heyuri's warota.php is 331 KB at 2762 files. The maths says that for it to reach 1 GB, 1024^2 / (331 / 2762) ≈ 8,749,748 files would need to get uploaded. By the time we get there, I'll probably have Heyuri's host switched to some quantum computar anyways.

That said, I dunno how nosql works at all

kaguy4 commented 2 months ago

2 b clear: I don't think it's easily feasible, but I'm not opposed to ur idea either

hachikuji-san commented 2 months ago

Fair enough reasoning. I won't try implementing deduplication any time soon

hachikuji-san commented 2 months ago

Since Uploader doesn't log IPs, it kind of limits the kinds of anti-flood scripts that can be used. I have 2 ideas.
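
One of them, the $cooldown setting, could roughly look like this without needing IPs (just a sketch; `last_upload.log` and the values are made up):

```php
<?php
// sketch of a global cooldown that doesn't need IPs -- last_upload.log
// is a made-up filename, $cooldown a configurable number of seconds
$cooldown   = 30;
$stamp_file = 'last_upload.log';

$last = file_exists($stamp_file) ? (int)file_get_contents($stamp_file) : 0;

if (time() - $last < $cooldown) {
    // too soon after the previous upload, refuse it
    die('Please wait a bit before uploading again.');
}

// record the time of this upload for the next check
file_put_contents($stamp_file, time(), LOCK_EX);
```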

kaguy4 commented 2 months ago

The $cooldown setting sounds good to me; it would at least provide some kind of protection against skiddies, which the software seriously lacks currently. I also agree cookie-based limiting isn't a good idea.

hachikuji-san commented 2 months ago

I'm gonna close this since it's effectively solved with the antiflood module.