Thunder33345 commented 5 years ago

Why bother

Right now we have 0 defenses against spamming, maybe this could be useful Also this is NOT meant to exclude the possible damage of forcing POW as some users are mobile based and will struggle and lack the capability to legitimately post while the spammer on a mid tier rig can easily blast through POW This is mostly a barrier of sort, not meant to be the only line of defense against spamming but to be used in combination with other tools..

(FYI: i used the word POW signing to refer to computing a hash that matches requirement) Possible examples:

POW each publish

This is a simple system where each time you want to publish the data also need to be POW backed to publish, actually it dosent stop spammer, they could choose to spam everything in one go.

POW for file size

There should be a "POW sign" this to unlock XYZ more space EDIT: also there should be ways to have diffrent POW requirements, between certs providers and anonymous certs #1812 Example in contents.json for users:

POW Sizegate signing:
0byte: /^0000.*$/ (4 zeros hash)
50mb: /^0000.*$/ (4 zeros hash)
100mb: /^00000.*$/ (5 zeros hash)
500mb: /^000000.*$/ (6 zeros hash)
1gb: /^0000000.*$/ (7 zeros hash)
*1gb: /^00000000.*$/ (8 zeros hash)

In order to even do your first post you need a 4 zero hash check(explained by the 0 byte). Each time your data size reaches a new "milestone" you also need to create POW with matching regex. The *1gb means each time you reach a new multiples of 1 gb you also must sign it, say you reached 3 gb, then you must sign it 3 times.

for example in the user's content.json

size sign:
0byte: (data that when combined will result in hash matching regex)

the hash match function would be something like hash(auth address+size+site address+data) size and site address may be optionally excluded by owner's will, so that a hash can be signed once and used at elsewhere, and allow users to precompute huge hashes only based on their auth address

Other ways?

Only other ways i can think of is signing for every individual "entity" which depends on application, like zerome and zerotalk each have their own individual entity It would sure be nice to be able to make each entity to require signing but i cant see a way without huge complication in the json structure of each file Yes it's possible to do it front ended rather then at the network level, sure it hides the post but still dosent negate the fact people can spam and waste other's disk space

HelloZeroNet commented 5 years ago

My problem with POW is still the same: I think it's not possible to use it to block spam attacks and at the same time don't kill the user experience: Someone with a decent GPU can do 100x more calculation as a mobile phone can do, so if we request 100 minutes of calulation from a mobile user (I don't think anyone would wait that long to use the network), then a good GPU can still do it in 1 minute.

Thunder33345 commented 5 years ago

yeah POW has some scaling issues we cant have something that's relational to the device power not sure what else but it's an idea especially for me having to keep asking admins for increase in size of zerotalk maybe something like doing a POW that takes very long to unlock 1GB while yes it wont be instant but again it's not meant to block people from joining but blocking them so they cant do it all in one go

trenta3 commented 5 years ago

We could adopt a memory-bounded hashing function to provide such guarantee of equalities between users.

The idea is that both ASIC and GPU have very little RAM on-board, smaller than that of the Level2 cache of many processors (about 2MB), so that picking an hashing function that has to use a couple of megabytes of RAM to compute will have similar computing times no matter with which devices it is computed.

sergei-bondarenko commented 5 years ago

I proposed a concept of decentralized CAPTCHA for fighting spam. You can read about this idea here: https://grez911.github.io/captcha.html Happy to listen your thoughts.

purplesyringa commented 5 years ago

@grez911 Just. Awesome. I will try to make proof of concept with your concept if you allow it.

Thunder33345 commented 5 years ago

@grez911 that's a really interesting idea i like how it needs 0 trusted node, one problem though how do we ensure everyone post "real chaptas" regarding step 4.5(solving captcha), is there anything enforcing users to go back to step 2? since we will be assuming they are attackers, they can just edit the code that forces user to go back to POW

also since the captcha isnt "related" to the process anyone could bulk fetch and precompute the chapta in time, all they do is try to use OCR and see if that hash matches as "hidden data" isnt really hidden from users who tries hard enough to attempt to view them

i do like the format and the infographics also maybe we can make a zerotalk thread or new issues for this? to not clutter here

purplesyringa commented 5 years ago

Let's assume that we don't calculate current_post_id - 1 - random_number, but we take the whole hash and take the remainder of dividing the hash by current_post_id. This means that the probability of choosing each a post is equal.

Say, if there are 65536 captchas (the least possible count), and the attacker wants to post the first message, he will have to solve a captcha.

Then, when he wants to post the second message, he can either get a random captcha (and he will have to solve it), or he will do heavy PoW just to make sure that random captcha number points to the 1st spam message (because he knows the origin of its hash). When he makes more messages, it will be easier to post the next one, though.

Let's calculate the probability. If the attacker has made n messages, and he wants to make another one. There is a total of 65536 + n messages. He will get other's message with probability of 65536 / (65536 + n). Now, let's pretend that he chooses m different hashes (i.e. m different captchas). The probablity that he won't have his own captcha is (65536 / (65536 + n)) ^ m, where ^ stands for exponentiation. We want this probability to be small (because we want to make the process automated) -- let's take, say, 0.1% = 0.001. Now solve the equation (65536 / (65536 + n)) ^ m = 0.001 <=> m = log(65536 / (65536 + n))(0.001) (where log(a)(b) means logarithm of base a) <=> m = ln(0.001)/ln(65536 / (65536 + n)) = -7/ln(65536 / (65536 + n)).

Now, let's calculate how much PoW you would have to do to generate, say, 100 spam messages. After integrating this function from 1 to 100 (I tried summing up m(1..100) with Python and got similar results -- so I'll use integrating as a faster function) we get about 2 million PoW operations.

To make hashing slower, let's run 1000 iterations of SHA512 instead of one iteration of SHA256, as recommended by the document.

According to StackOverflow, you can calculate about 100 million SHA-512 hashes per second. Since we're running 1000 iterations, this is 100 000 PoW operations per second. This means that posting 100 spam messages will take you 20 seconds (WolframAlpha: (integrate -7/ln(65536 / (65536 + x)) from 1 to 100) / 100000).

This may sounds like good result, but it isn't. If we want to generate 1000 spam messages, that becomes 32 seconds, not 200 seconds! (FYI: 1 000 000 messages = 88 seconds).

Uh-oh.

Let's run 100 000 iterations maybe? This means you get 1000 PoW ops/sec. This leads to half an hour to generate 100 messages, and 2.5 hours for 1 million messages.

The "1 million messages" case was calculated assuming that you can take captcha from, say, the very first post (i.e. there's no "hash from last 65536 posts" condition). If we assume that we have that condition, things get worse. (integrate -7/ln(1 - min(1, x / 65536)) from 1 to 1000000) / 1000 gives us 1.3 hours to make 1 million spam posts.

Er... What about 100 000 000 iterations? That gives us 1 PoW operation per second. So 23 days to post 100 messages... And 1 second to post with captcha! Seems like a good solution to me.

So, what I recommend is:

Run 100 000 000 iterations of SHA-512 instead of one iteration of SHA-256 and you'll be fine.

Thunder33345 commented 5 years ago

that's interesting i never know that's a possible way to attack, trying to get your own chaptas

i mean it as in, what if we assume everyone in the image board is dishonest and post fake unsolvable chaptas? or like people posting plain easily readable captcha

trenta3 commented 5 years ago

Some other considerations and questions about the idea, that by the way is really interesting since it seems to add far greater inequality (1:1000000) between real people and spammer by using something that humans can do and machines almost can't.

What if the user publish an image that does not contain the real solution to the hash, rendering it unguessable? This could prevent others from posting their own post, if many people are doing this.
Once a user solves the hash, does he publish the solution for everyone? If it is so, imachug attack could be much more effective by using the fact that each user that post a message gives a solution to a specific captcha.
How to defend from OCR? Note that even if they don't exactly guess all of the characters in the image, a reduced bruteforce in which only a single letter is changed with all possible replacements should solve a high percentage of CAPTCHA.

Considerations

The CAPTCHAs should be lenghty enought and have salt so that a simple bruteforce is unviable (without salt and CAPTCHAs length <= 10 it should be possible to solve all of them [since they are unsalted] in about a week with a GPU)

Proposed solutions

To 1: we should have CAPTCHA providers that generate various sets of CAPTCHAs
To 2: Can't be really mitigated with simple techniques. A viable (but possibly costly to implement) alternative could be using a ZkSNARK to prove to others that you have a word whose hash is the desired one, without revealing such word.
To 3: This seems to me the fastest attack since it lets you solve all the CAPTCHAs in little time. We must check how much good are OCR now. A possible solution will be to find some other task with a definite answer (like: you can check if the user answered correctly automatically, but a machine should strive to get it done) that is easy for all humans and hard for machines.
For imachug "Birthday paradox attack": we should get a PoW that is more equal to calculate among device, using memory-bounded hashing function, for example CryptoNight or similar ones underlying some famous cryptocurrency just to be sure that no ASICs are possible with current techniques (since otherwise they would have done it into the market), noting that regular users still have to calculate many of them to match the last five digits of the message hash.

purplesyringa commented 5 years ago

Actually... I just found a bug in my calculations. It becomes 30 seconds to post with captcha on my computer (and it's rather fast). So maybe it would be a better solution to run 10 000 000 iterations of SHA-512. So this makes it 3 seconds to post on my machine, and about 2 days for attacker. Still good, though.

Thunder33345 commented 5 years ago

to 1 seems not ideal: as unless we have some sort of centralization that force users to post in certain difficulty that wont do it

what if the solution is hash(solution + publicly calculated value) but then to verify it we would have a problem as we need human to manually enter the solution assuming some 1 way thing that can proof it's correct without giving the solution away is a good way

purplesyringa commented 5 years ago

Here's zkSNARKs implementation for JavaScript: https://github.com/iden3/snarkjs.

trenta3 commented 5 years ago

@imachug I didn't know there were zksnarks for Javascript already available! In this way one could have trusted parties (external sites) specialized in captchas and with some fixed amount of them (like some millions).

HelloZeroNet / ZeroNet

POW content signing #1815

Why bother

POW each publish

POW for file size

Other ways?

Considerations

Proposed solutions