ethereum / EIPs

The Ethereum Improvement Proposal repository
https://eips.ethereum.org/
Creative Commons Zero v1.0 Universal

Yet another cool checksum address encoding #55

Closed vbuterin closed 7 years ago

vbuterin commented 8 years ago

EDITOR UPDATE (2017-08-24): This EIP is now located at https://eips.ethereum.org/EIPS/eip-55. Please go there for the correct specification. The text below may be incorrect or outdated, and is not maintained.

Code:

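from ethereum import utils  # pyethereum helpers

def checksum_encode(addr): # Takes a 20-byte binary address as input
    o = ''
    # Interpret the hash of the raw address bytes as a 256-bit big-endian integer
    v = utils.big_endian_to_int(utils.sha3(addr))
    for i, c in enumerate(addr.encode('hex')):  # Python 2 idiom for hex-encoding a byte string
        if c in '0123456789':
            o += c  # digits carry no check bit
        else:
            # Uppercase the i-th hex digit when the i-th most significant hash bit is set
            o += c.upper() if (v & (2**(255 - i))) else c.lower()
    return '0x'+o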

In English: convert the address to hex, but if the i-th digit is a letter (i.e. one of abcdef), print it in uppercase if the i-th bit of the hash of the address (in binary form) is 1, and in lowercase otherwise.

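Benefits:

  • Backwards compatible with many hex parsers that accept mixed case, allowing it to be easily introduced over time
  • Keeps the length at 40 characters
  • ~The average address will have 60 check bits, and less than 1 in 1 million addresses will have less than 32 check bits; this is stronger performance than nearly all other check schemes. Note that the very tiny chance that a given address will have very few check bits is dwarfed by the chance in any scheme that a bad address will randomly pass a check~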

UPDATE: I was actually wrong in my math above. I forgot that the check bits are per-hex-character, not per-bit (facepalm). On average there will be 15 check bits per address, and the net probability that a randomly generated address if mistyped will accidentally pass a check is 0.0247%. This is a ~50x improvement over ICAP, but not as good as a 4-byte check code.
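One way to reproduce the corrected figures, as a rough sketch: assume each of the 40 hex digits is uniformly random, so it is a letter with probability 6/16, and assume a mistyped address behaves like a random string whose letters have random case. Under that model (an assumption of this sketch, the post does not spell out its derivation) a single position passes the check with probability 10/16 + (6/16)(1/2) = 13/16. The variable names are illustrative only.

```python
HEX_DIGITS = 40        # hex digits in a 20-byte address
P_LETTER = 6 / 16      # chance that a uniformly random hex digit is one of a-f

# Only letters can carry a check bit (their case), so on average:
expected_check_bits = HEX_DIGITS * P_LETTER

# A random position passes if it is a digit (always accepted) or a letter
# whose case happens to match the hash bit (probability 1/2).
p_position_ok = (1 - P_LETTER) + P_LETTER * 0.5

# All 40 positions must pass for a random string to be accepted.
p_false_accept = p_position_ok ** HEX_DIGITS

print(f"average check bits: {expected_check_bits}")
print(f"false-accept probability: {p_false_accept:.4%}")
```

With these assumptions the sketch gives 15 check bits on average and a false-accept probability of about 0.0247%, matching the numbers in the update.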

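Examples:

  • 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 (the "cow" address)
  • 0x9Ca0e998dF92c5351cEcbBb6Dba82Ac2266f7e0C
  • 0xcB16D0E54450Cdd2368476E762B09D147972b637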

simenfd commented 8 years ago

@tauteh1221 This was bound to happen. I am afraid we can see even larger errors in the future between ICAP Direct and ICAP Basic. The two formats differ by a single character, and there is a ~1% chance the checksum will say "go ahead".

taoteh1221 commented 8 years ago

@simenfd Agreed, I even predicted it further up in this thread: https://github.com/ethereum/EIPs/issues/55#issuecomment-186614582

The good news is that there are surely non-technical folks using Ethereum now; we just have to balance security and features with ease of use and failsafes going forward to keep them around.

Arachnid commented 8 years ago

Since this is implemented in several places now, and since the actual implementation doesn't match that described in the initial post, would it be possible to write this up as a proper EIP and submit it, so one doesn't have to read the whole bug thread to determine what's in actual use?

alexvandesande commented 8 years ago

@Arachnid you're right

SinErgy84 commented 8 years ago

Maybe it's a little bit off-topic, but I've tested some SHA3 implementations for PHP (https://github.com/strawbrary/php-sha3 , https://github.com/0xbb/php-sha3 , https://notabug.org/desktopd/PHP-SHA3-Streamable) and some JS libraries (https://github.com/emn178/js-sha3 , https://github.com/Caligatio/jsSHA/releases/tag/v2.2.0).

Hashing example string: qwerty
Hash output variant / length: 224

All of them produce the following hash value:
13783bdfa4a63b202d9aa1992eccdd68a9fa5e44539273d8c2b797cd
Comparing it to the output of the Crypto-JS SHA3 implementation, the hash value differs completely:
d7a12ecec4442f1b31eea5f7d5470f0ca6169463e09d91a147c3b8e8

Someone already mentioned this issue on Stack Overflow: http://stackoverflow.com/questions/36657354/cryptojs-sha3-and-php-sha3

So checksumAddresses only works "correctly" with Crypto-JS; with other libraries it fails, because they compute a different hash and therefore different uppercase and lowercase letters.

simenfd commented 8 years ago

What you are seeing is SHA3-224 vs Keccak-224. Check for yourself at: https://emn178.github.io/online-tools/keccak_224.html

What you want is SHA-3, that is the "standard", and most compatible with other libraries.

chfast commented 8 years ago

Remember also that Ethereum is using Keccak, not SHA3.
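A quick way to tell which variant a given library implements is to hash the empty string and compare the result against the published digests. The sketch below does that for the 256-bit variants (the example above used 224-bit output, but the idea is the same); `identify_variant` is a hypothetical helper name, not part of any library. Note that Python's standard `hashlib.sha3_256` is the NIST SHA-3, so on its own it is not the hash Ethereum uses.

```python
import hashlib

# Digests of the empty byte string for the two 256-bit variants.
SHA3_256_EMPTY = "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"
KECCAK_256_EMPTY = "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"


def identify_variant(hash_fn):
    """Classify a bytes -> 32-byte-digest function by hashing the empty string."""
    digest = hash_fn(b"").hex()
    if digest == SHA3_256_EMPTY:
        return "sha3-256"
    if digest == KECCAK_256_EMPTY:
        return "keccak-256"
    return "unknown"


# The standard library implements the final NIST standard, not original Keccak.
print(identify_variant(lambda data: hashlib.sha3_256(data).digest()))  # sha3-256
```

The two variants differ only in the padding applied before the final permutation, which is why every digest differs even though the core function is the same.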

chevdor commented 8 years ago

Short summary because it seems that implementations have evolved and I chased the correct implementation.

janx commented 8 years ago

Python's implementation:

axic commented 8 years ago

@chevdor the main tree is at https://github.com/ethereumjs/ethereumjs-util/ and passes the tests listed in https://github.com/ethereum/EIPs/issues/55#issuecomment-187765837

pipermerriam commented 8 years ago

8-f is 50/50.


pipermerriam commented 8 years ago

"Capitalising for >= 8 or >= a should be identical as 8 and 9 cannot be capitalised anyway." (quoting @axic)

This isn't correct. You capitalize based on the digit in the sha3 of the lowercased 40-character (20-byte) hexadecimal representation of the address. The capitalization is applied to the actual characters of the address itself, so there is a difference between >=8 and >=9. >=8 is the correct implementation.

Another Python implementation here: https://github.com/pipermerriam/web3.py/blob/master/web3/utils/address.py#L45


chevdor commented 8 years ago

@pipermerriam I think you are commenting on old comments. I discussed this with @axic and the topic is clear. I do agree with your comment about >=8 not being the same as >=9, since it is based on the hash.

pipermerriam commented 8 years ago

@chevdor not sure what happened there. I must have been looking at really old email notifications or something. 😄 carry on.. nothing to see here...

axic commented 8 years ago

@pipermerriam I've commented that without reading the implementation from months ago 😃

recmo commented 7 years ago

"Initially, @vbuterin suggested to capitalise whenever the hash character is a..f"

No. The original proposal capitalizes the n-th hex-digit whenever the n-th bit in the hash of the address is set. So the first 40 bits of the 256-bit hash are used.

The current implementation modifies this by taking the hash of the lowercase hexadecimal encoding of the address and then using every fourth bit for capitalization (so 1st bit, 5th bit, etc.). The main reason for this extra complexity is that JavaScript or its libraries are bad at handling binary data, and this is somehow easier.
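The two ways of phrasing the rule in this thread ("every fourth bit" and "hash digit >= 8") select the same bit, because the first bit of each 4-bit group is the high bit of the corresponding hex digit. A small sketch to check this, using SHA-256 from the standard library purely as a stand-in for any 256-bit digest (it is not the hash Ethereum uses):

```python
import hashlib

# Any 32-byte value works for this check; SHA-256 is only a convenient source.
digest = hashlib.sha256(b"stand-in input").digest()
v = int.from_bytes(digest, "big")   # the digest as a 256-bit big-endian integer
hex_digits = digest.hex()           # the same digest as 64 hex digits

for i in range(40):
    # "Every fourth bit": bit 255 is the first bit, bit 251 the fifth, and so on.
    bit_set = bool(v & (1 << (255 - 4 * i)))
    # "Hash digit >= 8": the i-th hex digit has its high bit set.
    digit_high = int(hex_digits[i], 16) >= 8
    assert bit_set == digit_high

print("both formulations agree for all 40 positions")
```

This is why implementations that work on the hex string of the hash and implementations that work on the integer value produce the same capitalization.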

Here is @vbuterin's original implementation updated with these changes. It passes @alexvandesande's test vectors:

from ethereum import utils

def checksum_encode2(addr): # Takes a 20-byte binary address as input
    o = ''
    v = utils.big_endian_to_int(utils.sha3(addr.hex()))
    for i, c in enumerate(addr.hex()):
        if c in '0123456789':
            o += c
        else:
            o += c.upper() if (v & (2**(255 - 4*i))) else c.lower()
    return '0x'+o

def test(addrstr):
    assert(addrstr == checksum_encode2(bytes.fromhex(addrstr[2:])))

test('0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed')
test('0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359')
test('0xdbF03B407c01E7cD3CBea99509d93f8DDDC8C6FB')
test('0xD1220A0cf47c7B9Be7A2E6BA89F429762e7b9aDb')

sathishvj commented 7 years ago

Is there a valid, latest go implementation of this that you could recommend?

almindor commented 7 years ago

Could someone please finally specify the Hash algorithm used to hash the address and get the bits from?

There are at least 3 different hashes mentioned and even used in various implementations.

My understanding is that the correct hash is supposed to be SHA3-256, but it seems some implementations are using SHA3-224 and others use Keccak-256 and Keccak-224.

vaib999 commented 7 years ago

I am curious: is there a Java implementation of this?

cdetrio commented 7 years ago

@almindor

You'll find the correct specification and example implementations at the file here: https://github.com/ethereum/EIPs/blob/master/EIPS/eip-55.md. The file also includes an adoption table to help track the adoption of EIP-55 checksums in the ecosystem.

We're going to close this issue now. If any corrections need to be made (or to update the adoption table), please open a PR on the file.

prusnak commented 7 years ago

You should edit the example code and test vectors in the first post. It is wrong and someone who does not read the whole conversation will use the incorrect implementation.

cdetrio commented 7 years ago

This EIP is now located at https://github.com/ethereum/EIPs/blob/master/EIPS/eip-55.md. Please go there for the correct specification. The text in this issue may be incorrect or outdated, and is not maintained.

axic commented 6 years ago

@cdetrio can you push the "official test suite" into the EIP?

I believe it is this one: https://github.com/ethereum/eips/issues/55#issuecomment-187765837

adyliu commented 6 years ago

Java checker for Ethereum addresses: https://gist.github.com/adyliu/6c5ff4d41aa0177da55f4b8b1703f54a

voron commented 6 years ago

Current Python 3 eth-utils implementation:

python3 -c "from eth_utils import address; import sys; print(address.to_checksum_address(sys.argv[1]));" 0x5aaeb6053f3e94c9b9a09f33669435e7ef1beaed

Output is

0x5aAeb6053F3E94C9b9A09f33669435E7Ef1BeAed

Th1983 commented 2 years ago

Thanks

Mortiemi commented 1 month ago

EDITOR UPDATE (2017-08-24): This EIP is now located at https://eips.ethereum.org/EIPS/eip-55. Please go there for the correct specification. The text below may be incorrect or outdated, and is not maintained.

Code:

def checksum_encode(addr): # Takes a 20-byte binary address as input
    o = ''
    v = utils.big_endian_to_int(utils.sha3(addr))
    for i, c in enumerate(addr.encode('hex')):
        if c in '0123456789':
            o += c
        else:
            o += c.upper() if (v & (2**(255 - i))) else c.lower()
    return '0x'+o

In English: convert the address to hex, but if the i-th digit is a letter (i.e. one of abcdef), print it in uppercase if the i-th bit of the hash of the address (in binary form) is 1, and in lowercase otherwise.

Benefits:

  • Backwards compatible with many hex parsers that accept mixed case, allowing it to be easily introduced over time
  • Keeps the length at 40 characters
  • ~The average address will have 60 check bits, and less than 1 in 1 million addresses will have less than 32 check bits; this is stronger performance than nearly all other check schemes. Note that the very tiny chance that a given address will have very few check bits is dwarfed by the chance in any scheme that a bad address will randomly pass a check~

UPDATE: I was actually wrong in my math above. I forgot that the check bits are per-hex-character, not per-bit (facepalm). On average there will be 15 check bits per address, and the net probability that a randomly generated address if mistyped will accidentally pass a check is 0.0247%. This is a ~50x improvement over ICAP, but not as good as a 4-byte check code.

Examples:

  • 0xCd2a3d9f938e13Cd947eC05ABC7fe734df8DD826 (the "cow" address)
  • 0x9Ca0e998dF92c5351cEcbBb6Dba82Ac2266f7e0C
  • 0xcB16D0E54450Cdd2368476E762B09D147972b637