EOSIO / eos

An open source smart contract platform
https://developers.eos.io/manuals/eos
MIT License

Change Account Name & Message Types #56

Closed: bytemaster closed this issue 7 years ago

bytemaster commented 7 years ago

Account Name Design

All accounts require a globally unique identifier. For ease of development and use, it is best if these account names are human readable; for performance, it is best if they are 64-bit integers.

By using base32 encoding (5 bits per character, so 12 * 5 = 60 bits) we can support account names up to 12 characters long consisting of the characters: [a-z12345.]

We will consider '.' to be a namespace separator, and all trailing (unused) characters shall also be '.'.
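The encoding described above can be sketched roughly as follows. This is only an illustration of the proposal, not the code that shipped: the symbol values ('.' = 0, '1'..'5' = 1..5, 'a'..'z' = 6..31), the choice to put the first character in the most significant bits, and the names encode_name / decode_name are all assumptions.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumed symbol table: '.' = 0, '1'..'5' = 1..5, 'a'..'z' = 6..31.
inline std::uint64_t char_to_symbol(char c) {
    if (c == '.') return 0;
    if (c >= '1' && c <= '5') return static_cast<std::uint64_t>(c - '1') + 1;
    if (c >= 'a' && c <= 'z') return static_cast<std::uint64_t>(c - 'a') + 6;
    throw std::invalid_argument("character outside [a-z12345.]");
}

// Hypothetical encoder: 12 symbols of 5 bits each, first character in the
// most significant position (an assumption). Short names are padded with '.'.
inline std::uint64_t encode_name(const std::string& s) {
    if (s.size() > 12) throw std::invalid_argument("name longer than 12 characters");
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 12; ++i) {
        const std::uint64_t sym = i < s.size() ? char_to_symbol(s[i]) : 0;
        value = (value << 5) | sym;
    }
    return value;  // uses the low 60 bits; the top 4 bits remain zero
}

// Hypothetical decoder: reads the low 60 bits and trims trailing '.'.
inline std::string decode_name(std::uint64_t value) {
    static const char charmap[] = ".12345abcdefghijklmnopqrstuvwxyz";
    std::string out(12, '.');
    for (int i = 11; i >= 0; --i) {
        out[static_cast<std::size_t>(i)] = charmap[value & 0x1F];
        value >>= 5;
    }
    out.erase(out.find_last_not_of('.') + 1);
    return out;
}
```

Under these assumptions a short name and its dot-padded form encode to the same integer, and decoding trims the padding again.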

Converting Strings

An account name can be converted to or from strings as follows:

"dan"          <=> "dan........."
"dan.larimer"  <=> "dan.larimer."

Is 12 characters enough?

The average username length for Twitter accounts created in 2012 was 11 characters. The most popular length for domain names traded on the marketplace was 8 characters.

UI Benefits of Short Names

User interfaces must be designed to handle the full range of name lengths. If they can assume that a name will be at most 12 characters long, that assumption enables more interface uses.

Longer Names

If users would like to register a longer name, potentially using UTF-8 characters, then a separate naming contract can be used and user interfaces can opt to look up and display the longer name.

Rationale

A simple currency transfer message consists of 5 account names: sender, receiver, notifier, to, and from. If serialized as 32-byte integers, this would require 5*32 = 160 bytes. If serialized as length-encoded strings, this would take 5*(1+average account length) bytes; if we assume the average account length is greater than 10 characters, this will take over 55 bytes. As 64-bit integers, the same five names take 5*8 = 40 bytes.
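The size comparison can be written out as a small compile-time check. The constant and function names are made up for illustration, and the 10-character average is the assumption stated in the text.

```cpp
#include <cstddef>

// Number of account names in the simple transfer message described above.
constexpr std::size_t kAccountNames = 5;

// Fixed 32-byte fields.
constexpr std::size_t fixed32_bytes = kAccountNames * 32;

// Length-encoded strings: one length byte plus the characters themselves.
constexpr std::size_t length_prefixed_bytes(std::size_t average_length) {
    return kAccountNames * (1 + average_length);
}

// Proposed 64-bit integer names.
constexpr std::size_t uint64_bytes = kAccountNames * 8;

static_assert(fixed32_bytes == 160, "5 names * 32 bytes");
static_assert(length_prefixed_bytes(10) == 55, "5 names * (1 + 10) bytes");
static_assert(uint64_bytes == 40, "5 names * 8 bytes");
```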

The current design encodes account names as length-encoded strings, which means that smart contracts need to parse these strings (often copying them to properly padded memory) and that the database indices need to maintain 32 bytes (fixed length). This results in both CPU and memory being wasted packing and unpacking types, while complicating the code, in order to provide the benefit of 32-character human-readable account names.

The compromise approach retains human-readable names for the underlying identifier while allowing users to map to unique long-form names. It also allows account names long enough to support the average Twitter username.

Message Types

We can use the same rationale for message types, namely that for performance reasons a 64 bit integer is ideal but for developer purposes a human readable string is preferred. This will allow developers to assign message types with names up to 12 characters long.

nathanielhourt commented 7 years ago

I would stress that, in my view, these names should be for developers only. We simply cannot expose to end users a naming system that allows them to use lower-case letters, positive single digit integers less than 6, and periods (but with special restrictions on the use of periods). We could let them pick their own arbitrary, unrestricted (non-unique) name, and mangle it down to one of these as their "account ID," but that's a wallet/UI question.

Developers, otoh, can work with these names just fine, and with good tooling, they'll probably make a lot of dev/debugging work a lot easier.

Personally, I vote we ditch the period namespace thing, too. I'm not convinced that's a good idea even with long, full alphanumeric names. I don't think it makes sense with these names.

bytemaster commented 7 years ago

The namespace gives people a way to build / apply trust to domain extensions and/or organizations, but I agree that by the time you use a few characters for the domain, the user name is very short.

nathanielhourt commented 7 years ago

It also means people must think of these names as strings, and not integers. If someone chooses one as an integer, and the corresponding string happens to have a dot in it, it will fail.

bytemaster commented 7 years ago

I agree. I use '.' to represent the null bits and then trim it. The only requirement on the integer is that it must fit in 60 bits, i.e. the high 4 bits must be 0. Otherwise there would be a reduced range on the final string character and we would have to special-case it. I think reserving a few bits may also give us potential to expand in the future.

arhag commented 7 years ago

I'm sure you know my opinion already, since I have stated it with the Steem design as well, but I will state it here anyway. I think developer convenience should just be provided with the appropriate helpful libraries and tooling, and I think that 64-bit IDs (incrementing, not arbitrarily chosen by users/clients) should be used by the blockchain to uniquely identify accounts and message types. Whether it makes sense performance-wise to use variable-length integer encodings, to save bandwidth and blockchain space, or fixed 8-byte integers, to save parsing computation time (only in the very parallel stage 1 mind you), within the actual signed transactions is another matter that is less important to me.

The EOS blockchain could launch with a contract (native or not) that provides a very simple naming system that allows an existing account to adopt a name not currently used by anyone else (perhaps with a tiny EOS fee that is burned?), throwing away its previous name if it has one. These names can have similar character restrictions as Steem but have a max length of say 32 characters. This would be the initial default naming system for the community to adopt, then later when something better and more sophisticated is developed it can supplement this existing namespace (or maybe even replace it if the new one wins in the free market due to say better economic and/or governance models).

Some time later after launch, other smart contract(s) could be launched with a more sophisticated naming system (Maybe a restricted subset of Unicode is allowed? Proper subdomain support with delegation of responsibility?), more sophisticated governance model (Trademark disputes? Adjusting fees?), and more sophisticated economic model (Auctions to own name? Leasing with annual fee based on market valuation, e.g. highest bidder in auction, of the name?).

Either way it would be the clients that tie in these names (in whichever namespace smart contract is used) in a very seamless way to the unique 64-bit account IDs which is what the blockchain really only cares about.

bytemaster commented 7 years ago

@arhag accounts already have an ID that increments in the DB, but this is fraught with issues we experienced with BitShares. Namely, the ID is not known until after it is included in the blockchain. Also debugging BitShares transactions was significantly more difficult because every print statement required access to blockchain state to convert a number into something readable.

The adopted solution allows almost any number to be selected by the user and is effectively the same cost computationally as the incrementing count.

Everything has been implemented and integrated and all unit tests pass.

arhag commented 7 years ago

@bytemaster Well, then I have an alternative suggested tweak to the above proposal.

First, I would accept as a valid name an ASCII string that is exactly 12 characters long and satisfies the regular expression /^([a-z]|[a-z][a-z0-9\-]*[a-z0-9])(\.([a-z0-9]|[a-z0-9][a-z0-9\-]*[a-z0-9]))*(\.)*$/ (this is partially modeled after hostname restrictions, but of course much stricter on length requirements and also stricter on some other smaller things). UIs would of course trim/fill the trailing dots in the name when displaying/requesting the human-friendly account name from users.
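A minimal sketch of that validation rule using std::regex. The function name is hypothetical, and the hyphen is written unescaped at the end of each character class, which matches the same set as the escaped form in the pattern above.

```cpp
#include <regex>
#include <string>

// Hypothetical validator for the proposed rule: exactly 12 ASCII characters
// that match the hostname-like pattern, with trailing dots as padding.
inline bool is_valid_proposed_name(const std::string& s) {
    static const std::regex pattern(
        R"re(^([a-z]|[a-z][a-z0-9-]*[a-z0-9])(\.([a-z0-9]|[a-z0-9][a-z0-9-]*[a-z0-9]))*(\.)*$)re");
    return s.size() == 12 && std::regex_match(s, pattern);
}
```

With this sketch, "dan........." and "dan.larimer." are accepted, while "9an........." (leading digit) and "dan..larimer" (empty label) are rejected.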

As can be seen from the regular expression above, there would be 38 allowed characters in the name (could actually be 40 if we wanted). A number between 0 and 37, inclusive, would represent each character. The mapping between each character and its associated number is as follows:

'.' = 0
'-' = 1
'0' = 2
'1' = 3
...
'9' = 11
'a' = 12
'b' = 13
...
'z' = 37

The 64-bit number would be segmented into a sequence of four 16-bit numbers. Each 16-bit number would encode a sub-sequence of 3 of the allowed characters (38^3 = 54872 fits below 2^16 = 65536). Thus it would still be possible to represent 12 characters overall in the name. (Actually, it would be possible to add 2 more allowed characters for free, since 40^3 = 64000 also fits in 16 bits, while still representing 12 characters in 64 bits.)

My suggested encoding of the sub-sequence of 3 allowed characters (c_1, c_2, c_3) within a 16-bit number n is to treat the sequence of characters as a sequence of base38 "digits" representing the number, or n = (c_1)*(38*38) + (c_2)*(38) + c_3 (better yet replace each 38 with 40 and we can be forward compatible to later add two additional allowed characters if we wanted). With this encoding we can guarantee that a 3 character sub-sequence in which the first character of the sub-sequence is not a dot, hyphen or a digit is represented by a 16-bit number that satisfies the property that its two most significant bits cannot both be 0.

With the restrictions on names described in the regular expression above, we can guarantee that the 64-bit number encoding a name according to the scheme above will always be greater than 2^62. This means we can overload the account "name" fields in messages. If the 64-bit number is less than 2^62 it represents the unique incrementing ID number identifying an account in the system (which would now be elevated from an implementation detail to an official part of consensus). Otherwise, it is meant to represent an actual account name and must satisfy the validation rules for account names.
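The packing and the name/ID split can be sketched as below, using base 38 and the character mapping listed above. The function names are hypothetical, and placing the first 3-character group in the most significant 16 bits is an assumption (it is what makes the 2^62 bound hold). Because a valid name starts with a letter (value at least 12), the first group is at least 12*38*38 = 17328, which exceeds 2^14 = 16384.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Mapping from the proposal: '.'=0, '-'=1, '0'..'9'=2..11, 'a'..'z'=12..37.
inline std::uint64_t symbol38(char c) {
    if (c == '.') return 0;
    if (c == '-') return 1;
    if (c >= '0' && c <= '9') return static_cast<std::uint64_t>(c - '0') + 2;
    if (c >= 'a' && c <= 'z') return static_cast<std::uint64_t>(c - 'a') + 12;
    throw std::invalid_argument("character outside [a-z0-9.-]");
}

// Three characters read as base-38 digits; the maximum is 38^3 - 1 = 54871.
inline std::uint16_t pack_group(char c1, char c2, char c3) {
    return static_cast<std::uint16_t>(
        symbol38(c1) * 38 * 38 + symbol38(c2) * 38 + symbol38(c3));
}

// Hypothetical encoder: four 16-bit groups, first group most significant.
// It does not apply the regular-expression check itself.
inline std::uint64_t encode_name38(const std::string& s) {
    if (s.size() != 12) throw std::invalid_argument("name must be exactly 12 characters");
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 12; i += 3) {
        value = (value << 16) | pack_group(s[i], s[i + 1], s[i + 2]);
    }
    return value;
}

// Fields below 2^62 would be read as incrementing account IDs.
inline bool is_account_id(std::uint64_t field) {
    return field < (std::uint64_t{1} << 62);
}
```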

Furthermore, because of the name restrictions we know that a valid account name cannot start with a digit. This means that if a user enters a number as an account name in a UI, the client can unambiguously know that it is referring to the ID number of the account and not its account name (and thus construct the transaction appropriately).

The biggest advantage of this approach over the original proposed design is that it allows all 10 digits to be included in the name rather than just 1 through 5. The biggest disadvantage is that it is slightly more computationally intensive to extract the name string from the 64-bit number for the purposes of name validation, since 38 (the base used for 16-bit sub-sequences) is not a power of 2 like 32 is (it requires doing two div operations rather than just bit manipulation operations).

coolspeed commented 7 years ago

Why not just use base62 ??

arhag commented 7 years ago

@coolspeed Then you could only handle 10 characters. Also, if you were going to do that you might as well use base 64, since that also gives a 10 character limit but allows you to have uppercase and lowercase letters, 10 digits, and the dot and hyphen. But there is no need for uppercase letters (and in fact, going by the hostname standard, they are case insensitive).
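The character limits follow from how many base-b digits fit in 64 bits. A small sketch (the helper name is made up for illustration) that finds the largest k with b^k <= 2^64 without overflowing:

```cpp
#include <cstdint>
#include <limits>

// Returns how many base-`base` digits always fit in a uint64_t.
// Assumes base >= 2.
inline int max_digits_in_u64(std::uint64_t base) {
    const std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t largest = 0;  // largest value expressible with `digits` digits
    int digits = 0;
    // Appending one more maximal digit must not exceed the 64-bit range.
    while (largest <= (max - (base - 1)) / base) {
        largest = largest * base + (base - 1);
        ++digits;
    }
    return digits;
}
```

It returns 10 for bases 62 and 64, and 12 for bases 32, 38 and 40.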

bytemaster commented 7 years ago

@arhag I appreciate your suggestion, but I think we have already agreed that this string representation is mostly used for developer purposes and that the protocol officially recognizes them as ints.

In addition to account names we are also using this to represent permission levels and message types (actions).

By reserving the upper 4 bits it should be possible to extend the format in the future, but even with your changes 12 characters is still too restrictive for end users.

jcalfee commented 7 years ago

To recap: 4 bits is not enough for the character set; 5 bits are needed.

1) If names are up to 12 characters long, then why are the Name to string (and string to Name) functions allowing for a 4 bit 13th character? For now in eosjs, I'm just going to require a max length of 12.

2) What is this test about? The number suffix is changing and the length is beyond 12 chars. It looks like it is testing that you can go past the limit and the suffix is ignored. It still seems like an error should be thrown if the name is longer than 12 chars.

test_types::string_to_name()
WASM_ASSERT( eos::string_to_name("mlkjihgfedcba55") == N(mlkjihgfedcba14) , "eos::string_to_name(mlkjihgfedcba14)" );

davidfrothin commented 6 years ago

Is account name squatting going to be an issue, with people registering hundreds of names? Yes, it will still be possible to create unique names, but it is going to be hard to get a sensible name.

I also assume people can create an unlimited number of account names?

godmar commented 4 years ago

I came here trying to understand the behavior of the eosio::name constructor in CDT 1.6.3 that was implemented as a consequence of this issue.

Is it not concerning that the representations "dan", "dan.", "dan..", etc. represent the same uint64_t value whereas "dan", ".dan", "..dan" do not?

I'll note that you use a convention that differs from normal number representation, in which trailing zeros are significant whereas leading zeros are not (that is, 007 == 7 but 700 != 7). In the current implementation, leading periods are significant whereas trailing periods are not.

On a related note, should the constructor of name reject "dan..." in the same way that cleos rejects it where it expects a name? cleos errors with "Name not properly normalized", but the constructor invoked within a smart contract will happily accept "dan...".

Is it possible that contracts that construct names from strings could be tricked into treating "dan..." as "dan" and potentially send tokens to "dan" that they meant to send to an account they believed to be different from "dan" because it had a different eosio::name?
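To make the concern concrete, here is a rough sketch of a permissive encoder and a guard that a contract could apply to untrusted string input. Neither function is the CDT implementation: the names, the symbol table ('.' = 0, '1'..'5' = 1..5, 'a'..'z' = 6..31) and the bit order are assumptions, and the guard only checks length and trailing dots.

```cpp
#include <cstdint>
#include <string>

// Hypothetical permissive encoder: pads to 12 symbols with '.', which is
// symbol 0, so trailing dots in the input change nothing.
inline std::uint64_t encode_permissive(const std::string& s) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 12; ++i) {
        const char c = i < s.size() ? s[i] : '.';
        std::uint64_t sym = 0;  // '.' and any unknown character map to 0
        if (c >= '1' && c <= '5') sym = static_cast<std::uint64_t>(c - '1') + 1;
        else if (c >= 'a' && c <= 'z') sym = static_cast<std::uint64_t>(c - 'a') + 6;
        value = (value << 5) | sym;
    }
    return value;
}

// Hypothetical guard: only the spelling without trailing dots is canonical.
inline bool is_normalized(const std::string& s) {
    return s.size() <= 12 && (s.empty() || s.back() != '.');
}
```

Here "dan" and "dan..." encode to the same integer while ".dan" does not, so a contract that compares strings before encoding could be misled unless it rejects non-normalized input first.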