Closed sergeevabc closed 10 years ago
There are some efforts underway; mainly by @cuardin (https://github.com/cuardin/MasterPasswordJS) and @tmthrgd (https://github.com/tmthrgd/mpw-js). I hope to at some point derive an official port from these. If you're interested in being part of the effort, you're welcome at #masterpassword on irc.freenode.net
Thanks for invitation, Maarten, alas #masterpassword is silent when I come in.
Looking at the algorithm, I'm curious why do you go with
result = map(hmac-sha256(scrypt(masterpassword, username), sitename))
instead of
result = map(scrypt(masterpassword, username+sitename))
?
sitename
(or better say resource) is smth like herrsmith@openmailbox.org:ebay
or dad:homewifi
After all Scrypt already uses HMAC-SHA256 under the hood (theory, implementation).
To weigh in on the choice, not being privy to the real reason, I can say that scrypt by design is a slow and memory hungry implementation. Having written the latter referenced js version, I can say unequivocally it is slow. On my system the Web Crypto HMAC takes about 3ms while scrypt takes almost 1.5s, orders of magnitude slower.
The actual algorithm's method of using hmac-sha256 allows many different passwords to be calculated quickly by the application while in no way reducing the security. In fact I would wager it improves the security by allowing the masterpassword to be cleared from memory much sooner.
They're reasons I can see at least.
@tmthrgd is correct; there are two distinct phases to the algorithm because they have a distinct purpose. The first is to defend against offline attacks (brute-force, dictionary, rainbow table, ...) against your master password, the second is to generate a site's password. If the two phases were merged, we'd have to perform the scrypt phase for every site password you want to generate, where as now you have the freedom to temporarily cache the master key, if you so desire, and re-use it for generating multiple site passwords.
@tmthrgd @lhunath Ingenious! Thanks for explanation.
I don't know what the original thinking was, but I for one like the current implementation. Scrypt is a intentionally slow algorithm. As such, it is better that an app only need to run it once and then passwords for all sites can be generated instantly. Especially in the mobile world and JS where I work, the time to run all of script is a huge deal.
Sent from my iPhone
On 16 Oct 2014, at 12:00, Alexander Sergéev notifications@github.com wrote:
Thanks for invitation, Maarten, alas #masterpassword keeps silence when I come.
Looking at Mastepass algorithm, I'm curious why do you go with result = map(hmac-sha256(scrypt(username, masterpassword), sitename) instead of result = map(scrypt(masterpassword, username+sitename)), After all Scrypt already uses HMAC-SHA256 within.
— Reply to this email directly or view it on GitHub.
Reading about algorithm again and again:
com.lyndir
into more common name with trivial space as delimiter like John Smith
or Sherlock Holmes
? The latter is clearer from cross-cultural prospective, recognizable as name at once.masterpassword
here "com.lyndir.masterpassword" . name length . name
and here key, "com.lyndir.masterpassword" . site name length . site name . counter
. Does it mean you join masterpassword in plain form to full name on both stages?@sergeevabc
You're confusing the purpose of the com.lyndir.masterpassword
string. It's written in reverse domain name notation. The purpose of a namespace is to provide a unique identifier for each piece of software (or any thing really for that matter) and while namespaces can be any domain name, standard practice is to chose a subdomain you control.
So com.lyndir.masterpassword
is mapping to the DNS name masterpassword.lyndir.com
of which lyndir.com
is owned by the developer. Of course this is just arbitrary and is not meant to actually represent a resource, just be a unique identifier that will not likely collide.
"com.lyndir.masterpassword" . name length . name
might better be written as "com.lyndir.masterpassword" ∥ INT_32_BE(name length) ∥ name
. That is the string "com.lyndir.masterpassword"
concatenated with the integer length of name and then name.
This is done at a binary level, so "com.lyndir.masterpassword" becomes the byte array [ 99, 111, 109, 46, 108, 121, 110, 100, 105, 114, 46, 109, 97, 115, 116, 101, 114, 112, 97, 115, 115, 119, 111, 114, 100 ]
and if name was 4 characters long INT_32_BE(name length)
would become [ 0, 0, 0, 4 ]
, and if name was 'name' it would become [ 110, 97, 109, 101 ]
.
Then each of these is concatenated together to become:
[ 99, 111, 109, 46, 108, 121, 110, 100, 105, 114, 46, 109, 97, 115, 116, 101, 114, 112, 97, 115, 115, 119, 111, 114, 100, 0, 0, 0, 4, 110, 97, 109, 101 ]
.
Correct. com.lyndir.masterpassword
is a static string that doesn't change. It's a namespace to avoid the possibility of Master Password's output to ever collide with the output of another program that does things the same way (however unlikely that may be). Additionally, recently the namespace has been used to provide an extra feature: defining what kind of output to generate. There are currently 3 kinds, each with a unique namespace identifier:
com.lyndir.masterpassword
com.lyndir.masterpassword.login
com.lyndir.masterpassword.answer
The reason we're using distinct namespaces for these three different types of output is to ensure that when the site name and the master key is the same, knowing one of these gives a third party no hint about any of the others (eg. if someone sees your username, they can't use that to speed up finding your password).
The login
and answer
scopes are currently in beta for the iOS app. The C app has initial support (currently just the login
scope); which reminds me now that I should expand that.
The idea behind introducing a login
scope is to make your username stateless as well (so you can't forget it), in addition to being arbitrary and site-specific (which helps anonymity).
The idea behind introducing an answer
scope is to defeat the ridiculous practice of sites enquiring about your utmost personal life experiences under the veil of providing additional "security". "Tell me your mother's maiden name, and then if I ask you again later, I'll be certain you can only really be yourself!". "How about the full name of the person you're currently cheating on your spouse with?"
@lhunath That's got to be the greatest security question ever. I'll be sure to use that one day. To be fair most people probably guard their lovers identifies more than the passwords they use.
If that three namespace approach is likely to be standardised I'll look at incorporating it into mpw-js.
I would have also thought that simply changing the data signed in calculateSeed would provide the same scope and would still reveal nothing about the masterpassword without requiring scrypt to be re-run. Is there an Issue discussing that choice?
The scope for generating the master key is currently always com.lyndir.masterpassword
. The scope for generating the site identifier is the one that changes.
I've added support for usernames and answers in mpw-js, see https://github.com/tmthrgd/mpw-js/commit/a4e6171fa0a93c89bcfbb0d37705bf5c444a814c.
Are there any test vectors for master password?
No, but there really should be; it's on my mental todo list, somewhere.
Note that answers have an optional context argument which is necessary for if a site needs eg. 3 answers and requires them all to be distinct. In that case, the context is the question (I propose using the most significant word(s) in the question). The context is implemented as appending a name + context string to the end of the site seed calculation in the same way that the site name is appended to it.
Also, there are two new templates to match these; usernames default to the "Name" type and answers default to the "Phrase" type, see https://github.com/Lyndir/MasterPassword/blob/master/MasterPassword/Resources/Data/ciphers.plist and https://github.com/Lyndir/MasterPassword/blob/master/MasterPassword/ObjC/MPAlgorithmV2.m#L68
I am adding this to the C code now.
mpw.c now supports logins and security answers.
-v variant The kind of content to generate.
Defaults to 'password'.
p, password | The password to log in with.
l, login | The username to log in as.
a, answer | The answer to a security question.
-C context A variant-specific context.
Defaults to empty.
-v p, password | Doesn't currently use a context.
-v l, login | Doesn't currently use a context.
-v a, answer | Empty for a universal site answer or
| the most significant word(s) of the question.
https://github.com/Lyndir/MasterPassword/blob/master/MasterPassword/C/mpw.c https://github.com/Lyndir/MasterPassword/blob/master/MasterPassword/C/types.c
I've added support for context and the other templates in mpw-js, see https://github.com/tmthrgd/mpw-js/commit/bb04a74d296e2d42fe1d51a620ba8f26d679ed7b.
@lhunath How should empty context be handled? Is my implementation correct in excluding both the length and the context when it's empty, as in tmthrgd/mpw-js/blob/master/mpw.js#L111, or should it add the zero length [ 0, 0, 0, 0 ]
?
@lhunath Also what is the correct way to handle spaces in the template? Should they consume a byte of seed or should they be skipped over? That is to say should "x x"
result in chars[seed[1]] + " " + chars[seed[3]]
or chars[seed[1]] + " " + chars[seed[2]]
?
Oh, guys, for the past hours you have made a long way off the initial question, however there are no changes in documentation hence confusion still remains.
You introduced a concept of namespaces above, but it's not mentioned with a single word in The Master Key
paragraph or later, yet meaning of other values is well-explained. I believe it should be clarified right there to follow the original statement “if you don't know how it works, you cannot assume it is secure”.
So far user learns that some com.lyndir
pops up all of a sudden and then it's silently transformed into com.lyndir.masterpassword
with word masterpassword
in it, which gives rise to an idea that user's masterpassword is joined to com.lyndir
. Hope now you see what causes difficulties.
Once again about schemes as I still miss something.
Known solutions use 1-stage generation (e.g. PBKDF2-HMAC-SHA256(masterpassword, resource)
). Current solution uses 2-stages to deal with speed better and, I quote, to deter any attempts at brute-forcing master password from a known site password. Let's forget about speed for a while and focus on the last part: can intruder get key
without master password from site password?
Question is inspired by the basic comparison:
hmac-sha256(masterpassword, resource)
iterated N timeshmac-sha256(key_frommasterpassword, resource)
iterated 1 timeOf course, key_frommasterpassword
looks different than masterpassword
, but in the end, whatever the value, it seems the only thing intruder needs to generate passwords for other resources, and because key_frommasterpassword
is hidden just by 1 iteration there's a feeling it's easier to get.
@sergeevabc I'll see if I can help you there when I have the time, but you might be better discussing that in a separate issue as it doesn't exactly fit this one.
Yes, you can try to brute-force the master key instead. Seeing as it's 64-byte, you'll need 864 attempts.
At a rate of 12328M / s, that'll still take a mind-boggling amount of times the age of the universe. ( `(864 / 12328000000)/(3600_24_356) = 16553994299963990052001805305020348873001` years )
As for the documentation not being in-sync with the scopes I just mentioned, that's because they're currently still considered beta. The documentation will be updated as soon as the official iOS app is released.
I don't really understand your confusion with respect to the com.lyndir.masterpassword
identifier. At no point is a com.lyndir
ever transformed into com.lyndir.masterpassword
. The only reference I see to "com.lyndir
" by itself is the example telling you what the data types are, where the string is just an example string to show how strings are to be encoded.
@lhunath
On documentation: com.lyndir.masterpassword
is already mentioned within The Master Key
and The Template Seed
paragraphs, however its purpose is omitted.(for example, I could click on Scrypt link and learn about N
, r
, p
, dkLen
, but I was able to grasp meaning of com.lyndir.masterpassword
just after @tmthrgd informed of reverse domain notation practice to present unique identifier).
On algorithm: assuming it takes so long to brute-force just 1 quick iteration of hmac-sha256(64bytekey, sitename)
then why bother with expensive Scrypt in the first place if we could get key easier by doing var 64bytekey = pbkdf2.hmac.sha256(masterpassword, username, 1, 64)
?
The purpose of scrypt is to derive a long (64-byte) key from a tiny memorable password. While brute-forcing a 64-byte key takes ages, brute-forcing a memorable password is trivial, especially at 12B/s. We use scrypt over pbkdf2 because it is memory-hard. The fact that scrypt is expensive is exactly the reason why I choose it: We need a 64-byte key derivation algorithm that takes a long time, but more importantly, one that takes JUST AS LONG for an attacker as it takes for a simple user. This last part is what scrypt is really good at; an attacker can't buy a cheap ASIC system with a mass of cheap GPU power and get a better key derivation performance. This property is what makes attempting to brute-force your master password so expensive (dollar-wise), because you need to buy a lot of RAM and CPU to be able to parallelize scrypt brute-forcing sufficiently. The scrypt paper illustrates this much better than I can in this comment.
@sergeevabc It's all got to do with the ability to reverse the function. As HMAC, SHA-256 and to a lesser extent scrypt are all well, professionally designed one-way functions. As a consequence of their cryptographic properties the only* way to reliably find the input or key is to use a brute-force attack; trying one after another.
Statistically it is important less when p = 1 = 100% (certain) but more when p = 0.5 = 50% (fifty-fifty). In an appropriately designed cryptographic algorithm like these (non-asymmetric) this occurs when you've covered exactly half the key space.
The key derived from the master key is 64 bytes long, which is 64 × 8 = 512 bits. That gives a total of 2512 ≈ 1.3408×10154 possible keys (that is approximately 1 followed by 154 zeros). Considering when p = 0.5, we need to cover 2512 ÷ 2 = 2511 ≈ 6.7039×10153 keys (approximately 7 followed by 153 zeros).
The most important thing to note in brute-forcing large key spaces is not the time needed but the energy needed. The absolute minimum energy needed to do nothing more than flip a bit (to traverse the key space, completely neglecting the checking) is given by the Landauer limit implied by the laws of physics. The Landauer limit is kT × ln 2, "where k is the Boltzmann constant (approximately 1.38×10-23 J/K), T is the temperature of the circuit in kelvins, and ln 2 is the natural logarithm of 2 (approximately 0.69315)." T is considered to be room temperature of approximately 300 Kelvins.
For a 512-bit key we need (2511 - 1) × 1.38×10-23 × 300 × ln 2 ≈ 1.9238×10133 J (Joules) ≈ 1.9238×10133 × 2.78×10-7 kW⋅h ≈ 5.3481×10126 kW⋅h (Kilowatt-hours) of energy. Even in the largest SI/metric prefix (yotta, 1029) it is 5.3481×10105 YW⋅h.
To put that in perspective the global electricity production is 23,127,000 GW⋅h (gigawatt-hours). If you consumed all the worlds electricity, it would take 5.3481×10120 ÷ 23127000 ≈ 2.3125×10113 years worth of electricity. For perspective the age of the universe is (13.798 ± 0.037)×109 years. That's 2.3125×10113 ÷ (13.798×109) ≈ 1.6760×10103 times the age of the universe.
To put it in perspective another way, the sun outputs 384.6 yotta watts (3.846×1026 W) of energy per second. If you could capture every watt of energy the sun produced it would take 5.3481×10129 ÷ (3.846×1026) ≈ 1.3906×10103 seconds ≈ 1.3906×10103 ÷ 3600 ÷ 24 ÷ 365 years ≈ 4.4094×1095 years worth of solar energy. Which is 4.4094×1095 ÷ (13.798×109) ≈ 3.1957×1085 times the age of the universe.
That's only to reach a probability of 0.5, to be certain you'd need twice those figures.
The point here is that it's totally infeasible verging on physically impossible to brute-force a 512 bit pseudo-random key from the energy requirement alone†.
Passwords are a different matter entirely, if we assume people constructed there password from just the Latin alphabet, then we have a possible 26 × 2 = 52 characters. For 6 characters there are 526 ≈ 1.9771×1010 possible passwords. But this isn't even strictly relevant, people do not pick passwords well or with random distribution.
If you brute-force a standard password you'll reach p = 0.5 with only a fraction of the number of required passwords. Totally feasible and within the range of modern computing. It is also possible to enumerate a list of common passwords, mixed with stolen password databases to cover a very large number of passwords. Because of the construction of Master Password it would be impractical to construct a dictionary to mount a dictionary attack.
The problem with passwords is that you can efficiently brute-force them very easily and with little memory. That's where scrypt comes in. Like all key derivation functions it is designed to take time, it is not designed to be fast like a standard hash algorithm. Slow algorithms on there own are sufficient to defeat a CPU brute-force but they can easily be beaten by a GPU, FPGA or ASIC brute-force. Now scrypt by design requires a large amount of memory to be efficient, this completely defeats the GPU, FPGA and ASIC brute-forces because the cost of memory is so prohibitive. Scrypt gives a time-memory trade off, you either commit to the enormous cost of fast memory or accept an extremely slow attack.
I'd like to qualify the statement: Scrypt helps to produce 64 byte key from master password, then key is used with HMAC-SHA256, which reduces the output to 32 byte according to RFC4868, is that true?
@sergeevabc loosely yes. It's something like this password -> scrypt -> HMAC-SHA256. The key derived using scrypt is used as the key for HMAC. The HMAC signs the data "com.lyndir.masterpassword" ∥ INT_32_BE(name length) ∥ name
with the scrypt key.
I've removed the requirement for the Web Crypto API, the Encoding API and Typed Arrays, see https://github.com/tmthrgd/mpw-js/commit/bb9bd41c6058ab612e113eea7e18db1ac729f150.
mpw-js should now work on all browsers. I'm not aware of anything preventing it working as far back as IE6.
@lhunath If you could answer https://github.com/Lyndir/MasterPassword/issues/89#issuecomment-59642261 and https://github.com/Lyndir/MasterPassword/issues/89#issuecomment-59642737 for me that would be very much appreciated.
@tmthrgd when the context is empty we exclude it in the interest of backwards compatibility.
And spaces in the template render as spaces in the output. I've implemented this not as a hardcoded exception but as a space character class which is just one character long.
Also: good news on mpw-js. Did this involve any sacrifices?
@lhunath None whatsoever (except my aversion to including cryptographic algorithms written in javascript). You can access it at https://tmthrgd.github.io/mpw-js/.
@tmthrgd Consider adding disposable email address
type to get unique prefix for harakirimail.com, spamfinity.com. one-time.email (to name a few “disposable inbox” providers available via HTTPS).
E.g. to get f89289758a7c448345c87f7e5a795318d6e07ac5@spamfinity.com or even something shorter.
I don't know how exactly these services work, but you could just use the generated username for your site as a prefix.
@lhunath Disposable email services work almost the same: when you register on untrusted site, you're required to input your e-mail address to receive a verification letter and some marketing spam later on, but instead of disclosing true address it might be wiser to input imaginary one. Then you visit disposable service site to access the inbox. Incoming letters are removed within short period of time (from 10 minutes to 3 days usually), however the inbox is not password-protected, so anyone could access it, hence it's better to make prefix unique. To serve the purpose I use HMAC-RIPEMD160 now.
spamfinity.com does support password protected inboxes; so use [site username].[site name]+[spamfinity password]@spamfinity.com
eg.
$ mpw -u 'Robert Lee Mitchell' -v l amazon.com
Your master password:
zuyhupovi
$ mpw -u 'Robert Lee Mitchell' spamfinity.com
Your master password:
Taxm0+TibbNila
Now use zuyhupovi.amazon.com+Taxm0+TibbNila@spamfinity.com
@tmthrgd @lhunath Consider adding donate links below your in-depth replies, one could learn a lot here. :) That revelation about energy needed to brute-force is a knock-out, sci-fi megastructures loom…
What's the point of separate conversion namelength
and sitenamelength
to 32-bit unsigned integer instead of salt=stringToUTF8Array("com.lyndir.masterpassword"+name.length+name)
?
function stringToUTF8Array(s) {
var i,
d = unescape(encodeURIComponent(s)),
b = new Uint8Array(d.length);
for (i = 0; i < d.length; i++) b[i] = d.charCodeAt(i);
return b;
}
@sergeevabc well first and for most they do very distinct things. (In JS:) "com.lyndir.masterpassword" + name.length + name
starts with the string "com.lyndir.masterpassword"
, it then converts name.length
from an integer into a string (so 4
-> "4"
) and joins it, it then adds the string name
. Your stringToUTF8Array
then converts this into a byte array with the possibility of many errors and instead the Encoding API should be used. In mpw.js I convert the string "com.lyndir.masterpassword"
into a byte array before concatenating it with the binary integer name.length, and the name converted into a byte array.
Aside from being more 'proper' this has a real advantage. A 32-bit unsigned integer can hold 232 = 4294967296 distinct values, from 0 to 232 - 1 = 4294967295. Being 32-bits it takes up exactly 4 bytes, always. A string is really a variable length character array (good ol' c). So to store the random† integer 3205018717 as a 32-bit unsigned integer in big-endian form gives the byte sequence [ 191, 8, 180, 93 ]
.
To store the same random integer as a string is very different. First the integer is converted into a string so 3205018717
becomes "3205018717"
, which is a collection of characters. Then the string is converted‡ into a very different byte array: [ 51, 50, 48, 53, 48, 49, 56, 55, 49, 55 ]
. This byte array does not represent the integer but the characters that the individual digits map to. So it maps the digits in the number into UTF-8 characters between 0 (48) and 9 (57). One of the many problems in using this method is that the result varies in length from a single byte for 0 ≤ x < 10 to ten bytes for 4294967295. Another issue is that it does not enforce or respect the type. If length was to be set to 78.475, a floating point decimal, it would become "78.475"
and then [ 55, 56, 46, 52, 55, 53 ]
(note 46 is the period .
).
@sergeevabc thanks, I have to say the energy aspect is my favourite way to look at brute-forcing. It's one of the hardest issues to overcome and totally left-field for most.
@sergeevabc would adding an option for custom template strings be acceptable? (@lhunath)
I considered adding this ability to mpw.js but decided against it to keep as close to the spec as possible but would certainly consider adding it, if there is a sufficient use case.
Custom templates is one of those things asked for once in a while, and each time I've rejected the notion. One of the primary benefits of being a stateless password solution is the fact that you can get away with remembering nothing but your master password and all else can disappear; from that one thing, you can always recover your passwords later.
If we're going to start introducing things like custom templates, people are going to ignorantly start using them. That'll lead to situations where everyone is happy until they somehow lose state or forget their custom template for site X.
There really isn't a true need for such a thing either; and I'd prefer to keep MP simple, and keep people from doing stupid things they'll regret later. Master Password is a stateless solution, let's keep it that way.
Note that I'm not averted to introducing new standard templates, so long as we can support their general usefulness. In fact, I'm considering ways to provide a better way to gradually increase template complexity in a stateless-friendly way. This should probably be a separate issue.
WRT. encoding numbers as UTF-8 characters of their decimal representation and then back into bytes; that sounds pretty silly indeed. Never mind that fixed-field size values effectively act as a delimiter in the final byte stream, which can be effective at defeating injection and length extension attacks.
It's not exactly a single-file implementation, but here's a start:
http://masterpasswordapp.com/mpw-js/
It doesn't appear to work yet in IE and Chrome wants it to be on HTTPS, which is going to be a bit of a challenge by itself. But it's a start.
Weeks of searching for modern deterministic password generator, so many ambitious projects exist halfdone, outdated or abandoned (Vault, Oneshallpass.com, Saltthepass.com, PwdHash, PassHash, Vpass, PasswordGen, ShaPasswords, PasswordGenerator2, Hash0, Plevyak's DPG, BPasswd2, SecurePasswords, Passwordly + every related extension from Chrome Webstore, Mozilla addons and Github repositories), and here comes promising MasterPassword with someone passionate about security behind. At last!
Maarten, the one and only question is when MasterPassword will be implemented as single HTML+JS like solutions mentioned above, even cut to a known degree? You know, to upload it here and there and use quickly in untrusted environments like third-party machines of e-cafes, passenger terminals, dangerous places abroad (journo trips to warzones as example), i.e. when Keepass autofill is not available, device is stolen or discharged. Oh, it would load off one's mind beyond belief.
If you're looking for Scrypt on JS, there's async library, already integrated in Minilock. As for SHA256 and HMAC, there is tiny and fast library made by the same developer, let alone Asmcrypto and SJCL.