Lyndir / MasterPassword

Project moved to https://gitlab.com/spectre.app
http://masterpassword.app
GNU General Public License v3.0
1.14k stars 237 forks source link

[RESEARCH] HTML+JS single file implementation #89

Closed sergeevabc closed 10 years ago

sergeevabc commented 10 years ago

Weeks of searching for modern deterministic password generator, so many ambitious projects exist halfdone, outdated or abandoned (Vault, Oneshallpass.com, Saltthepass.com, PwdHash, PassHash, Vpass, PasswordGen, ShaPasswords, PasswordGenerator2, Hash0, Plevyak's DPG, BPasswd2, SecurePasswords, Passwordly + every related extension from Chrome Webstore, Mozilla addons and Github repositories), and here comes promising MasterPassword with someone passionate about security behind. At last!

Maarten, the one and only question is when MasterPassword will be implemented as single HTML+JS like solutions mentioned above, even cut to a known degree? You know, to upload it here and there and use quickly in untrusted environments like third-party machines of e-cafes, passenger terminals, dangerous places abroad (journo trips to warzones as example), i.e. when Keepass autofill is not available, device is stolen or discharged. Oh, it would load off one's mind beyond belief.

If you're looking for Scrypt on JS, there's async library, already integrated in Minilock. As for SHA256 and HMAC, there is tiny and fast library made by the same developer, let alone Asmcrypto and SJCL.

lhunath commented 10 years ago

There are some efforts underway; mainly by @cuardin (https://github.com/cuardin/MasterPasswordJS) and @tmthrgd (https://github.com/tmthrgd/mpw-js). I hope to at some point derive an official port from these. If you're interested in being part of the effort, you're welcome at #masterpassword on irc.freenode.net

sergeevabc commented 10 years ago

Thanks for invitation, Maarten, alas #masterpassword is silent when I come in.

Looking at the algorithm, I'm curious why do you go with result = map(hmac-sha256(scrypt(masterpassword, username), sitename)) instead of result = map(scrypt(masterpassword, username+sitename))?

After all Scrypt already uses HMAC-SHA256 under the hood (theory, implementation).

tmthrgd commented 10 years ago

To weigh in on the choice, not being privy to the real reason, I can say that scrypt by design is a slow and memory hungry implementation. Having written the latter referenced js version, I can say unequivocally it is slow. On my system the Web Crypto HMAC takes about 3ms while scrypt takes almost 1.5s, orders of magnitude slower.

The actual algorithm's method of using hmac-sha256 allows many different passwords to be calculated quickly by the application while in no way reducing the security. In fact I would wager it improves the security by allowing the masterpassword to be cleared from memory much sooner.

They're reasons I can see at least.

lhunath commented 10 years ago

@tmthrgd is correct; there are two distinct phases to the algorithm because they have a distinct purpose. The first is to defend against offline attacks (brute-force, dictionary, rainbow table, ...) against your master password, the second is to generate a site's password. If the two phases were merged, we'd have to perform the scrypt phase for every site password you want to generate, where as now you have the freedom to temporarily cache the master key, if you so desire, and re-use it for generating multiple site passwords.

sergeevabc commented 10 years ago

@tmthrgd @lhunath Ingenious! Thanks for explanation.

cuardin commented 10 years ago

I don't know what the original thinking was, but I for one like the current implementation. Scrypt is a intentionally slow algorithm. As such, it is better that an app only need to run it once and then passwords for all sites can be generated instantly. Especially in the mobile world and JS where I work, the time to run all of script is a huge deal.

Sent from my iPhone

On 16 Oct 2014, at 12:00, Alexander Sergéev notifications@github.com wrote:

Thanks for invitation, Maarten, alas #masterpassword keeps silence when I come.

Looking at Mastepass algorithm, I'm curious why do you go with result = map(hmac-sha256(scrypt(username, masterpassword), sitename) instead of result = map(scrypt(masterpassword, username+sitename)), After all Scrypt already uses HMAC-SHA256 within.

— Reply to this email directly or view it on GitHub.

sergeevabc commented 10 years ago

Reading about algorithm again and again:

tmthrgd commented 10 years ago

@sergeevabc

lhunath commented 10 years ago

Correct. com.lyndir.masterpassword is a static string that doesn't change. It's a namespace to avoid the possibility of Master Password's output to ever collide with the output of another program that does things the same way (however unlikely that may be). Additionally, recently the namespace has been used to provide an extra feature: defining what kind of output to generate. There are currently 3 kinds, each with a unique namespace identifier:

  1. Generate a password, namespace com.lyndir.masterpassword
  2. Generate a username, namespace com.lyndir.masterpassword.login
  3. Generate an answer for a security question, namespace com.lyndir.masterpassword.answer

The reason we're using distinct namespaces for these three different types of output is to ensure that when the site name and the master key is the same, knowing one of these gives a third party no hint about any of the others (eg. if someone sees your username, they can't use that to speed up finding your password).

The login and answer scopes are currently in beta for the iOS app. The C app has initial support (currently just the login scope); which reminds me now that I should expand that.

The idea behind introducing a login scope is to make your username stateless as well (so you can't forget it), in addition to being arbitrary and site-specific (which helps anonymity).

The idea behind introducing an answer scope is to defeat the ridiculous practice of sites enquiring about your utmost personal life experiences under the veil of providing additional "security". "Tell me your mother's maiden name, and then if I ask you again later, I'll be certain you can only really be yourself!". "How about the full name of the person you're currently cheating on your spouse with?"

tmthrgd commented 10 years ago

@lhunath That's got to be the greatest security question ever. I'll be sure to use that one day. To be fair most people probably guard their lovers identifies more than the passwords they use.

If that three namespace approach is likely to be standardised I'll look at incorporating it into mpw-js.

tmthrgd commented 10 years ago

I would have also thought that simply changing the data signed in calculateSeed would provide the same scope and would still reveal nothing about the masterpassword without requiring scrypt to be re-run. Is there an Issue discussing that choice?

lhunath commented 10 years ago

The scope for generating the master key is currently always com.lyndir.masterpassword. The scope for generating the site identifier is the one that changes.

tmthrgd commented 10 years ago

I've added support for usernames and answers in mpw-js, see https://github.com/tmthrgd/mpw-js/commit/a4e6171fa0a93c89bcfbb0d37705bf5c444a814c.

Are there any test vectors for master password?

lhunath commented 10 years ago

No, but there really should be; it's on my mental todo list, somewhere.

Note that answers have an optional context argument which is necessary for if a site needs eg. 3 answers and requires them all to be distinct. In that case, the context is the question (I propose using the most significant word(s) in the question). The context is implemented as appending a name + context string to the end of the site seed calculation in the same way that the site name is appended to it.

Also, there are two new templates to match these; usernames default to the "Name" type and answers default to the "Phrase" type, see https://github.com/Lyndir/MasterPassword/blob/master/MasterPassword/Resources/Data/ciphers.plist and https://github.com/Lyndir/MasterPassword/blob/master/MasterPassword/ObjC/MPAlgorithmV2.m#L68

I am adding this to the C code now.

lhunath commented 10 years ago

mpw.c now supports logins and security answers.

-v variant   The kind of content to generate.
             Defaults to 'password'.
                 p, password | The password to log in with.
                 l, login    | The username to log in as.
                 a, answer   | The answer to a security question.

-C context   A variant-specific context.
             Defaults to empty.
              -v p, password | Doesn't currently use a context.
              -v l, login    | Doesn't currently use a context.
              -v a, answer   | Empty for a universal site answer or
                             | the most significant word(s) of the question.

https://github.com/Lyndir/MasterPassword/blob/master/MasterPassword/C/mpw.c https://github.com/Lyndir/MasterPassword/blob/master/MasterPassword/C/types.c

tmthrgd commented 10 years ago

I've added support for context and the other templates in mpw-js, see https://github.com/tmthrgd/mpw-js/commit/bb04a74d296e2d42fe1d51a620ba8f26d679ed7b.

@lhunath How should empty context be handled? Is my implementation correct in excluding both the length and the context when it's empty, as in tmthrgd/mpw-js/blob/master/mpw.js#L111, or should it add the zero length [ 0, 0, 0, 0 ]?

tmthrgd commented 10 years ago

@lhunath Also what is the correct way to handle spaces in the template? Should they consume a byte of seed or should they be skipped over? That is to say should "x x" result in chars[seed[1]] + " " + chars[seed[3]] or chars[seed[1]] + " " + chars[seed[2]]?

sergeevabc commented 10 years ago

Oh, guys, for the past hours you have made a long way off the initial question, however there are no changes in documentation hence confusion still remains.

You introduced a concept of namespaces above, but it's not mentioned with a single word in The Master Key paragraph or later, yet meaning of other values is well-explained. I believe it should be clarified right there to follow the original statement “if you don't know how it works, you cannot assume it is secure”.

So far user learns that some com.lyndir pops up all of a sudden and then it's silently transformed into com.lyndir.masterpassword with word masterpassword in it, which gives rise to an idea that user's masterpassword is joined to com.lyndir. Hope now you see what causes difficulties.

sergeevabc commented 10 years ago

Once again about schemes as I still miss something.

Known solutions use 1-stage generation (e.g. PBKDF2-HMAC-SHA256(masterpassword, resource)). Current solution uses 2-stages to deal with speed better and, I quote, to deter any attempts at brute-forcing master password from a known site password. Let's forget about speed for a while and focus on the last part: can intruder get key without master password from site password?

Question is inspired by the basic comparison:

Of course, key_frommasterpassword looks different than masterpassword, but in the end, whatever the value, it seems the only thing intruder needs to generate passwords for other resources, and because key_frommasterpassword is hidden just by 1 iteration there's a feeling it's easier to get.

tmthrgd commented 10 years ago

@sergeevabc I'll see if I can help you there when I have the time, but you might be better discussing that in a separate issue as it doesn't exactly fit this one.

lhunath commented 10 years ago

Yes, you can try to brute-force the master key instead. Seeing as it's 64-byte, you'll need 864 attempts.
At a rate of 12328M / s, that'll still take a mind-boggling amount of times the age of the universe. ( `(8
64 / 12328000000)/(3600_24_356) = 16553994299963990052001805305020348873001` years )

lhunath commented 10 years ago

As for the documentation not being in-sync with the scopes I just mentioned, that's because they're currently still considered beta. The documentation will be updated as soon as the official iOS app is released.

I don't really understand your confusion with respect to the com.lyndir.masterpassword identifier. At no point is a com.lyndir ever transformed into com.lyndir.masterpassword. The only reference I see to "com.lyndir" by itself is the example telling you what the data types are, where the string is just an example string to show how strings are to be encoded.

sergeevabc commented 10 years ago

@lhunath On documentation: com.lyndir.masterpassword is already mentioned within The Master Key and The Template Seed paragraphs, however its purpose is omitted.(for example, I could click on Scrypt link and learn about N, r, p, dkLen, but I was able to grasp meaning of com.lyndir.masterpassword just after @tmthrgd informed of reverse domain notation practice to present unique identifier).

On algorithm: assuming it takes so long to brute-force just 1 quick iteration of hmac-sha256(64bytekey, sitename) then why bother with expensive Scrypt in the first place if we could get key easier by doing var 64bytekey = pbkdf2.hmac.sha256(masterpassword, username, 1, 64)?

lhunath commented 10 years ago

The purpose of scrypt is to derive a long (64-byte) key from a tiny memorable password. While brute-forcing a 64-byte key takes ages, brute-forcing a memorable password is trivial, especially at 12B/s. We use scrypt over pbkdf2 because it is memory-hard. The fact that scrypt is expensive is exactly the reason why I choose it: We need a 64-byte key derivation algorithm that takes a long time, but more importantly, one that takes JUST AS LONG for an attacker as it takes for a simple user. This last part is what scrypt is really good at; an attacker can't buy a cheap ASIC system with a mass of cheap GPU power and get a better key derivation performance. This property is what makes attempting to brute-force your master password so expensive (dollar-wise), because you need to buy a lot of RAM and CPU to be able to parallelize scrypt brute-forcing sufficiently. The scrypt paper illustrates this much better than I can in this comment.

tmthrgd commented 10 years ago

@sergeevabc It's all got to do with the ability to reverse the function. As HMAC, SHA-256 and to a lesser extent scrypt are all well, professionally designed one-way functions. As a consequence of their cryptographic properties the only* way to reliably find the input or key is to use a brute-force attack; trying one after another.

Statistically it is important less when p = 1 = 100% (certain) but more when p = 0.5 = 50% (fifty-fifty). In an appropriately designed cryptographic algorithm like these (non-asymmetric) this occurs when you've covered exactly half the key space.


The key derived from the master key is 64 bytes long, which is 64 × 8 = 512 bits. That gives a total of 2512 ≈ 1.3408×10154 possible keys (that is approximately 1 followed by 154 zeros). Considering when p = 0.5, we need to cover 2512 ÷ 2 = 2511 ≈ 6.7039×10153 keys (approximately 7 followed by 153 zeros).

The most important thing to note in brute-forcing large key spaces is not the time needed but the energy needed. The absolute minimum energy needed to do nothing more than flip a bit (to traverse the key space, completely neglecting the checking) is given by the Landauer limit implied by the laws of physics. The Landauer limit is kT × ln 2, "where k is the Boltzmann constant (approximately 1.38×10-23 J/K), T is the temperature of the circuit in kelvins, and ln 2 is the natural logarithm of 2 (approximately 0.69315)." T is considered to be room temperature of approximately 300 Kelvins.

For a 512-bit key we need (2511 - 1) × 1.38×10-23 × 300 × ln 2 ≈ 1.9238×10133 J (Joules) ≈ 1.9238×10133 × 2.78×10-7 kW⋅h ≈ 5.3481×10126 kW⋅h (Kilowatt-hours) of energy. Even in the largest SI/metric prefix (yotta, 1029) it is 5.3481×10105 YW⋅h.

To put that in perspective the global electricity production is 23,127,000 GW⋅h (gigawatt-hours). If you consumed all the worlds electricity, it would take 5.3481×10120 ÷ 23127000 ≈ 2.3125×10113 years worth of electricity. For perspective the age of the universe is (13.798 ± 0.037)×109 years. That's 2.3125×10113 ÷ (13.798×109) ≈ 1.6760×10103 times the age of the universe.

To put it in perspective another way, the sun outputs 384.6 yotta watts (3.846×1026 W) of energy per second. If you could capture every watt of energy the sun produced it would take 5.3481×10129 ÷ (3.846×1026) ≈ 1.3906×10103 seconds ≈ 1.3906×10103 ÷ 3600 ÷ 24 ÷ 365 years ≈ 4.4094×1095 years worth of solar energy. Which is 4.4094×1095 ÷ (13.798×109) ≈ 3.1957×1085 times the age of the universe.

That's only to reach a probability of 0.5, to be certain you'd need twice those figures.

The point here is that it's totally infeasible verging on physically impossible to brute-force a 512 bit pseudo-random key from the energy requirement alone.


Passwords are a different matter entirely, if we assume people constructed there password from just the Latin alphabet, then we have a possible 26 × 2 = 52 characters. For 6 characters there are 526 ≈ 1.9771×1010 possible passwords. But this isn't even strictly relevant, people do not pick passwords well or with random distribution.

If you brute-force a standard password you'll reach p = 0.5 with only a fraction of the number of required passwords. Totally feasible and within the range of modern computing. It is also possible to enumerate a list of common passwords, mixed with stolen password databases to cover a very large number of passwords. Because of the construction of Master Password it would be impractical to construct a dictionary to mount a dictionary attack.

The problem with passwords is that you can efficiently brute-force them very easily and with little memory. That's where scrypt comes in. Like all key derivation functions it is designed to take time, it is not designed to be fast like a standard hash algorithm. Slow algorithms on there own are sufficient to defeat a CPU brute-force but they can easily be beaten by a GPU, FPGA or ASIC brute-force. Now scrypt by design requires a large amount of memory to be efficient, this completely defeats the GPU, FPGA and ASIC brute-forces because the cost of memory is so prohibitive. Scrypt gives a time-memory trade off, you either commit to the enormous cost of fast memory or accept an extremely slow attack.

sergeevabc commented 10 years ago

I'd like to qualify the statement: Scrypt helps to produce 64 byte key from master password, then key is used with HMAC-SHA256, which reduces the output to 32 byte according to RFC4868, is that true?

tmthrgd commented 10 years ago

@sergeevabc loosely yes. It's something like this password -> scrypt -> HMAC-SHA256. The key derived using scrypt is used as the key for HMAC. The HMAC signs the data "com.lyndir.masterpassword" ∥ INT_32_BE(name length) ∥ name with the scrypt key.

tmthrgd commented 10 years ago

I've removed the requirement for the Web Crypto API, the Encoding API and Typed Arrays, see https://github.com/tmthrgd/mpw-js/commit/bb9bd41c6058ab612e113eea7e18db1ac729f150.

mpw-js should now work on all browsers. I'm not aware of anything preventing it working as far back as IE6.

tmthrgd commented 10 years ago

@lhunath If you could answer https://github.com/Lyndir/MasterPassword/issues/89#issuecomment-59642261 and https://github.com/Lyndir/MasterPassword/issues/89#issuecomment-59642737 for me that would be very much appreciated.

lhunath commented 10 years ago

@tmthrgd when the context is empty we exclude it in the interest of backwards compatibility.

And spaces in the template render as spaces in the output. I've implemented this not as a hardcoded exception but as a space character class which is just one character long.

Also: good news on mpw-js. Did this involve any sacrifices?

tmthrgd commented 10 years ago

@lhunath None whatsoever (except my aversion to including cryptographic algorithms written in javascript). You can access it at https://tmthrgd.github.io/mpw-js/.

sergeevabc commented 10 years ago

@tmthrgd Consider adding disposable email address type to get unique prefix for harakirimail.com, spamfinity.com. one-time.email (to name a few “disposable inbox” providers available via HTTPS).

E.g. to get f89289758a7c448345c87f7e5a795318d6e07ac5@spamfinity.com or even something shorter.

lhunath commented 10 years ago

I don't know how exactly these services work, but you could just use the generated username for your site as a prefix.

sergeevabc commented 10 years ago

@lhunath Disposable email services work almost the same: when you register on untrusted site, you're required to input your e-mail address to receive a verification letter and some marketing spam later on, but instead of disclosing true address it might be wiser to input imaginary one. Then you visit disposable service site to access the inbox. Incoming letters are removed within short period of time (from 10 minutes to 3 days usually), however the inbox is not password-protected, so anyone could access it, hence it's better to make prefix unique. To serve the purpose I use HMAC-RIPEMD160 now.

lhunath commented 10 years ago

spamfinity.com does support password protected inboxes; so use [site username].[site name]+[spamfinity password]@spamfinity.com

eg.

$ mpw -u 'Robert Lee Mitchell' -v l amazon.com
Your master password:
zuyhupovi
$ mpw -u 'Robert Lee Mitchell' spamfinity.com
Your master password:
Taxm0+TibbNila

Now use zuyhupovi.amazon.com+Taxm0+TibbNila@spamfinity.com

sergeevabc commented 10 years ago

@tmthrgd @lhunath Consider adding donate links below your in-depth replies, one could learn a lot here. :) That revelation about energy needed to brute-force is a knock-out, sci-fi megastructures loom…

sergeevabc commented 10 years ago

What's the point of separate conversion namelength and sitenamelength to 32-bit unsigned integer instead of salt=stringToUTF8Array("com.lyndir.masterpassword"+name.length+name)?

function stringToUTF8Array(s) {
    var i,
        d = unescape(encodeURIComponent(s)),
        b = new Uint8Array(d.length);
    for (i = 0; i < d.length; i++) b[i] = d.charCodeAt(i);
    return b;
}
tmthrgd commented 10 years ago

@sergeevabc well first and for most they do very distinct things. (In JS:) "com.lyndir.masterpassword" + name.length + name starts with the string "com.lyndir.masterpassword", it then converts name.length from an integer into a string (so 4 -> "4") and joins it, it then adds the string name. Your stringToUTF8Array then converts this into a byte array with the possibility of many errors and instead the Encoding API should be used. In mpw.js I convert the string "com.lyndir.masterpassword" into a byte array before concatenating it with the binary integer name.length, and the name converted into a byte array.

Aside from being more 'proper' this has a real advantage. A 32-bit unsigned integer can hold 232 = 4294967296 distinct values, from 0 to 232 - 1 = 4294967295. Being 32-bits it takes up exactly 4 bytes, always. A string is really a variable length character array (good ol' c). So to store the random integer 3205018717 as a 32-bit unsigned integer in big-endian form gives the byte sequence [ 191, 8, 180, 93 ].

To store the same random integer as a string is very different. First the integer is converted into a string so 3205018717 becomes "3205018717", which is a collection of characters. Then the string is converted into a very different byte array: [ 51, 50, 48, 53, 48, 49, 56, 55, 49, 55 ]. This byte array does not represent the integer but the characters that the individual digits map to. So it maps the digits in the number into UTF-8 characters between 0 (48) and 9 (57). One of the many problems in using this method is that the result varies in length from a single byte for 0 ≤ x < 10 to ten bytes for 4294967295. Another issue is that it does not enforce or respect the type. If length was to be set to 78.475, a floating point decimal, it would become "78.475" and then [ 55, 56, 46, 52, 55, 53 ](note 46 is the period .).

@sergeevabc thanks, I have to say the energy aspect is my favourite way to look at brute-forcing. It's one of the hardest issues to overcome and totally left-field for most.

tmthrgd commented 10 years ago

@sergeevabc would adding an option for custom template strings be acceptable? (@lhunath)

I considered adding this ability to mpw.js but decided against it to keep as close to the spec as possible but would certainly consider adding it, if there is a sufficient use case.

lhunath commented 10 years ago

Custom templates is one of those things asked for once in a while, and each time I've rejected the notion. One of the primary benefits of being a stateless password solution is the fact that you can get away with remembering nothing but your master password and all else can disappear; from that one thing, you can always recover your passwords later.

If we're going to start introducing things like custom templates, people are going to ignorantly start using them. That'll lead to situations where everyone is happy until they somehow lose state or forget their custom template for site X.

There really isn't a true need for such a thing either; and I'd prefer to keep MP simple, and keep people from doing stupid things they'll regret later. Master Password is a stateless solution, let's keep it that way.

Note that I'm not averted to introducing new standard templates, so long as we can support their general usefulness. In fact, I'm considering ways to provide a better way to gradually increase template complexity in a stateless-friendly way. This should probably be a separate issue.

WRT. encoding numbers as UTF-8 characters of their decimal representation and then back into bytes; that sounds pretty silly indeed. Never mind that fixed-field size values effectively act as a delimiter in the final byte stream, which can be effective at defeating injection and length extension attacks.

lhunath commented 10 years ago

It's not exactly a single-file implementation, but here's a start:

http://masterpasswordapp.com/mpw-js/

It doesn't appear to work yet in IE and Chrome wants it to be on HTTPS, which is going to be a bit of a challenge by itself. But it's a start.