IQSS / dataverse

Open source research data repository software
http://dataverse.org
Other
878 stars 490 forks source link

Sensitive Data: Complex Passwords #3150

Closed pdurbin closed 7 years ago

pdurbin commented 8 years ago

Enforcing complex passwords is a requirement for storing level 3 data as defined by http://policy.security.harvard.edu/level-3 and the following text comes from http://policy.security.harvard.edu/sa2-complex-passwords as of 2016-06-01:

SA2: Servers and applications that manage passwords must force the setting of a complex password. This must meet the following requirements:

No common names or dictionary words

Include at least one character from at least 3 of these categories:

  • Uppercase letter
  • Lowercase letter
  • Number
  • Special character

Any one of these three length and additional requirements:

10 characters minimum 8 characters minimum and annual password reset/expiration 8 characters minimum and a second authentication factor

On 2016-05-25, we were advised that "The upcoming changes to the password policy will only include a relaxation of complexity rules for passwords of 20 characters or more. Everything else will remain the same." We expect these changes to be published at the URL above by 2016-07-01. The spirit of this change is in line with XKCD cartoon from https://xkcd.com/936/ in which the password is "correct horse battery staple":

password_strength

Again, for passwords of 19 characters or fewer, the requirements above must be applied.

All of these requirements are optional if a particular Dataverse installation has not been put into some sort of "secure" mode such that we plan to put https://dataverse.harvard.edu in before we permit level 3 data to be uploaded. That is to say, the existing password complexity rules should remain (minus the bug reported in pull request #3095) out of the box. Ideally there would be a switch called "level 3" or something to enforce all of the rules above but it would be nice if each Dataverse installation had control over each of the rules separately.

We have control over password complexity for local accounts within Dataverse (stored using bcrypt per #1034 in the builtinuser table) but for Shibboleth accounts we must trust that each Identity Provider enforces decent passwords. Membership with a federation such as InCommon should help in this regard and in Harvard's case http://iam.harvard.edu/resources/incommon-bronze details how passwords are handed in Harvard Identity Provider (IdP):

Harvard’s common identity management system enforces the use of complex passwords of at least 10 characters in length. The password character set includes uppercase and lowercase alphabetic, numeric, and the suite of special characters. The complexity requirement is for at least one uppercase, one lowercase, and one non-alphabetic character in each password. Candidate passwords are also checked against a large dictionary and against directory information about the individual. Both CAS and Active Directory lock a user out for at least 30 minutes after 10 bad password guesses. The University does not require regular password changes, but the University IdP will ensure that passwords used to authenticate individuals for InCommon Bronze SPs have been changed within the last five years, in essence enforcing a five-year password reset requirement that impacts only users of InCommon Bronze SPs. With these conditions, Harvard meets Bronze requirements for basic resistance to guessing. See the appendix for calculations showing the level of guessing resistance.

The page for Harvard's Identity Provider (IdP) above links to https://www.incommon.org/assurance/ and from the Dataverse perspective it's our understanding that all InCommon federation members have sufficient password complexity rules to meet level 3 requirements. In short, there's nothing to do code-wise within Dataverse with regard to passwords that are not stored in the database used by a Dataverse installation. These passwords are stored in each institution's Identity Provider (IdP).

pdurbin commented 8 years ago

@lwo was recently looking at this part of the code in his pull request #3095 and because no good deed should go unpunished (kidding!) I pinged him to see if he'd like to take a swing at this issue. He agreed so I assigned it to him but I'll be keeping an eye on this issue as well. @lwo if you'd like me to start a requirements doc for this issue, please just say the word.

To be clear, implementing two factor authentication should be considered out of scope for this issue. To avoid that requirement, all we have to do is make sure passwords are 10 characters in length or greater, when the installation is flipped to "secure mode" or "sensitive data" mode. (Let's not call it "level 3 mode", which is way too Harvard-specific.)

pdurbin commented 8 years ago

Today @lwo met over Google Hangouts to discuss this issue. We took a few notes at https://docs.google.com/document/d/1Cq6ud-yydSPxRkTAlp_IS_5iCbBDNNk4goe7Yxp3T5A/edit?usp=sharing

I hadn't really considered that "No common names or dictionary words" necessitates having access to a dictionary! The plan is to ship one with Dataverse, to bundle it with the app, rather than relying on some operating system-specific file such as /usr/share/dict/words on Linux. @lwo has ideas on where to find one.

It would be nice to have some more clarity at http://policy.security.harvard.edu/sa2-complex-passwords or elsewhere about the upcoming changes having to do with 20 characters. We're still using this for guidance: "The upcoming changes to the password policy will only include a relaxation of complexity rules for passwords of 20 characters or more. Everything else will remain the same." Anyway, the plan is to have a flag/setting (one of many stored in the setting table) to turn this on or off.

@lwo will let me know if he has any more questions. Thanks for working on this @lwo!

lwo commented 8 years ago

Hello @djbrooke, I better ask you, as @pdurbin should be on his break. For this I like to use a well documented third party library http://www.passay.org/ which is dual licensed under both the LGPL and Apache 2. Can you confirm I can use this maven dependency to tackle this issue? Without response, I will work under the assumption that this is ok.

pdurbin commented 8 years ago

Heh. @djbrooke and I are both back now. Passay looks fine to me, from a quick glance. The website says it's built on vt-password and you may have noticed that in code (PasswordValidator.java) I had already suggested switching to this. Yes, the license looks fine and you are welcome to allow Maven to resolve dependencies. Thanks @lwo!

lwo commented 8 years ago

Hmm... those password lists... I am still unsure about the technical translation from the "No common names or dictionary words" requirement. For example: "Password" and "password123" are commonly used passwords, and we could take a dictionary list from: http://weakpass.com/ to block them. But then, those examples any the large majority of those dictionary words would not pass the validator anyway ( no caps, digits, etc ).

Now "P@$$w0rd" is not on that list and considered valid... but it is a weak password because the character substitution is an open door cipher. It makes more sense to me if we block that. If this is considered the problem, then I suppose we could add to #3227 a stripped down version of the rule based algorithm described here: https://labs.mwrinfosecurity.com/blog/a-practical-guide-to-cracking-password-hashes and use it to invalidate common names or dictionary words plus those variations of them that are the function of a simple character substitution cipher. But we should investigate how expensive that is.

pdurbin commented 8 years ago

Per @djbrooke the Sensitive Data project is being deferred until Dataverse 5.0 so I'm unassigning myself since I like having https://github.com/IQSS/dataverse/issues/assigned/pdurbin reflect what I'm actually working on. I'll also add the 5.0 milestone.

@lwo as we've discussed, next week we'll at least touch base about the excellent work you've done to address this issue in pull request #3227 but you don't seem to be in any hurry to see this issue addressed for own installation of Dataverse. I appreciate your flexibility on the timing!

oscardssmith commented 7 years ago

Looking at the code, are we making users change passwords regularly? If so, we shouldn't be. Also with regards to dictionary words, The easy answer is to take the lowercase version of the password, turn symbols to their likely letter versions (@->a, 3->e), and check substrings against a dictionary with 1 and 2 letter words removed.

pdurbin commented 7 years ago

Sensitive data work isn't happening until Dataverse 5.0. We'll revisit this then. Closing. Thanks for your input, @oscardssmith . :smile:

pdurbin commented 7 years ago

It would be nice to have some more clarity at http://policy.security.harvard.edu/sa2-complex-passwords or elsewhere about the upcoming changes having to do with 20 characters.

The policy has been updated and now includes lot of language about 20 characters. Here's how it looks as of today (2017-08-09):

SA2: Servers and applications that manage passwords must force the setting of a complex password. This must meet the following requirements where technically feasible (consult the Security office if the following requirements are not technically feasible):

Use HarvardKey for authentication

OR:

Passwords of more than 20 characters in length

OR:

Passwords 20 characters or fewer in length with the following requirements:
    No common names or dictionary words
    No sequences of more than 4 digits in a row
    Include at least one character from at least 3 of these categories:
        Uppercase letter
        Lowercase letter
        Digits
        Special character
    Password reset/expiration period as follows:
        10-20 characters = no periodic reset/expiration required
        8-9 characters plus a second authentication factor = no periodic reset/expiration required
        8-9 characters only = annual password reset/expiration required

Thankfully, we had a heads up on this change and let @lwo know about it before he opened pull request #3227 which I think more or less meets all these requirements.

dlmurphy commented 7 years ago

Current account sign up screen on Harvard Dataverse (ignore the icons inside the boxes, that's from my password manager): passwordlogin

I think the main UI implications here are just messaging - the explanatory text above the Password box and the error messages when you input a password that doesn't match the requirements. Once we come to a final conclusion on exactly which requirements to implement, I can start writing this messaging.

dlmurphy commented 7 years ago

Example of what the new text above the Password bar could look like: "Passwords must be 10-20 characters long and must include at least one uppercase letter, one lower case letter, and one digit or special character."

oscardssmith commented 7 years ago

I think that is mostly done (with user configurable rules). the current rule is 8-9 char: 3 of Upper, lower, number, symbol. not in dictionary. Must be changed every year 10-19 char: 3 of Upper, lower, number, symbol. not in dictionary. 20-255 char: no restrictions.

djbrooke commented 7 years ago

Moving this back to not started for now. Thanks @oscardssmith for the work on this before you left!

dlmurphy commented 7 years ago

Just talked with @djbrooke and @pdurbin about this issue. Our plan is: Phil is going to determine some good default out-of-the-box password requirements we can set up.

For Harvard Dataverse to support sensitive data, we're going to set it up to use the following parameters:

oscardssmith commented 7 years ago

I quite like this approach. Not using 8 or 9 chat passwords will simplify the implementation a lot

dlmurphy commented 7 years ago

Been giving some thought to this issue. Here's a mockup I made showing new messaging implemented with the minimum possible change to the UI: passwordmockup The text, for ease of copying + pasting:

Password must be at least 10 characters long, must contain no dictionary words, must not contain a sequence of more than 4 numbers in a row, and must include characters from 3 the following categories: uppercase letter / lowercase letter / number / symbol. Passwords over 20 characters in length are exempt from all requirements.

I think that this is not really the most intuitive way to present this info. It'd be easier for a user to quickly grok these various requirements if they were presented the form of a list, but that might require more substantial changes to the page than we want to make. Can that "i" string of info text support a bulleted list? Would that look OK, or would it be a mess? I'm interested in @TaniaSchlatter and @mheppler 's thoughts on this.

@matthew-a-dunlap helped me find a very instructive example of someone else doing this same thing - HarvardKey has implemented very similar password requirements, and their messaging is fantastic. Here's what their "change password" page looks like: harvardkey

Note that they present the info in the form of a bulleted list, with dynamically changing bullets indicating whether the requirements have been reached. The info is presented in a little box that separates positive requirements from negative requirements. I wonder if we can model something similar to this.

P.S. ignore the little symbols in the password boxes in these screenshots, those are from my password manager.

pdurbin commented 7 years ago

Thanks for all the input so far. If anyone wants to hack on this issue, please use the 3150-complex-passwords branch at https://github.com/IQSS/dataverse/tree/3150-complex-passwords

pdurbin commented 7 years ago

In 4b0b17b we added some static help text about password rules that looks great (screenshot below) but needs to be made dynamic. In the error message below it, you can see the reasons why they password "qwertyuio" was rejected based on the out-of-the-box rules as of that commit:

account_-_root_-_2017-08-18_11 39 23

From a scope and product point of view, especially now that I added some content to the Installation Guide in eba5e87, is whether or not we should ship the :PVExpirationDays and :PVValidatorExpirationMaxLength settings and related code. Here are the descriptions of those settings:

The reason I'm thinking we could perhaps jettison those settings and related code is that @dlmurphy doesn't think we should use them in Harvard Dataverse and I don't know if other installations of Dataverse require them or not. If we ship these settings and code, we need to consider the UI/UX impact of passwords expiring. Do we email users periodically as the date approaches that their password expires? More generally, what is the user experience for one's password expiring?

The other thing that's on my mind isn't a question but an assertion. Yesterday @djbrooke said I'm free to make a decision about what the out-of-the-box password complexity settings should be. I played around with new CharacterRule(EnglishCharacterData.Digit, 1) and related code and I think with a small change I can preserve the existing password complexity rules we've used since Dataverse 4.0, expressed in the UI as a "minimum six characters long and uses at least one letter and number". In the User Guide I would document recommended settings that would be in compliance with https://policy.security.harvard.edu/sa2-complex-passwords and would have a password length long enough that changing the password every year wouldn't be required (to make @dlmurphy happy πŸ˜„ ).

Once this code is deployed, it's my understanding that some people on the team want to force all users to change their passwords. I don't think this is necessary if users' passwords are already complex enough to meet the requirements. I haven't tested any of the "force password change" logic yet but I will and will know more after I do.

Other thoughts as they come to me. I think by writing but we're going to meet about this soon. πŸ˜„

dlmurphy commented 7 years ago

Messaging for Harvard Dataverse, above the password box on the account creation screen:

Your password must contain: -At least 10 characters (passwords of at least 20 characters are exempt from all other requirements) -At least 3 of the following: uppercase, lowercase, numeric, or special characters

It may not include: -Number sequences of 4 or more numbers in a row -Dictionary words or common acronyms of 5 or more letters

pdurbin commented 7 years ago

Great meeting with @djbrooke @scolapasta @dlmurphy and @bsilverstein and here's the todo list:

pdurbin commented 7 years ago

@dlmurphy @mheppler I made the messaging dynamic again. Here's how the out-of-the-box messaging looks as of c34fdad if I try using a period as my password. I'm keeping the out-of-the-box configuration the same that we've had since Dataverse 4.0, which is requiring one letter and one number:

screen shot 2017-08-22 at 2 35 04 pm

@mheppler I need to move the English test to the bundle but I figured we'd nail down the UI/UX first. I'm creating the ul list on the backend.

TaniaSchlatter commented 7 years ago

The approach in the example above (noting what's missing/wrong) works if you also tell the user what criteria the password met.

If we can't do that, I suggest a simpler solution – telling the user that their PW does not meet the criteria, and asking them to review the criteria and try again.

dlmurphy commented 7 years ago

From chatting with Tania about this, the easiest way to implement this might be to make the bullet points on the password criteria dynamically reflect whether each criteria has been met, similar to how HarvardKey does it in this example:

29434864-aea87bee-8372-11e7-91ef-47be5d722325

pdurbin commented 7 years ago

@dlmurphy do you mean easiest for the user or easiest for the developer? πŸ˜„ Don't answer that. I can guess what you mean. πŸ˜„

pdurbin commented 7 years ago

@bsilverstein and I just talked about what's left for this issue and here's the current todo list:

bsilverstein95 commented 7 years ago

For what it's worth, the repeating characters -> repeating digits change comes from an oversight by myself and @pdurbin where we misconstrued part of the Harvard complex password requirements which reads:

No sequences of more than 4 digits in a row

I sort of rushed into developing what I thought the rule should be, which is a configurable rule limiting the amount of repeating characters in a password. Fortunately this was in the passay.org library and wasn't too tough to implement, but it doesn't stop sequences of four or more digits in a row e.g.: wronghorse8080808 would pass.

Currently I'm trying some of the other rules from http://www.passay.org/reference/ that are easy to access but there's none that look like a bullseye for this requirement. Not really looking to introduce any new libraries nor reinvent the wheel, so any ideas for this that rope in passay rules would be helpful!

matthew-a-dunlap commented 7 years ago

From discussing(/reminiscing) with folks @ Harvard Key, they only enforce a dictionary on words 5 letters or longer. We don't want to be blocking folks from using the word "a" or "an". (EDIT: I see now that's already how its designed)

pdurbin commented 7 years ago

or common acronyms of 5 or more letters

@dlmurphy this isn't in https://policy.security.harvard.edu/sa2-complex-passwords which says "No common names or dictionary words" so I'm changing it to what the policy says. It's all going to depend on what dictionary file you choose. For now, I'm testing with the the standard /usr/share/dict/words file on Linux.

pdurbin commented 7 years ago

I promised @TaniaSchlatter and @dlmurphy screenshots and I'm going to show below the sign up page as of 96c1c43 with two configurations:

Out-of-the-box password rules for when you install Dataverse (six characters, one letter, one number)

screen shot 2017-08-28 at 10 31 34 am

Proposed configuration for Harvard Dataverse for level 3 data

screen shot 2017-08-28 at 10 31 40 am

As I mentioned to @dlmurphy next week, I'm planning on keeping two servers up to date with the code as I work on it so that we can all play around with the password rules and feel at a gut level what they feel like. Here are the two servers I'm using:

pdurbin commented 7 years ago

Here's the current todo list:

matthew-a-dunlap commented 7 years ago

The dictionary provided to dataverse will need to be stripped of words less than 4 characters in length, otherwise it with error on matches of shorter words. This command generates a compliant dictionary from the linux words file: cat /usr/share/dict/words | grep -x '.\{4,20\}' > ~/wordsGt3.txt

matthew-a-dunlap commented 7 years ago

Attached is the password reset page ( passwordreset.xhtml ) with the red/green feedback @TaniaSchlatter @dlmurphy

screen shot 2017-08-30 at 11 08 50 am

matthew-a-dunlap commented 7 years ago

From the checklist, "Make getRequiredCharacters in PasswordValidatorUtil dynamic" requires upgrading to a later version of the library, as otherwise one cannot get the names of the character rules stored.

I have tested the upgrade and it changes the text/number of errors thrown by the library, so it may not be worth the effort.

djbrooke commented 7 years ago

Note from sprint planning:

matthew-a-dunlap commented 7 years ago

I think I have the fix in place for making pvcharacter rules configurable on the fly, but a number of the tests in testDifferentPasswordsAndSettings are now blowing up. At least some of those tests look wrong themselves. Will investigate more tomorrow.

matthew-a-dunlap commented 7 years ago

See pull request https://github.com/IQSS/dataverse/pull/4128

pdurbin commented 7 years ago

@matthew-a-dunlap and I have been discussing a couple bugs as of 92af18a, best illustrated by a screenshot:

screen shot 2017-09-08 at 3 56 18 pm

pdurbin commented 7 years ago

Todo list as of 34d3beb

  Invalid value for PwMaxLength: null
  Invalid value for PwMinLength: null
  Invalid value for PVNumberOfCharacteristics: null
  Invalid value for :PVNumberOfConsecutiveDigitsAllowed: null
  PwDictionaries not set and no default password file found: weak_passwords.txt
  Dictionary was set, but none was read in.
  Invalid value for PVNumberOfCharacteristics: null
  Invalid value for PVGoodStrength: null
  Invalid value for PwMinLength: null
dlmurphy commented 7 years ago

I just poked around at a working demo of this, and from a design standpoint I think it's good to go. It's a massive improvement over what we started with.

While talking with @matthew-a-dunlap about this issue, we thought of something that would improve the design even further, but we're opting not to implement it right now in order to avoid scope creep on an issue that has already been more complex than expected. But I want to note it here for posterity, as it might be worth pursuing in the future.

When one of the password requirements is "No dictionary words" and a user's password fails on this rule, it would be useful for us to tell the user what word it failed on.

For example:

If we told the user that the password failed because it included the word "Bose" then they'd more easily be able to adjust their password to make it work.

HarvardKey does this; here's a screenshot Matthew snagged:

screen shot 2017-09-12 at 5 34 36 pm

Again, this is a "nice to have" and not essential, so let's move on for now and consider coming back to this some day in the future.

pdurbin commented 7 years ago

I'm not sure I love the idea of part of my password being shown to me in the the clear. There's a reason password input fields show asterisks when you type.

matthew-a-dunlap commented 7 years ago

Some testing notes: Parts to test:

Other notes:

kcondon commented 7 years ago

Issues found:

kcondon commented 7 years ago

Tested all rules, 3 pages, config settings, script.

pdurbin commented 7 years ago

I reduced logging in e9fb520. I also merged the latest from develop (just the addition of ISSUE_TEMPLATE.md) and re-enabled a related API test having to do with password complexity.

kcondon commented 7 years ago

Tania asked for screenshots of the final implementation. There are 3 pages: 1. sign up, 2. forced password change, 3. account info change password, and 1 view of sign up page where some fields were entered correctly and some incorrectly. screen shot 2017-09-26 at 5 45 52 pm screen shot 2017-09-26 at 5 44 19 pm screen shot 2017-09-26 at 5 44 58 pm screen shot 2017-09-26 at 5 45 36 pm