e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Script to configure cognito automatically #1008

Open shankari opened 8 months ago

shankari commented 8 months ago

Context: The NREL-hosted admin dashboard uses AWS Cognito for authentication. Right now, when we configure a new dynamic config (https://github.com/e-mission/nrel-openpath-deploy-configs) for a new program, @shankari manually goes in and adds the logins, creates temporary passwords, and then sends out an email with instructions from her personal email.

The first-level goal is to have a script to automate this. The script should take a config file as input. It should read the admin_users or similar tag from the config file JSON. It should "sync" the users in the file with the users in Cognito:

For the users that are added, we should have cognito autogenerate a temporary password and send it to the users' email along with standard instructions (use 2FA....). The email should come from openpath@nrel.gov instead of verificationemail.com.
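
One way to read the "sync" step is as a set difference between the config and the pool. The following is a minimal sketch, assuming both sides are already available as plain lists of email addresses. The function name `plan_sync` is hypothetical, and the case-insensitive comparison is an assumption rather than something specified above.

```python
def plan_sync(config_emails, pool_emails):
    """Return (to_add, to_remove) so that the pool ends up matching the config."""
    # Assumption: emails are compared case-insensitively, ignoring surrounding whitespace.
    wanted = {e.strip().lower() for e in config_emails}
    present = {e.strip().lower() for e in pool_emails}
    to_add = sorted(wanted - present)      # in the config, not yet in the pool
    to_remove = sorted(present - wanted)   # in the pool, no longer in the config
    return to_add, to_remove


to_add, to_remove = plan_sync(
    ["a@example.org", "b@example.org"],
    ["b@example.org", "old@example.org"],
)
print(to_add)
print(to_remove)
```

Keeping the planning step free of AWS calls makes it easy to check by hand before any account is created or deleted.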

shankari commented 8 months ago

This code should live in the nrel-openpath-deploy-configs repo since we will eventually want to automate this when the config changes by using GitHub Actions 😄

nataliejschultz commented 8 months ago

Now that the nominatim project is mostly squared away, I've been gaining some background on how to get this email project started.

We'll have to create an identity for openpath@nrel.gov on the AWS account. I tried to create this identity, and immediately ran into permissions issues:

You do not have sufficient access to perform this action.
User: ...nschultz@nrel.gov is not authorized to perform: ses:TagResource on resource: ...identity/openpath@nrel.gov because no identity-based policy allows the ses:TagResource action

I replaced numbers with ### because I'm not yet familiar enough to know which numbers could represent sensitive information, but you get the gist. I'm going to reach out to Jianli and see if there's an easy way to add these permissions.

nataliejschultz commented 8 months ago

I've gotten boto3 working with the staging account, and was able to add my NREL email as a verified email. I then tried to send myself an email using the outline provided here.

However, when I sent the email, I got an undeliverable DMARC authentication error. I can't see the verified emails in the console, but I think I have to change the DMARC policy somehow so that it sees @nrel.gov emails as matching the domain. I haven't looked into it thoroughly but I'm going to learn more tomorrow.

nataliejschultz commented 8 months ago

Jianli helped me figure out the DMARC issues. I successfully sent a test email from openpath@nrel.gov using amazon sdk (boto3) to an account that is not a verified identity, and there were no domain verification issues.
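
For reference, a test send of this kind could look roughly like the sketch below. It is not the script that was actually run: the function name `send_test_email`, the subject, and the body text are placeholders, and it assumes an SES client created with something like `boto3.client("ses", region_name="us-west-2")` using credentials that are allowed to send from the verified identity.

```python
def send_test_email(ses_client, to_address, sender="openpath@nrel.gov"):
    """Send a plain-text test message through SES and return the SES message id."""
    response = ses_client.send_email(
        Source=sender,  # must be a verified SES identity
        Destination={"ToAddresses": [to_address]},
        Message={
            # Placeholder subject and body, chosen only for illustration
            "Subject": {"Data": "OpenPATH test email", "Charset": "UTF-8"},
            "Body": {"Text": {"Data": "This is a test message.", "Charset": "UTF-8"}},
        },
    )
    return response["MessageId"]
```

Passing the client in as an argument keeps the function independent of how credentials are obtained.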

I've also put together a small script to extract the email addresses from the config files, using a bit of RegEx code that I found:

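import json
import re

with open('localpath/file.nrel-op.json') as config_file:
    data = json.load(config_file)
    intro = data['intro']
    info = intro['program_admin_contact']
    # Pull the first thing that looks like an email address out of the free-text contact string
    match = re.search(r'[\w\.-]+@[\w\.-]+\.\w+', info)
    # re.search returns None when nothing matches, so guard before calling .group()
    email = match.group(0) if match else None
    print(email)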

I've tested it on a few of the config files and it works really well, as long as the user puts in the email correctly.

shankari commented 8 months ago

I've also put together a small script to extract the email addresses from the config files, using a bit of RegEx code that I found:

the admin contact is typically one person who is the "public face" of the project. But the list of admin users is typically much longer, especially if they have interns or grad students to help manage the project over time.

So the appendix to the MOU allows people to list 3 admin users.

It should read the admin_users or similar tag from the config file JSON

So you should add a new tag that lists all of the users and read that instead of the program_admin_contact. And you don't need to do any fancy parsing for that because we can just expect a list of email addresses.

To test, you would need to create a new JSON file with the new field manually added and populated with test email addresses.
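
A minimal sketch of reading such a tag, assuming it is called `admin_users` and sits at the top level of the config (both the key name and its location are assumptions at this point in the thread). The helper name `read_admin_users` is hypothetical. It accepts a JSON list and, as a convenience, also a comma-separated string.

```python
import json


def read_admin_users(config):
    """Return the admin emails from a parsed config dict as a clean list."""
    # Assumption: the tag is named "admin_users" and lives at the top level.
    raw = config.get("admin_users", [])
    if isinstance(raw, str):
        # Tolerate "a@x.org, b@x.org" as well as a proper JSON list
        raw = raw.split(",")
    return [e.strip() for e in raw if e.strip()]


config = json.loads('{"admin_users": ["a@example.org", " b@example.org "]}')
print(read_admin_users(config))
```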

shankari commented 8 months ago

To clarify further, we may want to customize the email that is sent depending on other aspects from the config file. So the config has an admin-dashboard section with some screens enabled and disabled, and some columns excluded sometimes.

So for research partners, for example, we will show them the full data in the trip and trajectory tables (e.g. uue). For public agency partners, we will have trips linked to users in the trip table but without locations, and in the trajectory table, we will show the locations but not the user id, to ensure anonymity (e.g. ride2own).

So in the email that I forwarded to you, we highlight that we don't show spatial information in the trip table... That text should be removed when sending to partners where the admin-dashboard does not exclude the spatial tables.

shankari commented 8 months ago

There are multiple config files. The config file name or the serverURL field in the config should tell you which config it is. You should then use that config name to map to the user pool

e.g. https://github.com/e-mission/nrel-openpath-deploy-configs/blob/main/configs/ebikethere-garfield-county.nrel-op.json has a server URL of ebikethere-garfield-county-openpath.nrel.gov, and maps to a user pool of nrelopenpath-prod-ebikethere-garfield-county

You can check the other config files as well and their mappings to the other user pools
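
The file-name-to-pool mapping in the example above can be sketched as follows. This assumes every deployment follows the same `nrelopenpath-prod-<name>` pattern as the Garfield County example; the helper name `pool_name_from_config` is hypothetical.

```python
import os

POOL_PREFIX = "nrelopenpath-prod-"


def pool_name_from_config(config_path):
    """Derive the Cognito user pool name from a config file path."""
    filename = os.path.basename(config_path)   # e.g. "ebikethere-garfield-county.nrel-op.json"
    deployment = filename.split(".")[0]        # drop the ".nrel-op.json" suffix
    return POOL_PREFIX + deployment


print(pool_name_from_config("configs/ebikethere-garfield-county.nrel-op.json"))
# prints: nrelopenpath-prod-ebikethere-garfield-county
```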

nataliejschultz commented 8 months ago

I've been working on my script and have a few updates + questions. Here is the state of the script:

Extract email addresses

def email_extract():
    with open ('path/to.nrel-op.json') as config_file:
        data = json.load(config_file)
        intro = data['intro']
        emails = [i.strip() for i in intro['admin_users'].split(",")]
    return emails

Get name of user pool and check if it exists

filename = "name-of-file"
pool_test = "nrelopenpath-prod-" + filename
pool_id = []
AWS_REGION = "us-west-2" 

client = boto3.client(
    'cognito-idp',
    aws_access_key_id="",
    aws_secret_access_key="",
    aws_session_token="",
    region_name=AWS_REGION
)
response = client.list_user_pools(MaxResults=60)
for i in response["UserPools"]:
    if i["Name"] == pool_test:
        print(pool_test + " pool exist!")
        pool_id.append(i["Id"])
        break

    else:
        print("Pool DNE! Looking...")
        continue

Check for user emails in user pool


def user_already_exists(pool_id, email):
    try:
        response = client.list_users(UserPoolId=pool_id)
        users = response["Users"]
        result = False
        for i in users:
            for k, v in i.items():
                if k == "Attributes":
                    for j in v:
                        if j["Name"] == "email":
                            user_email = str(j["Value"])
                            if str(email) == user_email:
                                result = True

        return result
    except ClientError as err:
        logger.error(
            "Couldn't list users for %s. Here's why: %s: %s",
            pool_id,
            err.response["Error"]["Code"],
            err.response["Error"]["Message"],
        )
        raise

for email in email_extract():
    if not user_already_exists(pool_id[0], email):
        print(email + " not in user pool!")
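
One thing to watch with the lookup above: `list_users` returns at most 60 users per call, so a single call can miss users in a larger pool. A hedged sketch of a paginated variant, using the boto3 `list_users` paginator, follows. The function name `emails_in_pool` is hypothetical and this is not part of the script as posted.

```python
def emails_in_pool(cognito_client, pool_id):
    """Collect the email attribute of every user in the pool, across all pages."""
    emails = set()
    paginator = cognito_client.get_paginator("list_users")
    for page in paginator.paginate(UserPoolId=pool_id):
        for user in page["Users"]:
            # Each user carries a list of {"Name": ..., "Value": ...} attributes
            for attr in user.get("Attributes", []):
                if attr["Name"] == "email":
                    emails.add(attr["Value"])
    return emails
```

Checking a config email then becomes a set membership test, `email in emails_in_pool(client, pool_id)`, and the pool is only listed once per run rather than once per email.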

The next part is adding the prompting for signup to the SES portion of the script. I think it is getting close to done.

Questions

How will the pulling of the AWS credentials work in production? Is there a reference I can use?

How to pull the name of the file after it's merged? I believe this will be based on @Abby-Wheelis 's GitHub actions workflow/where the config file ends up.

shankari commented 8 months ago

How will the pulling of the AWS credentials work in production? Is there a reference I can use? How to pull the name of the file after it's merged? I believe this will be based on @Abby-Wheelis 's GitHub actions workflow/where the config file ends up.

As you can see from https://github.com/e-mission/e-mission-docs/issues/1008#issue-1935857136

The first-level goal is to have a script to automate this. The script should take a config file as input.

You don't have a dependency on @Abby-Wheelis's GitHub action. Integrating the two is the next step.

Here is the state of the script:

This would be perfect to put into a draft PR so I can comment on it

nataliejschultz commented 7 months ago

I ran my script using my nrel email to sign myself up for the staging pool, and have found out a few different things:

  1. The AWS user pool must be configured to send email with SES if it is to come from a custom FROM address. This is a setting that can be changed on creation of the pool.
  2. The invitation message template is somewhat customizable, though to truly customize the email (ie integrate it with the script and include custom information), we would need to use AWS Lambda. Whoever sets up the pool has to set up the lambda function, and then set the user pool to allow the lambda trigger to run instead of the automatic verification email. I do not currently have access to AWS lambda functions, and cannot test this to see if it's possible.

However, we can much more easily automate sending an email to users using just SES separately from cognito. I think the most straightforward route is to send two emails: One from AWS with the username and temp password, and another highly customizable email with the URL to the admin dashboard, custom info based on the config file, and a notice that their temporary password will be in a separate email. It is up to @shankari what to do, since the cloud team wants to set up the user pools with CDK.

shankari commented 7 months ago

The AWS user pool must be configured to send email with SES if it is to come from a custom FROM address. This is a setting that can be changed on creation of the pool.

Correct. This is why we set up SES in the first place.

This is a setting that can be changed on creation of the pool.

Are you sure it is not changeable after the pool is created? Per https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-pool-updating.html you can change almost all the existing settings using UpdateUserPool

The invitation message template is somewhat customizable, though to truly customize the email (ie integrate it with the script and include custom information), we would need to use AWS Lambda.

I don't understand this. What custom information do we need? From the UI, I can create an email message template and have it be used when sending out the email. I assume that is the "somewhat customizable". I would start with that and document what (if anything) cannot be done.

I think the most straightforward route is to send two emails:

We may have to fall back to this, but it is clearly a sub-optimal solution. Any solution that requires the user to open two separate emails instead of one adds one more failure point. Can you please clarify why the default "customizable" option does not work, with supporting links and examples along with outputs?

nataliejschultz commented 7 months ago

you can change almost all the existing settings using UpdateUserPool

You are right!!! This was the missing puzzle piece that I had searched for but could not find. I'm going to play around with UpdateUserPool and see what its limitations are 😄

nataliejschultz commented 7 months ago

Okay, I've made a LOT of changes to the script since finding out about UpdateUserPool. Currently working on organizing all of the functions properly + calling them in the right order so that all variables can get passed in properly. Once that's done, I can push a commit to my pull request and move to ready for review!

nataliejschultz commented 7 months ago

I put the script into ready for review on Thursday around midnight (a few hours short of my goal 😞). Here is an in-depth overview of the functionality of the script:

Setup

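import json
import logging
import os
import sys

import boto3
from botocore.exceptions import ClientError

#Logger used by the error handling in user_already_exists
logger = logging.getLogger(__name__)

#The only requirement to run is putting the name of the config file in as an argument to command line when running. 
#The name of the file is used to create the name of the pool, which is used to find the pool ID. 
filename_raw = sys.argv[1]
filename = filename_raw.split(".")[0]
pool_name = "nrelopenpath-prod-" + filename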

#Gets a relative path to the config file, so that it can be opened and read
current_path = os.path.dirname(__file__)
config_path = os.path.relpath('../configs/'+ filename_raw, current_path)

#Set up AWS credentials as environment variables + set variables
ACCESS = os.environ.get("AWS_ACCESS_KEY_ID")
SECRET = os.environ.get("AWS_SECRET_ACCESS_KEY")
TOKEN = os.environ.get("AWS_SESSION_TOKEN")
AWS_REGION = "us-west-2" 

#Set up clients
cognito_client = boto3.client(
    'cognito-idp',
    aws_access_key_id = ACCESS,
    aws_secret_access_key= SECRET,
    aws_session_token=TOKEN, 
    region_name=AWS_REGION
    )
sts_client = boto3.client("sts")

Functions

Get_userpool_name passes in the pool_name and cognito_client to check for the existence of the user pool.

#This function has two options, based on @shankari 's preferences for how to run it.

def get_userpool_name(pool_name, cognito_client):
    response = cognito_client.list_user_pools(MaxResults=60)
    UserPoolExist = False
    #pool_id stays None if no pool matches, so the return below never hits an unbound name
    pool_id = None
    #########One option to set the user pool without breaking (but still stop when condition met)
    i = 0
    #Check the index bound first, and test the current pool before advancing, so the pool at index 0 is not skipped
    while i < len(response["UserPools"]) and not UserPoolExist:
        if response["UserPools"][i]["Name"] == pool_name:
            UserPoolExist = True
            pool_id = response["UserPools"][i]["Id"]
            print(pool_name + " pool exists! Checking for users...")
        else:
            print("looking for user pool...")
        i = i + 1
    #########Second option that uses a break when condition is met:
    # for i in response["UserPools"]:
    #     if i["Name"] == pool_name and not UserPoolExist:
    #         pool_id = i["Id"]
    #         UserPoolExist = True
    #         print(pool_name + " pool exists! Checking for users...")
    #         break 
    #     else:
    #         print("Looking for pool...")
    #         continue

    return UserPoolExist, pool_id

User_already_exists passes in the pool id and user email to check for the existence of the user in the user pool by email. Sets Bool to true if user exists. Raises error if list_users doesn't work.

def user_already_exists(pool_id, email, cognito_client):
    try:
        response = cognito_client.list_users(UserPoolId=pool_id)
        users = response["Users"]
        result = False
        for i in users:
            for k, v in i.items():
                if k == "Attributes":
                    for j in v:
                        if j["Name"] == "email":
                            user_email = str(j["Value"])
                            if str(email) == user_email:
                               result = True
        return result
    except ClientError as err:
        logger.error(
            "Couldn't list users for %s. Here's why: %s: %s",
            pool_id,
            err.response["Error"]["Code"],
            err.response["Error"]["Message"],
        )
        raise

Get_verified_arn uses get_caller_identity to get the account number, which is used to build the identity arn for the verified email address (ie FROM address)

def get_verified_arn(sts_client):
    account_num = sts_client.get_caller_identity()["Account"]
    identity_arn = "arn:aws:ses:" + AWS_REGION + ":" + account_num + ":identity/openpath@nrel.gov"
    return identity_arn

Email_extract opens the config file and returns a list of the email addresses.

def email_extract():
    with open (config_path) as config_file:
        data = json.load(config_file)
        intro = data['intro']
        emails = [i.strip() for i in intro['admin_users'].split(",")]
    return emails

Create_account creates an account for a new user in the specified user pool.

def create_account(pool_id, email, cognito_client):
    response = cognito_client.admin_create_user(
                    UserPoolId = pool_id,
                    Username=email,
                    UserAttributes=[
                        {
                            'Name': 'email',
                            'Value': email,
                        },
                    ],
                    ForceAliasCreation=True,
                    DesiredDeliveryMediums=[
                        'EMAIL',
                    ],
                )
    return response

Format_email customizes the welcome email template with the pool name. Can be used to add custom sentences to the email based on other aspects of the config file.

def format_email(pool_name):
    with open("welcome-template.txt", "r") as f:
        html = f.read()
        html = html.replace("<pool_name>", pool_name)
    return html

Update_user_pool configures the user pool settings, including: ensuring MFA is on, customized email FROM address (based on the ARN) and FROM string, and updating the welcome email with the customized template.

def update_user_pool(pool_id, pool_name, html, identity_arn, cognito_client):
    response = cognito_client.update_user_pool(
        UserPoolId=pool_id,
        AutoVerifiedAttributes=['email'],
        MfaConfiguration='ON',
        DeviceConfiguration={
            'ChallengeRequiredOnNewDevice': True,
            'DeviceOnlyRememberedOnUserPrompt': True
        },
        EmailConfiguration={
            'SourceArn': identity_arn,
            'EmailSendingAccount': 'DEVELOPER',
            'From': 'openpath@nrel.gov'
        },
        AdminCreateUserConfig={
            'AllowAdminCreateUserOnly': True,
            'InviteMessageTemplate': {
                'EmailMessage': str(html),
                'EmailSubject': f'Welcome to {pool_name} user pool!'
            }
        },
    )

Running the script

#Starts by checking for the User Pool. If the User Pool does not yet exist, wait until it is set up to add users. 
UserPoolExist, pool_id = get_userpool_name(pool_name, cognito_client) 

#If the user pool exists,  extract email addresses from conf file. 
if UserPoolExist:
    emails = email_extract()
    #Loop over each email address. Check if they're in the user pool.
    for email in emails:
        if not user_already_exists(pool_id, email, cognito_client):   
            #If user not in pool, format the email template for their welcome email, update the user pool, and create an account for them.
            print(email + " not in user pool! Creating account...")
            html = format_email(pool_name)
            identity_arn = get_verified_arn(sts_client)
            update_user_pool(pool_id, pool_name, html, identity_arn, cognito_client)
            response = create_account(pool_id, email, cognito_client)
            #Checks to make sure that the create_account function went through (ie 200 response)
            if response['ResponseMetadata']['HTTPStatusCode'] == 200:
                print("Account created! Sending welcome email.")
            else:
                print("Account creation unsuccessful.")
                print(response['ResponseMetadata']['HTTPStatusCode'])       
        #If the user is already in the pool, they will not be added again. 
        else:
            print(email + " already in user pool!")
#If the user pool does not exist yet, whoever is running the script should address this!
else:
    print(pool_name + " does not exist! Try again later.")

There is a lot of room for further customization for next-level goals. I will meet with Shankari next week to discuss! 😄

nataliejschultz commented 7 months ago

After making some initial changes to the script, here is an example of how it works:

The Wyoming user pool starts with zero users. I modified the wyoming.nrel-op.json config file to have my email in the intro section, in a key I called admin_users:

"admin_users": "email1@nrel.gov",

In admin_dashboard,

"map_trip_lines": false, "data_trips_columns_exclude": ["data.start_loc.coordinates", "data.end_loc.coordinates"],

In the command line, I input the following command:

python email-config.py wyoming.nrel-op.json

This command will hopefully be automated to run with GitHub actions upon merging a new config, assuming I can extract the name of the newly merged file when it's added.

In terminal, the following prints are displayed:

users: []
email1@nrel.gov not in user pool! Creating account...
Account created! Sending welcome email.

I got the following email seconds later:

[Screenshot: inbox appearance (2023-11-28, 12:33 PM)]
[Screenshot: email content with info redacted (2023-11-28, 12:40 PM)]

Now, my NREL email is in the user pool! Let's see what happens when we try to re-add this email and a different email, while simultaneously changing the config:

"admin_users": "email1@nrel.gov, email2@yahoo.com",
"map_trip_lines": true,
"data_trips_columns_exclude": []

The output in terminal after running the exact same command as before:

email1@nrel.gov already in user pool!
email2@yahoo.com not in user pool! Creating account...
Account created! Sending welcome email.

The appearance of the email is almost identical, with the exception of this section:

[Image: email content]

Sentence changes:

  1. Added "Additionally, you can view individual user origin destination points" because map_trip_lines is set to true.
  2. Removed "Your configuration excludes trip start/end in the trip table. Let us know if you would like to include those." because data_trips_columns_exclude is empty.
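
The config-dependent sentences described above could be chosen with a small helper like the one sketched here. The sentence text is taken from the example email, but the function name `custom_sentences` and the exact trigger for the second sentence (either of the two coordinate columns being excluded) are assumptions, not the logic in the PR.

```python
# Columns whose exclusion hides trip start/end locations (taken from the Wyoming example)
SPATIAL_COLUMNS = {"data.start_loc.coordinates", "data.end_loc.coordinates"}


def custom_sentences(admin_dashboard):
    """Return the optional welcome-email sentences for an admin_dashboard config section."""
    sentences = []
    if admin_dashboard.get("map_trip_lines", False):
        sentences.append(
            "Additionally, you can view individual user origin destination points."
        )
    excluded = set(admin_dashboard.get("data_trips_columns_exclude", []))
    # Assumption: the notice applies when a start/end coordinate column is excluded
    if excluded & SPATIAL_COLUMNS:
        sentences.append(
            "Your configuration excludes trip start/end in the trip table. "
            "Let us know if you would like to include those."
        )
    return sentences
```

With the first Wyoming config this returns only the exclusion notice, and with the second it returns only the origin destination sentence, which matches the two emails described above.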

I think the email can be modified to look better/more official based on @shankari's preferences, though it's not fully necessary. I'm going to remove the link to the trajectory table issue; I'm not sure if the sentence about the table is even necessary. I am also not sure what additional options to include for the data_trips_columns_exclude, or for any of the other admin dashboard preferences.

shankari commented 7 months ago

I am also not sure what additional options to include for the data_trips_columns_exclude, or for any of the other admin dashboard preferences.

data_trips_columns_exclude indicates the columns that we should exclude from the trip table. The column names are pretty self-explanatory and are typically used to remove spatial data (e.g. https://github.com/e-mission/nrel-openpath-deploy-configs/blob/9795b34cac56d418794343c0656018caea6c7aef/configs/denver-casr.nrel-op.json#L128)

shankari commented 7 months ago

I am creating admin credentials for the USAID project, so I tried to run the script manually after copying the credentials and changing the config file to include

--- a/configs/usaid-laos-ev.nrel-op.json
+++ b/configs/usaid-laos-ev.nrel-op.json
@@ -10,6 +10,7 @@
         "start_month": "05",
         "start_year": "2023",
         "program_admin_contact": "ທ່ານ ໂກສົນ ຜ່ານທາງອີເມວ Kosol.Kiatreungwattana@nrel.gov ແລະ ເບີວອດແອບ 303-517-7674",
+        "admin_users": "K.Shankari@nrel.gov, ...",
         "deployment_partner_name": "USAID - Laos",
         "translated_text": {
             "lo": {

I got

email_automation kshankar$ python3 email-config.py ../configs/usaid-laos-ev.nrel-op.json
Traceback (most recent call last):
  File "/Users/kshankar/Desktop/data/e-mission/openpath-deploy-configs/email_automation/email-config.py", line 141, in <module>
    is_userpool_exist, pool_id = get_userpool_name(pool_name, cognito_client)
  File "/Users/kshankar/Desktop/data/e-mission/openpath-deploy-configs/email_automation/email-config.py", line 48, in get_userpool_name
    pool_id = response["UserPools"][user_pool_index]["Id"]
TypeError: list indices must be integers or slices, not NoneType

nataliejschultz commented 7 months ago

I am creating admin credentials for the USAID project, so I tried to run the script manually after copying the credentials. I got


email_automation kshankar$ python3 email-config.py ../configs/usaid-laos-ev.nrel-op.json

No need to put the path to the config, just the name of it!

shankari commented 7 months ago

I figured that out from the code! I added in a line to print this out for easier debugging in the future. And it did work! I got the welcome email, I think so did @Abby-Wheelis

I am going to try to login with the temporary password now...

Abby-Wheelis commented 7 months ago

I got the welcome email, I think so did @Abby-Wheelis

I did get the welcome email, but the link appears to be broken for me, I'm not sure if it would affect anything, but I have tried it both on and off of the VPN

shankari commented 7 months ago

Ah yes, the link is broken

@nataliejschultz we got a link to https://nrelopenpath-prod-usaid-laos-ev-openpath.nrel.gov/admin/

It should be https://usaid-laos-ev-openpath.nrel.gov/admin/

@Abby-Wheelis can you try that?
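
The corrected link suggests the dashboard URL is built from the deployment name alone, without the `nrelopenpath-prod-` pool prefix. A minimal sketch, assuming the `<name>-openpath.nrel.gov` pattern seen in the two examples in this thread holds for other deployments (the helper name `dashboard_url` is hypothetical):

```python
def dashboard_url(config_filename):
    """Build the admin dashboard URL from a config file name or path."""
    # "configs/usaid-laos-ev.nrel-op.json" -> "usaid-laos-ev"
    deployment = config_filename.split("/")[-1].split(".")[0]
    # Use the deployment name, not the user pool name, as the host prefix
    return f"https://{deployment}-openpath.nrel.gov/admin/"


print(dashboard_url("usaid-laos-ev.nrel-op.json"))
# prints: https://usaid-laos-ev-openpath.nrel.gov/admin/
```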

nataliejschultz commented 7 months ago

I did get the welcome email, but the link appears to be broken for me, I'm not sure if it would affect anything, but I have tried it both on and off of the VPN

@shankari did you have the same experience? I haven't been able to try the logging in step, since I didn't have a dashboard to actually log into! The link to the dashboard is generated based on the pattern of naming the config.

nataliejschultz commented 7 months ago

Ah yes, the link is broken

@nataliejschultz we got a link to https://nrelopenpath-prod-usaid-laos-ev-openpath.nrel.gov/admin/ It should be https://usaid-laos-ev-openpath.nrel.gov/admin/

Will this pattern follow for naming with the admin dashboard link? I will change it if so.

nataliejschultz commented 7 months ago

The first level goal PR is ready for merging!

I submitted a ticket on ServiceNow outlining exactly what we need for authentication with AWS through GitHub actions. The task was assigned to Jianli, who I have been collaborating with for all AWS permissions issues.

In the meantime, I've been working on how to get the name of a newly pushed config file into our workflow file. I've found a dependency that can work for both push and pull use cases. There is a second dependency that might be simpler, but it's hard to know until I test them out in my fork.

It will be up to @shankari if we use either of them (she might know a better way to do this without a dependency), but I will play around with them and see if they'll work for our use case regardless.

shankari commented 7 months ago

Yup, results of playing around will help decide between them!

shankari commented 6 months ago

I would like you to support both GHA and local. It sounds like the main difference is in authenticating to AWS. So you would have auth_for_gh_actions and auth_for_local_run and have the rest of the code be the same.

You would invoke auth_for_gh_actions or auth_for_local_run either by seeing which environmental variables were present, or by using argparse to have people pass in -g for gha and -l for local or something.
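
The environment-variable option could be sketched as below. GitHub Actions sets `GITHUB_ACTIONS=true` on its runners, so that variable can serve as the switch. The helper name `detect_run_mode` is hypothetical, and the bodies of `auth_for_gh_actions` and `auth_for_local_run` are left out because they depend on how AWS access ends up being granted.

```python
import os


def detect_run_mode(env=None):
    """Return "gha" when running inside GitHub Actions, otherwise "local"."""
    env = os.environ if env is None else env
    # GitHub Actions sets GITHUB_ACTIONS to the string "true" on its runners
    if env.get("GITHUB_ACTIONS") == "true":
        return "gha"
    return "local"


print(detect_run_mode({"GITHUB_ACTIONS": "true"}))
print(detect_run_mode({}))
```

The caller would then pick `auth_for_gh_actions` or `auth_for_local_run` based on the returned mode and share everything after authentication.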

shankari commented 6 months ago

the argument would be the full path to the filename. And standard UNIX rules will apply to "full path to file name", so it will use relative paths or absolute paths depending on whether the pathname starts with / or not.
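
To make the path rule concrete, here is a small sketch of how an argument would resolve against a working directory. It uses `posixpath` so the behaviour is the UNIX one on any platform; the helper name `resolve_config_path` and the example directories are hypothetical. In the real script, `os.path.abspath(arg)` does the same thing against the process's current directory.

```python
import posixpath


def resolve_config_path(arg, cwd):
    """Resolve a command-line path the way a UNIX shell user would expect."""
    if posixpath.isabs(arg):
        # Starts with "/": used as-is
        return posixpath.normpath(arg)
    # Otherwise: interpreted relative to the directory the script is run from
    return posixpath.normpath(posixpath.join(cwd, arg))


print(resolve_config_path("../configs/wyoming.nrel-op.json", "/repo/email_automation"))
# prints: /repo/configs/wyoming.nrel-op.json
```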

nataliejschultz commented 6 months ago

I would like you to support both GHA and local. It sounds like the main difference is in authenticating to AWS. So you would have auth_for_gh_actions and auth_for_local_run and have the rest of the code be the same.

Added an argparser section to main function and modified so that the full path to the config file can be input:

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('-l', '--local',
                       help = 'Running locally. Provide full path to config file + install boto3 prior to running.' )
    group.add_argument('-g', '--github',
                       help = 'Must be run on GitHub. To run locally, use -l argument.') 
    args = parser.parse_args()
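    #Read the path from the parsed arguments rather than sys.argv, so it does not depend on argument position
    filepath_raw = args.local or args.github
    filename_raw = os.path.basename(filepath_raw)
    filename = filename_raw.split('.')[0]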
    pool_name = "nrelopenpath-prod-" + filename
    current_path = os.path.dirname(__file__)
    config_path = os.path.relpath('../configs/'+ filename_raw, current_path)

Also updated the readme to reflect these changes:

#Run the email-config.py script, and pass the path to the config file in as an argument:

`python email-config.py -l /path/to/configfile.nrel-op.json`

The URL issue was fixed as well in the last push. Moved the PR to ready for review.

nataliejschultz commented 5 months ago

Added functionality to remove users when they're in the User Pool and aren't in the config file for that project. See this comment for in-depth description.
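
The linked description is not reproduced here, so the following is only a rough sketch of what a removal step can look like with the boto3 `admin_delete_user` call, not the PR's implementation. It assumes each user's Cognito username is their email address, as in `create_account` above; the function name `remove_users` is hypothetical.

```python
def remove_users(cognito_client, pool_id, usernames):
    """Delete the given users from the pool and return the usernames that were removed."""
    removed = []
    for username in usernames:
        # Assumption: Username is the email address used when the account was created
        cognito_client.admin_delete_user(UserPoolId=pool_id, Username=username)
        removed.append(username)
    return removed
```

The `usernames` argument would be the set of emails found in the pool but absent from the config's admin_users list.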

nataliejschultz commented 4 months ago

Added functionality to remove users when they're in the User Pool and aren't in the config file for that project. See this comment for in-depth description.

Opened a PR for this change!