Open shankari opened 8 months ago
This code should live in the nrel-openpath-deploy-configs repo since we will eventually want to automate this when the config changes by using GitHub Actions.
Now that the nominatim project is mostly squared away, I've been gaining some background on how to get this email project started.
We'll have to create an identity for openpath@nrel.gov on the AWS account. I tried to create this identity, and immediately ran into permissions issues:
You do not have sufficient access to perform this action.
User: ...nschultz@nrel.gov is not authorized to perform: ses:TagResource on resource: ...identity/openpath@nrel.gov because no identity-based policy allows the ses:TagResource action
I replaced numbers with ### because I'm not yet familiar enough to know which numbers could represent sensitive information, but you get the gist. I'm going to reach out to Jianli and see if there's an easy way to add these permissions.
I've gotten boto3 working with the staging account, and was able to add my NREL email as a verified email. I then tried to send myself an email using the outline provided here.
However, when I sent the email, I got an undeliverable DMARC authentication error. I can't see the verified emails in the console, but I think I have to change the DMARC policy somehow so that it sees @nrel.gov emails as matching the domain. I haven't looked into it thoroughly but I'm going to learn more tomorrow.
Jianli helped me figure out the DMARC issues. I successfully sent a test email from openpath@nrel.gov using the AWS SDK for Python (boto3) to an account that is not a verified identity, and there were no domain verification issues.
I've also put together a small script to extract the email addresses from the config files, using a bit of RegEx code that I found:
import json
import re

with open('localpath/file.nrel-op.json') as config_file:
    data = json.load(config_file)
intro = data['intro']
info = intro['program_admin_contact']
# Pull the first thing that looks like an email address out of the free-text contact field
match = re.search(r'[\w\.-]+@[\w\.-]+\.\w+', info)
email = match.group(0)
print(email)
I've tested it on a few of the config files and it works really well, as long as the user puts in the email correctly.
> I've also put together a small script to extract the email addresses from the config files, using a bit of RegEx code that I found:
the admin contact is typically one person who is the "public face" of the project. But the list of admin users is typically much longer, especially if they have interns or grad students to help manage the project over time.
So the appendix to the MOU allows people to list 3 admin users.
It should read the admin_users or similar tag from the config file JSON
So you should add a new tag that lists all of the users and read that instead of the program_admin_contact
And you don't need to do any fancy parsing for that because we can just expect a list of email addresses.
To test, you would need to create a new JSON file with the new field manually added and populated with test email addresses.
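A minimal sketch of that test setup, assuming the new field lives under `intro`, is called `admin_users`, and holds a JSON list of email strings. The key name, its placement, and the helper names are assumptions for illustration, not a settled schema.

```python
import json
import os
import tempfile


def read_admin_users(config_path):
    """Return the admin email list from a config file.

    Assumes a hypothetical intro.admin_users key holding a list of strings.
    """
    with open(config_path) as config_file:
        data = json.load(config_file)
    # A missing key is treated as "no admins" rather than an error
    emails = data.get("intro", {}).get("admin_users", [])
    return [e.strip() for e in emails]


def write_test_config(emails):
    """Create a throwaway config with only the field under test populated."""
    fd, path = tempfile.mkstemp(suffix=".nrel-op.json")
    with os.fdopen(fd, "w") as f:
        json.dump({"intro": {"admin_users": emails}}, f)
    return path


test_path = write_test_config(["test1@example.com", " test2@example.com "])
print(read_admin_users(test_path))
os.remove(test_path)
```

Because the field is already a list, no regular expression is needed; the only cleanup is stripping stray whitespace.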
To clarify further, we may want to customize the email that is sent depending on other aspects from the config file.
So the config has an admin-dashboard section with some screens enabled and disabled, and some columns excluded sometimes.
So for research partners, for example, we will show them the full data in the trip and trajectory tables (e.g. uue)
For public agency partners, we will have trips linked to users in the trip table but without locations, and in the trajectory table, we will show the locations, but not the user id to ensure anonymity (e.g. ride2own)
so in the email that I forwarded to you, we highlight that we don't show spatial information in the trip table... That text should be removed when sending to partners where the admin-dashboard does not exclude the spatial tables
There are multiple config files. The config file name or the serverURL field in the config should tell you which config it is. You should then use that config name to map to the user pool
e.g. https://github.com/e-mission/nrel-openpath-deploy-configs/blob/main/configs/ebikethere-garfield-county.nrel-op.json has a server URL of ebikethere-garfield-county-openpath.nrel.gov, and maps to a user pool of nrelopenpath-prod-ebikethere-garfield-county
You can check the other config files as well and their mappings to the other user pools
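The mapping in the example above can be sketched as follows. This assumes every config follows the same naming pattern as ebikethere-garfield-county; the function names are hypothetical, and the other configs would need to be checked against it.

```python
import os

POOL_PREFIX = "nrelopenpath-prod-"  # taken from the example mapping above
CONFIG_SUFFIX = ".nrel-op.json"


def deployment_name(config_path):
    """Strip the directory and the .nrel-op.json suffix from a config path."""
    base = os.path.basename(config_path)
    if not base.endswith(CONFIG_SUFFIX):
        raise ValueError(f"not a deploy config: {config_path}")
    return base[: -len(CONFIG_SUFFIX)]


def user_pool_name(config_path):
    """Pool name assumed to be the fixed prefix plus the deployment name."""
    return POOL_PREFIX + deployment_name(config_path)


def server_host(config_path):
    """Server host assumed to be the deployment name plus -openpath.nrel.gov."""
    return deployment_name(config_path) + "-openpath.nrel.gov"


path = "configs/ebikethere-garfield-county.nrel-op.json"
print(user_pool_name(path))
print(server_host(path))
```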
I've been working on my script and have a few updates + questions. Here is the state of the script:
Extract email addresses
def email_extract():
    with open('path/to.nrel-op.json') as config_file:
        data = json.load(config_file)
    intro = data['intro']
    emails = [i.strip() for i in intro['admin_users'].split(",")]
    return emails
Get name of user pool and check if it exists
filename = "name-of-file"
pool_test = "nrelopenpath-prod-" + filename
pool_id = []
AWS_REGION = "us-west-2"
client = boto3.client(
    'cognito-idp',
    aws_access_key_id="",
    aws_secret_access_key="",
    aws_session_token="",
    region_name=AWS_REGION
)
response = client.list_user_pools(MaxResults=60)
for i in response["UserPools"]:
    if i["Name"] == pool_test:
        print(pool_test + " pool exist!")
        pool_id.append(i["Id"])
        break
    else:
        print("Pool DNE! Looking...")
        continue
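One thing the lookup above does not cover is an account with more pools than fit in a single `list_user_pools` response. A possible sketch that follows `NextToken` until the pool is found is below. The stub client, pool names, and IDs are made up so the example runs without AWS credentials; with a real boto3 `cognito-idp` client the same function should apply.

```python
def find_user_pool_id(cognito_client, pool_name, page_size=60):
    """Return the Id of the pool called pool_name, or None if no page contains it."""
    kwargs = {"MaxResults": page_size}
    while True:
        response = cognito_client.list_user_pools(**kwargs)
        for pool in response["UserPools"]:
            if pool["Name"] == pool_name:
                return pool["Id"]
        token = response.get("NextToken")
        if not token:
            # Last page reached without a match
            return None
        kwargs["NextToken"] = token


class StubCognito:
    """Hypothetical stand-in for a cognito-idp client, serving two fixed pages."""

    def __init__(self):
        self.pages = {
            None: {"UserPools": [{"Name": "nrelopenpath-prod-a", "Id": "id-a"}],
                   "NextToken": "t1"},
            "t1": {"UserPools": [{"Name": "nrelopenpath-prod-b", "Id": "id-b"}]},
        }

    def list_user_pools(self, MaxResults, NextToken=None):
        return self.pages[NextToken]


print(find_user_pool_id(StubCognito(), "nrelopenpath-prod-b"))
print(find_user_pool_id(StubCognito(), "nrelopenpath-prod-missing"))
```

Returning None instead of appending to a list also makes the "pool does not exist yet" case explicit for the caller.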
Check for user emails in user pool
def user_already_exists(pool_id, email):
    try:
        response = client.list_users(UserPoolId=pool_id)
        users = response["Users"]
        result = False
        for i in users:
            for k, v in i.items():
                if k == "Attributes":
                    for j in v:
                        if j["Name"] == "email":
                            user_email = str(j["Value"])
                            if str(email) == user_email:
                                result = True
        return result
    except ClientError as err:
        logger.error(
            "Couldn't list users for %s. Here's why: %s: %s",
            pool_id,
            err.response["Error"]["Code"],
            err.response["Error"]["Message"],
        )
        raise

for email in email_extract():
    if not user_already_exists(pool_id[0], email):
        print(email + " not in user pool!")
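The membership check above reads a single `list_users` response, which returns users a page at a time. A sketch that collects every email in the pool by following `PaginationToken` might look like this. The stub client and addresses are hypothetical and only there so the example runs offline.

```python
def list_pool_emails(cognito_client, pool_id):
    """Collect the email attribute of every user in the pool, across all pages."""
    emails = set()
    kwargs = {"UserPoolId": pool_id}
    while True:
        response = cognito_client.list_users(**kwargs)
        for user in response["Users"]:
            for attr in user.get("Attributes", []):
                if attr["Name"] == "email":
                    emails.add(attr["Value"])
        token = response.get("PaginationToken")
        if not token:
            return emails
        kwargs["PaginationToken"] = token


class StubCognito:
    """Hypothetical stand-in for a cognito-idp client with two pages of users."""

    def __init__(self):
        self.pages = {
            None: {"Users": [{"Username": "u1", "Attributes": [
                       {"Name": "email", "Value": "email1@example.com"}]}],
                   "PaginationToken": "p2"},
            "p2": {"Users": [{"Username": "u2", "Attributes": [
                       {"Name": "email", "Value": "email2@example.com"}]}]},
        }

    def list_users(self, UserPoolId, PaginationToken=None):
        return self.pages[PaginationToken]


pool_emails = list_pool_emails(StubCognito(), "hypothetical-pool-id")
print("email2@example.com" in pool_emails)
```

Fetching the set once and testing each config email against it also avoids one API call per address.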
The next part is adding the prompting for signup to the SES portion of the script. I think it is getting close to done.
Questions
How will the pulling of the AWS credentials work in production? Is there a reference I can use?
How to pull the name of the file after it's merged? I believe this will be based on @Abby-Wheelis 's GitHub actions workflow/where the config file ends up.
> How will the pulling of the AWS credentials work in production? Is there a reference I can use?
> How to pull the name of the file after it's merged? I believe this will be based on @Abby-Wheelis 's GitHub actions workflow/where the config file ends up.
As you can see from https://github.com/e-mission/e-mission-docs/issues/1008#issue-1935857136
The first-level goal is to have a script to automate this. The script should take a config file as input.
You don't have a dependency on @Abby-Wheelis's GitHub action. Integrating the two is the next step.
> Here is the state of the script:
This would be perfect to put into a draft PR so I can comment on it
I ran my script using my nrel email to sign myself up for the staging pool, and have found out a few different things:
However, we can much more easily automate sending an email to users using just SES separately from cognito. I think the most straightforward route is to send two emails: One from AWS with the username and temp password, and another highly customizable email with the URL to the admin dashboard, custom info based on the config file, and a notice that their temporary password will be in a separate email. It is up to @shankari what to do, since the cloud team wants to set up the user pools with CDK.
> The AWS user pool must be configured to send email with SES if it is to come from a custom FROM address. This is a setting that can be changed on creation of the pool.
Correct. This is why we set up SES in the first place.
> This is a setting that can be changed on creation of the pool.
Are you sure it is not changeable after the pool is created? Per https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-pool-updating.html you can change almost all the existing settings using UpdateUserPool
> The invitation message template is somewhat customizable, though to truly customize the email (i.e. integrate it with the script and include custom information), we would need to use AWS Lambda.
I don't understand this. What custom information do we need? From the UI, I can create an email message template and have it be used when sending out the email. I assume that is the "somewhat customizable". I would start with that and document what (if anything) cannot be done.
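As a starting point for documenting what the built-in template can and cannot do, here is a minimal sketch of filling a template before handing it to Cognito. It relies on my understanding that the invitation message must contain the `{username}` and `{####}` placeholders, which Cognito substitutes with the username and temporary password. The `<pool_name>` token and the function name are hypothetical conventions for illustration.

```python
REQUIRED_PLACEHOLDERS = ("{username}", "{####}")


def render_invite(template, pool_name):
    """Fill in our own token and check that Cognito's placeholders survived."""
    # "<pool_name>" is a made-up token for this sketch; Cognito knows nothing about it
    message = template.replace("<pool_name>", pool_name)
    missing = [p for p in REQUIRED_PLACEHOLDERS if p not in message]
    if missing:
        raise ValueError("template is missing placeholders: " + ", ".join(missing))
    return message


template = (
    "Welcome to the <pool_name> admin dashboard. "
    "Your username is {username} and your temporary password is {####}."
)
print(render_invite(template, "nrelopenpath-prod-wyoming"))
```

Anything that can be computed from the config before the pool is updated fits this approach; per-user content beyond the two placeholders is where it would stop working.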
> I think the most straightforward route is to send two emails:
We may have to fall back to this, but it is clearly a sub-optimal solution. Any solution that requires the user to open two separate emails instead of one adds one more failure point. Can you please clarify why the default "customizable" option does not work, with supporting links and examples along with outputs?
> you can change almost all the existing settings using UpdateUserPool
You are right!!! This was the missing puzzle piece that I had searched for but could not find. I'm going to play around with UpdateUserPool and see what its limitations are.
Okay, I've made a LOT of changes to the script since finding out about UpdateUserPool. Currently working on organizing all of the functions properly + calling them in the right order so that all variables can get passed in properly. Once that's done, I can push a commit to my pull request and move to ready for review!
I put the script into ready for review on Thursday around midnight (a few hours short of my goal). Here is an in-depth overview of the functionality of the script:
import json
import logging
import os
import sys

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)

# The only requirement to run is putting the name of the config file in as an argument to command line when running.
# The name of the file is used to create the name of the pool, which is used to find the pool ID.
filename_raw = sys.argv[1]
filename = filename_raw.split(".")[0]
pool_name = "nrelopenpath-prod-" + filename

# Gets a relative path to the config file, so that it can be opened and read
current_path = os.path.dirname(__file__)
config_path = os.path.relpath('../configs/' + filename_raw, current_path)

# Set up AWS credentials as environment variables + set variables
ACCESS = os.environ.get("AWS_ACCESS_KEY_ID")
SECRET = os.environ.get("AWS_SECRET_ACCESS_KEY")
TOKEN = os.environ.get("AWS_SESSION_TOKEN")
AWS_REGION = "us-west-2"

# Set up clients
cognito_client = boto3.client(
    'cognito-idp',
    aws_access_key_id=ACCESS,
    aws_secret_access_key=SECRET,
    aws_session_token=TOKEN,
    region_name=AWS_REGION
)
sts_client = boto3.client("sts")
# This function has two options, based on @shankari 's preferences for how to run it.
def get_userpool_name(pool_name, cognito_client):
    response = cognito_client.list_user_pools(MaxResults=60)
    pools = response["UserPools"]
    UserPoolExist = False
    # Stays None when no pool matches, so the return below never hits an unbound name
    pool_id = None
    ######### One option to set the user pool without breaking (but still stop when condition met)
    i = 0
    while i < len(pools) - 1 and pools[i]["Name"] != pool_name:
        print("looking for user pool...")
        i = i + 1
    # The "pools and" guard covers an account with no user pools at all
    if pools and pools[i]["Name"] == pool_name:
        UserPoolExist = True
        pool_id = pools[i]["Id"]
        print(pool_name + " pool exists! Checking for users...")
    ######### Second option that uses a break when condition is met:
    # for i in response["UserPools"]:
    #     if i["Name"] == pool_name and not UserPoolExist:
    #         pool_id = i["Id"]
    #         UserPoolExist = True
    #         print(pool_name + " pool exists! Checking for users...")
    #         break
    #     else:
    #         print("Looking for pool...")
    #         continue
    return UserPoolExist, pool_id
def user_already_exists(pool_id, email, cognito_client):
    try:
        response = cognito_client.list_users(UserPoolId=pool_id)
        users = response["Users"]
        result = False
        for i in users:
            for k, v in i.items():
                if k == "Attributes":
                    for j in v:
                        if j["Name"] == "email":
                            user_email = str(j["Value"])
                            if str(email) == user_email:
                                result = True
        return result
    except ClientError as err:
        logger.error(
            "Couldn't list users for %s. Here's why: %s: %s",
            pool_id,
            err.response["Error"]["Code"],
            err.response["Error"]["Message"],
        )
        raise

def get_verified_arn(sts_client):
    account_num = sts_client.get_caller_identity()["Account"]
    identity_arn = "arn:aws:ses:" + AWS_REGION + ":" + account_num + ":identity/openpath@nrel.gov"
    return identity_arn

def email_extract():
    with open(config_path) as config_file:
        data = json.load(config_file)
    intro = data['intro']
    emails = [i.strip() for i in intro['admin_users'].split(",")]
    return emails

def create_account(pool_id, email, cognito_client):
    response = cognito_client.admin_create_user(
        UserPoolId=pool_id,
        Username=email,
        UserAttributes=[
            {
                'Name': 'email',
                'Value': email,
            },
        ],
        ForceAliasCreation=True,
        DesiredDeliveryMediums=[
            'EMAIL',
        ],
    )
    return response

def format_email(pool_name):
    with open("welcome-template.txt", "r") as f:
        html = f.read()
    html = html.replace("<pool_name>", pool_name)
    return html

def update_user_pool(pool_id, pool_name, html, identity_arn, cognito_client):
    response = cognito_client.update_user_pool(
        UserPoolId=pool_id,
        AutoVerifiedAttributes=['email'],
        MfaConfiguration='ON',
        DeviceConfiguration={
            'ChallengeRequiredOnNewDevice': True,
            'DeviceOnlyRememberedOnUserPrompt': True
        },
        EmailConfiguration={
            'SourceArn': identity_arn,
            'EmailSendingAccount': 'DEVELOPER',
            'From': 'openpath@nrel.gov'
        },
        AdminCreateUserConfig={
            'AllowAdminCreateUserOnly': True,
            'InviteMessageTemplate': {
                'EmailMessage': str(html),
                'EmailSubject': f'Welcome to {pool_name} user pool!'
            }
        },
    )
# Starts by checking for the User Pool. If the User Pool does not yet exist, wait until it is set up to add users.
UserPoolExist, pool_id = get_userpool_name(pool_name, cognito_client)

# If the user pool exists, extract email addresses from conf file.
if UserPoolExist:
    emails = email_extract()
    # Loop over each email address. Check if they're in the user pool.
    for email in emails:
        if not user_already_exists(pool_id, email, cognito_client):
            # If user not in pool, format the email template for their welcome email, update the user pool, and create an account for them.
            print(email + " not in user pool! Creating account...")
            html = format_email(pool_name)
            identity_arn = get_verified_arn(sts_client)
            update_user_pool(pool_id, pool_name, html, identity_arn, cognito_client)
            response = create_account(pool_id, email, cognito_client)
            # Checks to make sure that the create_account function went through (ie 200 response)
            if response['ResponseMetadata']['HTTPStatusCode'] == 200:
                print("Account created! Sending welcome email.")
            else:
                print("Account creation unsuccessful.")
                print(response['ResponseMetadata']['HTTPStatusCode'])
        # If the user is already in the pool, they will not be added again.
        else:
            print(email + " already in user pool!")
# If the user pool does not exist yet, whoever is running the script should address this!
else:
    print(pool_name + " does not exist! Try again later.")
There is a lot of room for further customization for next-level goals. I will meet with Shankari next week to discuss!
After making some initial changes to the script, here is an example of how it works:
The Wyoming user pool starts with zero users. I modified the wyoming.nrel-op.json config file to have my email in the intro section, in a key I called admin_users:
"admin_users": "email1@nrel.gov",
In admin_dashboard,
"map_trip_lines": false,
"data_trips_columns_exclude": ["data.start_loc.coordinates", "data.end_loc.coordinates"],
In the command line, I input the following command:
python email-config.py wyoming.nrel-op.json
This command will hopefully be automated to run with GitHub actions upon merging a new config, assuming I can extract the name of the newly merged file when it's added.
In terminal, the following prints are displayed:
users: []
email1@nrel.gov not in user pool! Creating account...
Account created! Sending welcome email.
I got the following email seconds later:
[Screenshots: inbox appearance; email content with info redacted]
Now, my NREL email is in the user pool! Let's see what happens when we try to re-add this email and a different email, while simultaneously changing the config:
"admin_users": "email1@nrel.gov, email2@yahoo.com",
"map_trip_lines": true,
"data_trips_columns_exclude": []
The output in terminal after running the exact same command as before:
email1@nrel.gov already in user pool!
email2@yahoo.com not in user pool! Creating account...
Account created! Sending welcome email.
The appearance of the email is almost identical, with the exception of this section:
Email content | Sentence change |
---|---|
[Screenshot of email content] | Added "Additionally, you can view individual user origin destination points" because map_trip_lines set to true. Removed "Your configuration excludes trip start/end in the trip table. Let us know if you would like to include those." due to data_trips_columns_exclude being empty |
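The two sentence changes described above could be driven by a small helper like the following. This is a sketch, not the code in the PR: the function name is hypothetical, and treating any non-empty `data_trips_columns_exclude` as "spatial columns are hidden" is a simplifying assumption.

```python
def dashboard_notes(config):
    """Pick the optional welcome-email sentences based on the admin_dashboard section."""
    admin = config.get("admin_dashboard", {})
    notes = []
    if admin.get("map_trip_lines"):
        notes.append(
            "Additionally, you can view individual user origin destination points"
        )
    # Assumption: a non-empty exclude list means trip start/end columns are hidden
    if admin.get("data_trips_columns_exclude"):
        notes.append(
            "Your configuration excludes trip start/end in the trip table. "
            "Let us know if you would like to include those."
        )
    return notes


first_run = {"admin_dashboard": {
    "map_trip_lines": False,
    "data_trips_columns_exclude": ["data.start_loc.coordinates",
                                   "data.end_loc.coordinates"],
}}
second_run = {"admin_dashboard": {"map_trip_lines": True,
                                  "data_trips_columns_exclude": []}}
print(dashboard_notes(first_run))
print(dashboard_notes(second_run))
```

A stricter version would look for the specific coordinate columns rather than any exclusion.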
I think the email can be modified to look better/more official based on @shankari's preferences, though it's not fully necessary. I'm going to remove the link to the trajectory table issue; I'm not sure if the sentence about the table is even necessary. I am also not sure what additional options to include for the data_trips_columns_exclude, or for any of the other admin dashboard preferences.
> I am also not sure what additional options to include for the data_trips_columns_exclude, or for any of the other admin dashboard preferences.
data_trips_columns_exclude indicates the columns that we should exclude from the trip table. The column names are pretty self-explanatory and are typically used to remove spatial data (e.g. https://github.com/e-mission/nrel-openpath-deploy-configs/blob/9795b34cac56d418794343c0656018caea6c7aef/configs/denver-casr.nrel-op.json#L128)
I am creating admin credentials for the USAID project, so I tried to run the script manually after copying the credentials and changing the config file to include
--- a/configs/usaid-laos-ev.nrel-op.json
+++ b/configs/usaid-laos-ev.nrel-op.json
@@ -10,6 +10,7 @@
"start_month": "05",
"start_year": "2023",
"program_admin_contact": "เบเปเบฒเบ เปเบเบชเบปเบ เบเปเบฒเบเบเบฒเบเบญเบตเปเบกเบง Kosol.Kiatreungwattana@nrel.gov เปเบฅเบฐ เปเบเบตเบงเบญเบเปเบญเบ 303-517-7674",
+ "admin_users": "K.Shankari@nrel.gov, ...",
"deployment_partner_name": "USAID - Laos",
"translated_text": {
"lo": {
I got
email_automation kshankar$ python3 email-config.py ../configs/usaid-laos-ev.nrel-op.json
Traceback (most recent call last):
File "/Users/kshankar/Desktop/data/e-mission/openpath-deploy-configs/email_automation/email-config.py", line 141, in <module>
is_userpool_exist, pool_id = get_userpool_name(pool_name, cognito_client)
File "/Users/kshankar/Desktop/data/e-mission/openpath-deploy-configs/email_automation/email-config.py", line 48, in get_userpool_name
pool_id = response["UserPools"][user_pool_index]["Id"]
TypeError: list indices must be integers or slices, not NoneType
> I am creating admin credentials for the USAID project, so I tried to run the script manually after copying the credentials. I got
> email_automation kshankar$ python3 email-config.py ../configs/usaid-laos-ev.nrel-op.json
No need to put the path to the config, just the name of it!
I figured that out from the code! I added in a line to print this out for easier debugging in the future. And it did work! I got the welcome email, I think so did @Abby-Wheelis
I am going to try to login with the temporary password now...
> I got the welcome email, I think so did @Abby-Wheelis
I did get the welcome email, but the link appears to be broken for me, I'm not sure if it would affect anything, but I have tried it both on and off of the VPN
Ah yes, the link is broken
@nataliejschultz we got a link to https://nrelopenpath-prod-usaid-laos-ev-openpath.nrel.gov/admin/
It should be https://usaid-laos-ev-openpath.nrel.gov/admin/
@Abby-Wheelis can you try that?
> I did get the welcome email, but the link appears to be broken for me, I'm not sure if it would affect anything, but I have tried it both on and off of the VPN
@shankari did you have the same experience? I haven't been able to try the logging in step, since I didn't have a dashboard to actually log into! The link to the dashboard is generated based on the pattern of naming the config..
> Ah yes, the link is broken
> @nataliejschultz we got a link to https://nrelopenpath-prod-usaid-laos-ev-openpath.nrel.gov/admin/
> It should be https://usaid-laos-ev-openpath.nrel.gov/admin/
Will this pattern follow for naming with the admin dashboard link? I will change it if so.
The first level goal PR is ready for merging!
I submitted a ticket on ServiceNow outlining exactly what we need for authentication with AWS through GitHub actions. The task was assigned to Jianli, who I have been collaborating with for all AWS permissions issues.
In the meantime, I've been working on how to get the name of a newly pushed config file into our workflow file. I've found a dependency that can work for both push and pull use cases. There is a second dependency that might be simpler, but it's hard to know until I test them out in my fork.
It will be up to @shankari if we use either of them(she might know a better way to do this without a dependency), but I will play around with them and see if they'll work for our use case regardless.
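Independent of which dependency ends up supplying the list of changed files, the filtering step could be sketched like this. It assumes the changed paths arrive as plain strings relative to the repo root (for example from `git diff --name-only`), and the function name is hypothetical.

```python
from pathlib import PurePosixPath

CONFIG_SUFFIX = ".nrel-op.json"


def changed_configs(changed_paths):
    """Keep only files directly under configs/ that look like deploy configs."""
    selected = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.parent == PurePosixPath("configs") and path.name.endswith(CONFIG_SUFFIX):
            selected.append(raw)
    return selected


changed = [
    "README.md",
    "configs/wyoming.nrel-op.json",
    "email_automation/email-config.py",
    "configs/usaid-laos-ev.nrel-op.json",
]
for config in changed_configs(changed):
    print(config)
```

Each selected path could then be passed to the script as its config argument.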
Yup, results of playing around will help decide between them!
I would like you to support both GHA and local. It sounds like the main difference is in authenticating to AWS. So you would have auth_for_gh_actions and auth_for_local_run and have the rest of the code be the same.
You would invoke auth_for_gh_actions or auth_for_local_run either by seeing which environmental variables were present, or by using argparse to have people pass in -g for gha and -l for local or something.
The argument would be the full path to the filename. And standard UNIX rules will apply to "full path to file name", so it will use relative paths or absolute paths depending on whether the pathname starts with / or not.
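A minimal sketch of the environment-variable variant of that split is below. GitHub Actions sets `GITHUB_ACTIONS=true` on its runners, so that variable is used to choose the branch. What each function returns is an assumption: the local branch reads the same three AWS variables the script already uses, and the GHA branch assumes the workflow has configured credentials that boto3 picks up on its own.

```python
import os

AWS_REGION = "us-west-2"


def auth_for_local_run(env):
    """Keyword arguments for boto3.client() built from explicit session credentials."""
    return {
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY"),
        "aws_session_token": env.get("AWS_SESSION_TOKEN"),
        "region_name": AWS_REGION,
    }


def auth_for_gh_actions(env):
    """Assumes the workflow set up credentials, so only the region is passed."""
    return {"region_name": AWS_REGION}


def client_kwargs(env=None):
    """Pick the auth function based on where the script is running."""
    env = os.environ if env is None else env
    if env.get("GITHUB_ACTIONS") == "true":
        return auth_for_gh_actions(env)
    return auth_for_local_run(env)


print(sorted(client_kwargs({"GITHUB_ACTIONS": "true"})))
print(sorted(client_kwargs({"AWS_ACCESS_KEY_ID": "example"})))
```

The result would be unpacked as `boto3.client('cognito-idp', **client_kwargs())`, leaving the rest of the script identical in both modes.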
> I would like you to support both GHA and local. It sounds like the main difference is in authenticating to AWS. So you would have auth_for_gh_actions and auth_for_local_run and have the rest of the code be the same.
Added an argparser section to main function and modified so that the full path to the config file can be input:
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('-l', '--local',
                       help='Running locally. Provide full path to config file + install boto3 prior to running.')
    group.add_argument('-g', '--github',
                       help='Must be run on GitHub. To run locally, use -l argument.')
    args = parser.parse_args()
    # Both flags take the config path as their value, so read it from args instead of sys.argv[2]
    filepath_raw = args.local or args.github
    filename_raw = os.path.basename(filepath_raw)
    filename = filename_raw.split('.')[0]
    pool_name = "nrelopenpath-prod-" + filename
    current_path = os.path.dirname(__file__)
    config_path = os.path.relpath('../configs/' + filename_raw, current_path)
Also updated the readme to reflect these changes:
#Run the email-config.py script, and pass the path to the config file in as an argument:
`python email-config.py -l /path/to/configfile.nrel-op.json`
The URL issue was fixed as well in the last push. Moved the PR to ready for review.
Added functionality to remove users when they're in the User Pool and aren't in the config file for that project. See this comment for in-depth description.
> Added functionality to remove users when they're in the User Pool and aren't in the config file for that project. See this comment for in-depth description.
Opened a PR for this change!
Context: The NREL hosted admin dashboard uses AWS Cognito for authentication. Right now, when we configure a new dynamic config (https://github.com/e-mission/nrel-openpath-deploy-configs) for a new program, @shankari manually goes in and adds the logins, creates temporary passwords, and then sends out an email with instructions from her personal email.
The first-level goal is to have a script to automate this.
The script should take a config file as input.
It should read the admin_users or similar tag from the config file JSON.
It should "sync" the users in the file with the users in cognito:
For the users that are added, we should have cognito autogenerate a temporary password and send it to the users' email along with standard instructions (use 2FA....). The email should come from openpath@nrel.gov instead of verificationemail.com.
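The "sync" step can be sketched as a pure set comparison, kept separate from the AWS calls so it is easy to test. The function name is hypothetical, and lowercasing addresses before comparing is an assumption that should be checked against how the pool treats case.

```python
def plan_sync(config_emails, pool_emails):
    """Return (to_add, to_remove) so that the pool ends up matching the config."""
    wanted = {e.strip().lower() for e in config_emails}
    current = {e.strip().lower() for e in pool_emails}
    to_add = sorted(wanted - current)      # in the config, not yet in the pool
    to_remove = sorted(current - wanted)   # in the pool, no longer in the config
    return to_add, to_remove


to_add, to_remove = plan_sync(
    ["email1@nrel.gov", "email2@yahoo.com"],
    ["email1@nrel.gov", "former-intern@example.com"],
)
print("add:", to_add)
print("remove:", to_remove)
```

The `to_add` list would feed the existing `admin_create_user` call, and `to_remove` a call such as boto3's `admin_delete_user`; printing the plan first gives whoever runs the script a chance to review removals.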