apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62k stars 13.59k forks source link

AWS IAM Role support for send mail functionality #23486

Closed rahuludaiwal599 closed 7 months ago

rahuludaiwal599 commented 1 year ago

Guys, currently SMTP is being only a supported system to send mails. Can you add iam role support here for AWS?

mdeshmu commented 1 year ago

Email delivery support is cloud agnostic. SES SMTP works perfectly fine and it requires smtp credentials which are not same as iam role.

Why do you need support for iam role? Do you wanna uses SES API ?

rahuludaiwal599 commented 1 year ago

Actually current method uses Access key and secret id as SMTP credentials for AWS SES service, which being static credentials imposes a security risk in our infra. So we want to move to IAM roles where credentials are generated temporarily.

mdeshmu commented 1 year ago
  1. I am assuming you are not keeping smtp credentials hardcoded in superset config or superset's IaC (CDK/Terraform etc whatever you are using). If you are then you should change both to take credentials from secret manager during deployment. Also you should use ses smtp vpc endpoints to communicate over private network if have doubts of in transit credentials theft.

  2. Your SMTP password is different from your AWS secret access key. Hence SMTP credentials are not same as IAM credentials. https://docs.aws.amazon.com/ses/latest/dg/send-email-concepts-credentials.html

  3. You have to delete and recreate a new smtp iam user every time you want to change smtp credentials. https://repost.aws/knowledge-center/ses-rotate-smtp-access-keys

You can check with aws how this can be automated.

  1. The SES SMTP interface doesn't support SMTP credentials that have been generated from temporary security credentials. Please refer red marked important note here: https://docs.aws.amazon.com/ses/latest/dg/smtp-credentials.html

Hence I don't think adding support for iam in superset will help for your purpose. Unless you don't want to use SMTP interface and instead want to use SES API, in that case it will be big feature change.

nytai commented 1 year ago

the current SMTP code works without specifying credentials. Not sure if AWS SES will pick up on the IAM role if credentials are not present, but you could always try. IMO, the superset codebase should not contain any code specific to cloud providers and their apis. You can always fork the codebase if this is something you really need.

https://github.com/apache/superset/blob/0fa421271e874e456ae71f7aca4c71130176332d/superset/utils/core.py#L1030

mdeshmu commented 1 year ago

@rahuludaiwal599 please close the ticket if we have answered your queries. Else we should move it to discussion since this is not a bug as such.

rahuludaiwal599 commented 1 year ago

Please allow me till wednesday EOD, to respond here.

Adminyoungmarsnet commented 1 year ago

Email delivery support is cloud agnostic. SES SMTP works perfectly fine and it requires smtp credentials which are not same as iam role.

Why do you need support for iam role? Do you wanna uses SES API ?

THX

rahuludaiwal599 commented 1 year ago
  1. I tried using not passing the smtp password but it didn't automatically assumed iam role.
  2. I don't think it should be a big feature change here as in your current method you are already creating the email content and storing it in a variable next thing you require to do is: a. Create a ses client using > ses_client = boto3.client("ses") b. Now you can use any of the following method to send that already framed content: i. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ses/client/send_email.html ii. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ses/client/send_raw_email.html iii. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ses/client/send_templated_email.html
  3. Method you are suggesting is rotation of credentials manually which assuming iam role in aws is already taken care off by AWS libraries only, it intelligently manages to rotate credentials(you just need to create a client using step:2a) on their end only to keep the session intact.
  4. We can use iam role to set session duration of 1hr from console and aws libraries are designed in such a way that we just need to create a client and refreshment of that client is been taken care of by aws itself while if we take this responsibility on us then it would be very tedious as: i. First we need to write an automation of rotating access keys every one hour which will include a cost parameter for running that automation and ii. Secondly in our code also at every point we need to check if the client is 1hr old then refreshing that client again and again which will include latency factor in our code too.
  5. Also your point is valid that these are smtp credentials and not access keys but these smtp credentials can be used to send mails on behalf of us to our customers in view of trying a phishing attempt on them redirecting them on my custom link and can do a PII theft on them. Or can ask them to make any payments creating a financial loss on them.

Let me know your thoughts on it. I am expecting this shouldn't take much of your time as email content is already present with us just a ses client and send mail function we need to add.

mdeshmu commented 1 year ago

So basically you want SES API-driven email functionality as all of the documentation you shared is for SES API. I agree with @nytai that the superset codebase should not contain any code specific to cloud providers and their APIs. You can always fork the codebase if this is something you need. Nevertheless, this is not an issue as such and should be moved to a discussion at least. cc: @rusackas

rahuludaiwal599 commented 1 year ago

Can i do the change and raise the PR would that be fine here?

rahuludaiwal599 commented 1 year ago

In terms of functionality it’s not an issue but in terms of security it’s an issue and a very basic functionality which almost every tool provides. I can provide you usecases where big orgs faced data theft while using static credentials in their code. I would request if you can allow me to raise a PR for this issue would be helpful.

mdeshmu commented 1 year ago
  1. Keep celery worker always in private subnet.

  2. Store smtp credentials in aws secrets manager and load from there directly to celery worker container environment during deployment time. This way credentials are not stored in superset config files or infrastructure as a code e.g. cdk/terraform files. Hence are not exposed anywhere.

  3. Use SES VPC endpoints so that traffic from your private subnet goes to SES over private network only: https://docs.aws.amazon.com/ses/latest/dg/send-email-set-up-vpc-endpoints.html Additionally use Superset's SSL and STARTTLS parameters for SMTP for encrypted communication. So no concern of credentials theft

  4. Data transfer from SES to your mail server can also be made private with additional aws services.

ninjadig commented 1 year ago

@mdeshmu using credentials is always expose risk and whether it is stored even in secrets managers too, due to misconfiguration and this has been shown in past breaches because of that role based access are always more secure and less risky path and we didn't find anything related to role based with apache superset, we can fork the repo and raise the PR for the same it will allow apache superset more reliable in terms of security and will provide an option to user to choose more security configuration.

hope you understoon my point.

mdeshmu commented 1 year ago

I don't think you have understood my points. I fail to understand how credentials stored and transmitted in private network are at risk?

While everyone is allowed to raise PR to this project, I will let any Superset PMC member comment about this feature request.

IMO, It should be a separate option different from current Email option in alerts and report.

Nevertheless, This issue should be moved to a github discussion.

yash-sec commented 1 year ago

Agree to @rahuludaiwal599 and @Sectestlist, Guys we should take this as an issue only as this tool is using authentication method 5-6years old which got hacked too many a times till now. Following are the couple of instances of data breach happened through access keys leak. https://tomforb.es/infosys-leaked-fulladminaccess-aws-keys-on-pypi-for-over-a-year/ https://blog.gitguardian.com/thinking-like-a-hacker-aws-keys-in-private-repos/ Jira tickets are a very common examples through which a developer leaks access keys. Any sort of secure storation of keys your are doing dev can anyway fetch the keys using debugger in their IDE. And then boom!!!.... you can not expect a dev from even a very matured organization too that he won’t be getting hacked anyday. I think considering security as a priority for any organization iam role should only be the method which won’t affect the present flow but will just allow the users to be able to use more secure option. I think if security is something on which you can take a free call then moving this issue to discussion is fine but if security is your priority then this issue is 100% severe and also simple to implement too so, we should fix this issue on priority.

ninjadig commented 1 year ago

Since the beginning i am trying to explain along with @rahuludaiwal599 but it seems like @mdeshmu is not able to understand but @yash-sec this is good reference which you shared and i hope it will help more to understand the criticality and how we can fix it.

mdeshmu commented 1 year ago

In both the examples given above by @yash-sec, credentials were stored in the repository. That's a complete negligence and non-adherence to security best practices.

The solution I am talking about doesn't require you to store the credentials in the repository.

They are retrieved from AWS Secrets Manager at the container start-up. Here is an example: https://docs.aws.amazon.com/cdk/api/v1/docs/@aws-cdk_aws-ecs.Secret.html

You don't have to keep credentials in your code; So no question of being hacked. I don't know how else I can explain.

The point I am trying to make is that it ultimately it boils down to zero-trust architecture and establishing security best practices. Security is a shared responsibility, neither aws nor superset can help us if we are negligent.

atulkjaiswal commented 1 year ago

@mdeshmu Are you really talking about security best practices???

Man please refer the below link(Security Best Practices of IAM): https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html?secd_iam7#bp-workloads-use-roles

Thing to note down: Require workloads to use temporary credentials with IAM roles to access AWS

rahuludaiwal599 commented 1 year ago

@mistercrunch I would request you to please check this, your another product airflow provides iam role functionality we need same feature to be implemented here as well.

dim-ops commented 1 year ago

I agree user's access keys are not optimal, in the context of the APN we must give priority to the use of IAM roles, as recommended by AWS

piby180 commented 10 months ago

In our company as well, creating IAM users is disabled account wide via Service Control Policy by Infra team and there is nothing you can do to change their mind. Using third party services like SendGrid is also not allowed due to strict data sharing policies. We are currently using Slack for alerts. We want to use emails as well but we haven't figured it out how to do it.

Maybe create a way to override send_mime_email function via superset_config.py and let end users customize it as per their own host platforms? This way we are not adding AWS code to superset repository and still provide flexibility to end users. The best of both worlds

jgournet commented 7 months ago

Just pitching in as well:

The last suggestion from @piby180 :

Maybe create a way to override send_mime_email function via superset_config.py and let end users customize it as per their own host platforms ?

sounds the most reasonable to me. Actually, I would maybe suggest even more generic:

I guess this could be easy enough to implement on your side, and would solve that whole issue (and probably a few others as well).

rusackas commented 7 months ago

Moving this to a discussion, as was requested quite a while ago (sorry!), since this is definitely not a bug, but a feature request. If we were to support AWS or other proprietary solutions, the change in Superset should probably be generalized - either an overrideable bit of code, or an abstraction layer (plugin interface), where you can enable any number of proprietary integrations. This might all be SIP territory, depending which way it goes, but for now I'll move it to a Disucssion so we can thin the Issues backlog.