apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Add Salesforce docs for setting up a connection #10029

Closed feluelle closed 4 years ago

feluelle commented 4 years ago

Description

Following the related issue on adding a custom connection (linked below), we need a how-to guide on setting up a connection to Salesforce.

To be more specific, we need to add a salesforce.rst to the docs/howto/connection directory explaining the fields you need to set to set up a working connection to Salesforce. The connection is established through the airflow/providers/salesforce/hooks/salesforce.py hook. Check out this file to see how the connection is established and describe this in the guide you are adding. You can look at the other how-to connection guides in the directory for examples. If you have further questions, do not hesitate to ask. :)
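As a rough sketch of what the guide would need to explain, the snippet below maps connection fields onto keyword arguments accepted by simple_salesforce's `Salesforce` client. The helper name `salesforce_client_kwargs` is hypothetical, and the assumption that the security token and domain are read from the Extra JSON should be checked against the hook file itself.

```python
import json


def salesforce_client_kwargs(login, password, extra_json):
    """Map Airflow-style connection fields to simple_salesforce keyword arguments.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    extra = json.loads(extra_json) if extra_json else {}
    kwargs = {
        "username": login,
        "password": password,
        # Assumption: the security token is stored under this key in Extra.
        "security_token": extra["security_token"],
    }
    # Assumption: an optional "domain" key selects sandbox ("test") or production.
    if "domain" in extra:
        kwargs["domain"] = extra["domain"]
    return kwargs


example = salesforce_client_kwargs(
    "myemail@example.com",
    "my_password",
    '{"security_token": "your_token", "domain": "test"}',
)
```

The resulting dictionary could then be passed as `Salesforce(**example)`, which is roughly the step a how-to guide would describe in terms of the connection form fields.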

Use case / motivation

If you simply want to connect to Salesforce services, you should be able to do so by reading a proper how-to guide, without having to investigate the code.

Related Issues

https://github.com/apache/airflow/issues/8766

hugoio24 commented 4 years ago

Have you been able to write or find any documentation? I'm trying to use this hook, but it's kind of painful without an example or docs...

feluelle commented 4 years ago

You can find a solution in the related issue I linked.

https://stackoverflow.com/questions/53510980/salesforce-connection-using-apache-airflow-ui

feluelle commented 4 years ago

@hugomalaise do you want to add the official docs for it (based on the stackoverflow solution)?

hugoio24 commented 4 years ago

Thanks for the Stack Overflow link, it helped me a lot.

It would be convenient to have this information in the docs, along with an example DAG to easily test the hook. Have you developed a DAG that could be used as an example?

eladkal commented 4 years ago

@feluelle I can create the doc and also an example DAG, but are you sure you want this doc in Airflow? It would be an acknowledgment that Airflow "asks" users to expose the security token.

feluelle commented 4 years ago

We can perhaps implement alternative auth mechanisms shown here that don't require additional sensitive data to be passed: https://github.com/simple-salesforce/simple-salesforce/blob/master/README.rst

To login using the JWT method, use your Salesforce username, consumer key from your app, and private key:

```python
from simple_salesforce import Salesforce

# JWT login: username, the connected app's consumer key, and a private key file
sf = Salesforce(username='myemail@example.com', consumer_key='XYZ', privatekey_file='filename.key')
```

But at the moment this is the only working option we have. In my opinion, we should add the docs but include a note that makes it clear that this is, or can be, a security issue.

What do others think? @potiuk @kaxil ?

hugoio24 commented 4 years ago

@feluelle: Have you been able to write the doc? I can review it if you need.

kaxil commented 4 years ago

Absolutely, we could just add a Salesforce connection type here: https://github.com/apache/airflow/blob/a9f7222a3fdc9bb55c697bdad17f4e60e8d9e70f/airflow/models/connection.py#L38-L88

So instead of an empty connection type (which I don't think is allowed from Airflow 2.0), we will have a dedicated connection type for it.

kaxil commented 4 years ago

And the security_token won't be exposed: if you have a Fernet key set, all the extras are encrypted too: https://github.com/apache/airflow/blob/a9f7222a3fdc9bb55c697bdad17f4e60e8d9e70f/airflow/models/connection.py#L294-L302

cc @eladkal @feluelle

So I think creating this doc should be relatively straightforward. Want to take this issue on, @hugomalaise?

eladkal commented 4 years ago

@kaxil is it working? I'm running 1.10.9 with a Fernet key set, and the extra field is not encrypted.

Passwords are encrypted; extra is not. Is it a bug, or am I doing something wrong?

Also, it's important to note that encrypting the whole extra field isn't desirable for a Salesforce connection. There is another important parameter that you need to be able to control: the domain param determines whether you are running against sandbox or production. So the actual Extra can be: {"security_token":"your_token", "domain":"test" }

If the whole JSON is encrypted, you never know which mode you are in, and every time you want to change mode you must also re-enter the security token. A workaround is to create two connections, one for test and one for production, but again the question is whether we want to write something like that in the docs.
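To illustrate the point about switching modes, here is a minimal sketch that changes only the domain in a readable Extra JSON while leaving the security token untouched. The helper name `switch_domain` is hypothetical, and treating a missing domain key as "production" is an assumption about how the client defaults behave.

```python
import json


def switch_domain(extra_json, domain):
    """Return a copy of the Extra JSON with only the "domain" key changed.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    extra = json.loads(extra_json)
    if domain is None:
        # Assumption: with no domain key, the client uses its production default.
        extra.pop("domain", None)
    else:
        extra["domain"] = domain
    return json.dumps(extra)


sandbox_extra = '{"security_token": "your_token", "domain": "test"}'
production_extra = switch_domain(sandbox_extra, None)
```

This only works when the Extra field can be read back, which is why encrypting or masking the whole JSON forces the token to be re-entered on every mode change.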

kaxil commented 4 years ago

I will have to test it myself. But yeah, for now the docs can simply state that, for security reasons, we suggest using one of the secrets backends to create this connection (environment variables, HashiCorp Vault, GCP Secret Manager, etc.).
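As a sketch of the environment-variable option: Airflow reads connections from variables named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI. The helper below builds such a pair using only the standard library. The helper name, the `http` scheme, the empty host, and the idea that query parameters end up as the connection's extras are all assumptions to verify against the Airflow version in use.

```python
import os
from urllib.parse import quote, urlencode


def salesforce_conn_env(conn_id, login, password, extra):
    """Build an (env var name, connection URI) pair for a Salesforce connection.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    name = "AIRFLOW_CONN_" + conn_id.upper()
    # Percent-encode credentials so characters such as "@" do not break the URI.
    user = quote(login, safe="")
    secret = quote(password, safe="")
    # Assumption: query parameters are exposed to the hook as extras.
    uri = "http://{}:{}@/?{}".format(user, secret, urlencode(extra))
    return name, uri


name, uri = salesforce_conn_env(
    "salesforce_default",
    "myemail@example.com",
    "my_password",
    {"security_token": "your_token", "domain": "test"},
)
os.environ[name] = uri  # in practice, set this outside the code base
```

Keeping the variable in the deployment environment rather than the metadata database means the token never appears in the connection form.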

eladkal commented 4 years ago

@kaxil I can prepare a PR explaining how to set up a connection to Salesforce. It's a relatively short explanation. Once we have more information or a dedicated connection type for Salesforce, I'll update the docs accordingly.

kaxil commented 4 years ago

@eladkal Assigned the GitHub issue to you :)