TIGRLab / air-tigrs

Orchestration tools for TIGRLab's data management infrastructure

[ENH] Add pull datasource --> XNAT workflow #3

Closed jerdra closed 3 years ago

jerdra commented 3 years ago

Some stuff that I put together for a first draft. I tested this on an archive/XNAT instance I spun up!

Some thoughts that I've had while putting this together:

  1. Should we have a slightly different config spec for setting up alternative datasources? I.e., have a DataSources key that stores the information needed to identify and connect to each datasource? This would replace the separate sets of XNAT keys and SFTP keys, which would be nested under DataSources instead.

  2. An XNAT hook could be an interesting feature to explore for handling XNAT connections for uploads, although I think this might need some pros/cons brainstorming.

  3. For this setup the SFTP connections are configured in the Airflow metadata DB. However, by default these are unencrypted. You can view the plaintext just by examining the ORM of the metadata DB. I think there are mechanisms to encrypt the data using Fernet, but I haven't looked into it yet!
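To make point 1 a bit more concrete, here is a minimal sketch of what a nested DataSources section might look like once loaded into Python. Every key name, URL, and connection ID below is a hypothetical placeholder rather than an agreed spec, and `get_datasource` is an illustrative helper, not an existing function in this repo.

```python
# Hypothetical config layout: all key names and values are assumptions
# made for illustration, not the project's actual spec.
config = {
    "DataSources": {
        "xnat": {
            "type": "xnat",
            "server": "https://xnat.example.org",
            "project": "STUDY01",
        },
        "sftp": {
            "type": "sftp",
            "conn_id": "study01_sftp",  # ID of a connection stored in Airflow
            "remote_dir": "/data/incoming",
        },
    }
}


def get_datasource(config, name):
    """Return the settings for one datasource, failing loudly if it is missing."""
    try:
        return config["DataSources"][name]
    except KeyError:
        raise KeyError(f"No datasource named {name!r} in config") from None


sftp = get_datasource(config, "sftp")
print(sftp["conn_id"], sftp["remote_dir"])
```

With this shape, adding another datasource would mean adding one more entry under DataSources instead of introducing a new set of top-level keys.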

I'm putting together a repo to spin up a development environment; I'll have it ready shortly!

josephmje commented 3 years ago

For this setup the SFTP connections are configured in the Airflow metadata DB. However, by default these are unencrypted. You can view the plaintext just by examining the ORM of the metadata DB. I think there are mechanisms to encrypt the data using Fernet, but I haven't looked into it yet!

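Not sure what's best here either. I've come across a few different options:

  • Fernet keys: we might have to implement them to at least encrypt user passwords, unless we're able to set up OAuth (Create a production install of Airflow on tigrsrv #4). Apparently Airflow also allows you to connect with an LDAP service like our CAMH logins.
  • Passing credentials in through a config file, as we're currently doing.
  • Using an external storage system like GCP Secret Manager.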

jerdra commented 3 years ago

For this setup the SFTP connections are configured in the Airflow metadata DB. However, by default these are unencrypted. You can view the plaintext just by examining the ORM of the metadata DB. I think there are mechanisms to encrypt the data using Fernet, but I haven't looked into it yet!

Not sure what's best here either. I've come across a few different options:

  • Fernet keys: we might have to implement them to at least encrypt user passwords, unless we're able to set up OAuth (Create a production install of Airflow on tigrsrv #4). Apparently Airflow also allows you to connect with an LDAP service like our CAMH logins.
  • Passing credentials in through a config file, as we're currently doing.
  • Using an external storage system like GCP Secret Manager.

From my understanding, the encryption of connection keys is separate from securing login to the web application. I think Fernet is meant for the former, whereas LDAP/GCP/OAuth are for the latter.

I think ultimately you'd want both, since securing the web app doesn't stop someone from simply accessing the metadata DB and interacting with the connection object to retrieve plain-text keys. Unless our PostgreSQL DB security would prevent this from happening (@DESm1th)?
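For reference, a minimal sketch of generating a Fernet key for the connection-encryption side of this. A Fernet key is 32 random bytes in URL-safe base64, so it can be produced with the standard library alone; Airflow reads it from the `fernet_key` setting in the `[core]` section, which can be supplied through the `AIRFLOW__CORE__FERNET_KEY` environment variable. The helper name `generate_fernet_key` is made up for this example, and how we would actually store and distribute the key is still an open question.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a new Fernet key: 32 random bytes, URL-safe base64-encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


key = generate_fernet_key()

# Airflow picks the key up from the [core] fernet_key setting, which can be
# provided via this environment variable before the services start.
print(f"export AIRFLOW__CORE__FERNET_KEY={key}")
```

The key itself then becomes the secret to protect, so it should live outside the metadata DB and outside version control.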

DESm1th commented 3 years ago

No, it definitely wouldn't prevent that. Plain-text passwords/keys in the database are a no-no. If a hashed key works for Airflow, though, we can safely store the hashes.

josephmje commented 3 years ago

From my understanding, the encryption of connection keys is separate from securing login to the web application. I think Fernet is meant for the former, whereas LDAP/GCP/OAuth are for the latter.

Fernet can be applied to both user passwords and credentials. What I meant was we have the option of using it for passwords as well but it looks like @DESm1th is looking into LDAP/OAuth for that.

DESm1th commented 3 years ago

If you want we can use it for passwords too. I think LDAP maaay be less good of an idea than I'd hoped because we may have to work with IMG to set that up

jerdra commented 3 years ago

Closing this PR to break it up into its independent components!