CDCgov / prime-reportstream

ReportStream is a public intermediary tool for delivery of data between different parts of the healthcare ecosystem.
https://reportstream.cdc.gov
Creative Commons Zero v1.0 Universal

Evaluate the storage method for Azure blob store connection settings #11565

Open jack-h-wang opened 11 months ago

jack-h-wang commented 11 months ago

User Story

As a ReportStream engineer, I want to follow best practices for storage of sensitive data so that I can be sure sensitive data remains secure.

Description/Use Case

ReportStream stores connection settings to Azure blob stores for both accessing RS specific blob stores as well as connecting to non-RS blob stores when sending pipeline data. We should evaluate whether the storage of these settings is sufficiently secure or if we need to consider migrating these settings to a new location.

Risks/Impacts/Considerations

Dev Notes

Acceptance Criteria

bishoyayoub commented 11 months ago

Hey team! Please add your planning poker estimate with Zenhub @arnejduranovic @jack-h-wang @JessicaWNava @JFU-NAVA-PBC @thetaurean @luis-pabon-tf

bishoyayoub commented 11 months ago

Please add your planning poker estimate with Zenhub @mkalish

Andrey-Glazkv commented 11 months ago

Need to discuss this ticket - @arnejduranovic to schedule a call/discussion on this during sprint 79

JFU-NAVA-PBC commented 8 months ago

Implementation Proposal:

  1. Parameterize all credentials, sensitive info, configuration settings as environment variables
  2. Provision a dedicated Azure key vault for each environment (each environment's vault can have its own access policies):
    • PROD
    • STAGING
    • TEST
    • DEMO
    • ...

    Note: key vaults appear to be provisioned already for PROD, STAGING, TEST, DEMO, etc., e.g.: phdprod-keyvault, phdprod-appconfig, phdprod-clientconfig; phdstaging-keyvault, phdstaging-appconfig, phdstaging-clientconfig; phddemo1-keyvault, phddemo1-appconfig, phddemo1-clientconfig.

A similar vault needs to be provisioned for LOCAL DEV so the secret / sensitive data access experience is the same.

  3. List of environment variables (per-receiver SFTP credentials such as PPK/PEM are already provisioned in the Azure key vault; only demo user/password values are shown here):

    • PROD:

      • RS-ENV-DB-USER = "prime"
      • RS-ENV-DB-PASSWORD = "changeIT!"
      • RS-ENV-DB-URL = "jdbc:postgresql://localhost:5432/prime_data_hub"
      • RS-PRIME-API-BASE-URL = "http://localhost:7071/api"
      • RS-ENV-OAUTH-BASE-URL = "reportstream.oktapreview.com"
      • RS-ENV-OAUTH-CLIENT-ID = "0oa8uvan2i07YXJLk1d7"
      • RS-ENV-OAUTH-REDIRECT = "http://localhost:7071/api/download"
      • RS-ENV-SFTP-HOST = "localhost"
      • RS-ENV-SFTP-PORT = "22"
      • RS-ENV-SFTP-USER = "foo"
      • RS-ENV-SFTP-PASSWORD = "pass"
      • RS-ENV-BLOB-STORAGE-CONN-STR (AzureWebJobsStorage) = "DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=;BlobEndpoint=http://localhost:10000/devstoreaccount1;QueueEndpoint=http://localhost:10001/devstoreaccount1;"
      • RS-ENV-PARTNER-BLOB-STORAGE-CONN-STR (PartnerStorage) = "DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=;BlobEndpoint=http://localhost:10000/devstoreaccount1;QueueEndpoint=http://localhost:10001/devstoreaccount1;"

      Note: the above parameters can be fetched from the key vault at RS server start. There are other credentials for each receiver, e.g. transport-specific credentials for SFTP, REST, and SOAP, which are lazily fetched when the receiver is processed at report processing / routing / delivery time; the lazily fetched credentials are still accessed through the AzureSecretService.
    • STAGING:

      • Similar to above PROD but can have ENV specifics
    • TEST:

      • Similar but can have ENV specifics
  4. Store the parameters in the vault for each environment (DevSecOps help)
    • Initially provisioned manually
      • Can be automated later
    • For configuration settings, use compression + base64 encoding to overcome the secret value size limit
  5. Provision vault access policies (DevSecOps help)
    • AppDev users - editor on TEST, STAGING
    • DevOps users - editor on PROD, STAGING, TEST
  6. RS runtime parameter fetching:
    • Run az login for AppDev users (assumes user RBAC rules are set up)
    • After a successful login, run the environment variable setup (bash script) leveraging the az keyvault secret CLI (details at the link below)
    • AZ VAULT SECRET CLI
  7. Run the usual RS startup scripts
  8. Next: remove sensitive data from the repo, e.g. RS_OKTA_clientId, even for staging.
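The compression + base64 encoding idea mentioned above (to keep large configuration blobs under the Key Vault secret value size limit) could look like this minimal Python sketch; the helper names are hypothetical, not code from the repo:

```python
import base64
import gzip


def encode_secret(value: str) -> str:
    # gzip-compress, then base64-encode so the result is plain ASCII text
    # suitable for storing as an Azure Key Vault secret value.
    return base64.b64encode(gzip.compress(value.encode("utf-8"))).decode("ascii")


def decode_secret(blob: str) -> str:
    # Reverse the encoding at RS server start, after fetching from the vault.
    return gzip.decompress(base64.b64decode(blob)).decode("utf-8")
```

Repetitive YAML-style settings compress well, so even large receiver configurations should round-trip through a single secret value.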

./docs/docs-deprecated/getting-started/Using-an-apple-silicon-mac.md:RS_OKTA_clientId=0oa8uvan2i07YXJLk1d7
./docker-compose.yml: - RS_OKTA_clientId=0oa8uvan2i07YXJLk1d7

The OAUTH2 client ID is considered an app credential, analogous to the user name in a user name / password pair.

ReportStream's OAUTH2 flow is acceptable between partners (STLTs), so it is good practice to keep the client ID between partners; having it show up in a public repo opens it up to general GitHub users.
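A minimal sketch of the alternative: read the client ID from an environment variable (populated from the key vault by a setup script such as source_rs_env.sh) instead of hardcoding it in docker-compose.yml. The variable name matches the repo's RS_OKTA_clientId; the helper function itself is hypothetical:

```python
import os


def okta_client_id() -> str:
    # Read the OAUTH2 client ID from the environment rather than from a
    # value checked into the repo; fail fast if the env setup was skipped.
    client_id = os.environ.get("RS_OKTA_clientId")
    if not client_id:
        raise RuntimeError(
            "RS_OKTA_clientId is not set; run the env setup script first"
        )
    return client_id
```

Community developers would then plug in their own environment-specific client ID without any repo change.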

Note,

External SFTP settings in PROD/STAGING involve sender, receiver on boarding process as described in:

./docs/onboarding-users/transport/sftp.md

For this POC, the sensitive info is manually provisioned in key vault:

(screenshot of the manually provisioned key vault secrets)

Prototype done; pushed the PR (POC only, not to be merged):

smoke tests:

(screenshot of smoke test results)

STEPS TO RUN THE PROTOTYPE:

prerequisite: provision Azure Account with Key Vault

  1. Check out the PR
  2. cd /prime-router
  3. az login
  4. source ./source_rs_env.sh — this fetches all the credentials, sensitive info, and settings the RS server needs from the AZ Key Vault
  5. Run ./cleanslate.sh
  6. Wait until Azure is fully started
  7. On an Apple chip, continue with the steps below:
  8. Open another terminal
  9. cd /prime-router
  10. source ./source_rs_env.sh
  11. Run ./gradlew reloadTables
  12. Run ./gradlew reloadSetting
  13. Run ./gradlew testSmoke
  14. and other tests

Check the source code for the following:

SFTP credentials are no longer fetched from the Hashicorp API, but from ENV vars (which come from the AZ Key Vault). This also applies to other credentials, including the AZ Storage connection string.

As a result:

Hashicorp vault and Hashicorp secret service code can be removed.

AzureSecretService might need to stay for per-receiver transport credential access (since these per-receiver creds are not populated as ENV vars as in step 4; they are fetched at run time).
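The env-var-based credential lookup described above could be sketched as follows; the variable naming scheme and helper are hypothetical illustrations, not code from the PR:

```python
import json
import os


def fetch_credential(receiver: str) -> dict:
    # Look up a receiver's transport credential from an environment variable
    # (populated from Azure Key Vault at startup) instead of calling the
    # Hashicorp Vault HTTP API. Per-receiver creds that are not exported as
    # env vars would still go through AzureSecretService at run time.
    key = f"RS_CRED_{receiver.upper().replace('-', '_')}"
    raw = os.environ.get(key)
    if raw is None:
        raise KeyError(f"no credential provisioned for {receiver} ({key} unset)")
    return json.loads(raw)
```

With this pattern in place, the Hashicorp-specific secret service code paths become dead code and can be removed.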

JFU-NAVA-PBC commented 7 months ago

Moving review feedback and conversations here from the PR (which is a POC and will be closed and deleted after the research is done).

=========== PR review feedback: I think this is an interesting idea, though I'm not sure this is 100% the right path.

Some thoughts:

• There is the intention that this be an open source project, so we need to avoid adding local configuration that would prevent anyone from cloning and working on this repo.
• For local dev, I'm not seeing why we would need to fetch the credentials from Azure rather than using the local vault implementation (this is how it currently works for fetching the blob transport params).
• I think we need to work more closely with devops since ultimately where these credentials live is their domain. Taking a look in Azure, it looks like values like the DB password and blob connection string exist as configuration on the function app.
• I think the change in SftpTransport is actually undoing a working version of what we want to see, where we dynamically load the credential out of the vault.
• My read on the scope of this work was to work with the devops team to see if it could make sense to make sensitive values no longer be an app config setting and fetch them out of the existing Azure vault, whether that be via a change to the application code or as a change in how the Azure function works (i.e. using managed identities rather than passwords).

Yea, the research went beyond the original ask.

Below are the considerations:

"there is the intention that this be an open source project, so we need to avoid adding local configuration that would prevent anyone from cloning and working on this repo"

JF: there are OAUTH client IDs (credentials) for prod and staging in the repo; they should be relocated to vaults, and community developers should plug in their environment-specific OAUTH creds instead of ReportStream's.

"I think we need to work more closely with devops since ultimately where these credentials live is there domain. Taking a look in azure, it looks like value like the DB password and blob connection string exist as configuration on the function app"

JF: this is an original AC, and sure, I can proceed on that; I got access to the vaults recently, so I can check.

" I think the change in SftpTransport is actually undoing a working version of what we want to see where we dynamically load the credential out of the vault"

JF: So is this something broken? Could you point me to the code, so a bug can be logged to track it?

"My read on the scope of this work was to work with the devops team to see if it could make sense to make sensitive values no longer be an app config setting and fetch them out the existing azure vault, whether that be via a change to the application code or as a change in how the azure function works (i.e. using managed identities rather than passwords.) "

JF: I will check out the contents of the current appconfig and keyvaults. BTW, pdhstaging-appconfig and pdhstaging-keyvault are both Azure Key Vaults; is there any value in moving the blob conn str from one to the other?

arnejduranovic commented 7 months ago

See this PR for a proof of concept and conversation that is tangential to this ticket but does not meet the AC: https://github.com/CDCgov/prime-reportstream/pull/12760

Clarification from Michael on what work this ticket should consist of:

work with the devops team to see if it could make sense to make sensitive values no longer be an app config setting and fetch them out the existing azure vault, whether that be via a change to the application code or as a change in how the azure function works (i.e. using managed identities rather than passwords.)
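The "no longer an app config setting" option in the quote above could also be approximated without any application code change, via a Key Vault reference in the function app's settings: the setting value points at a vault secret and is resolved through the function's managed identity, so the plaintext connection string never sits in app configuration. A hedged config sketch (the vault and secret names are hypothetical):

```json
{
  "AzureWebJobsStorage": "@Microsoft.KeyVault(SecretUri=https://pdhprod-keyvault.vault.azure.net/secrets/BlobStorageConnStr/)"
}
```

This keeps the current code paths intact while moving the sensitive value itself into the existing Azure vault, which is the direction the clarified scope points at.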