department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

Income Limits Automation: Implement running queries and writing to S3 for VES Data Export #16217

Closed FranECross closed 6 months ago

FranECross commented 7 months ago

Description

Finalize implementation of VES data exports to S3 by leveraging work on the linked ticket.

User story

AS AN Income Limits owner I WANT the app to be updated automatically and regularly with new VES zipcode data SO THAT Veterans receive the most up-to-date information in the UI.

Engineering notes / background

S3 file locations:

https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_zipcode.csv
https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_state.csv
https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_incomethreshold.csv
https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_gmtthresholds.csv
https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_county.csv

Starter GitHub Workflow:

```yaml
name: Income Limits Data Sync

on: workflow_dispatch

env:
  NODE_EXTRA_CA_CERTS: /etc/ssl/certs/ca-certificates.crt

jobs:
  sync:
    name: Sync Data from VES to S3
    runs-on: self-hosted
    steps:
      - name: Install Actions
        uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      - name: Install Java
        uses: actions/setup-java@0ab4596768b603586c0de567f2430c30f5b0d2b0 # v3.13.0
        with:
          distribution: 'zulu'
          java-version: '8'
      - name: Test Java
        run: java -version
      - name: Setup the Node environment
        uses: actions/setup-node@8f152de45cc393bb48ce5d89d36b731f54556e65 # v4.0.0
        with:
          node-version: '18.13'
      - name: Test Node
        run: npm --version
```
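
For illustration, the query-and-upload portion this ticket adds might look roughly like the steps below appended to that job. The export script path is a hypothetical placeholder; the secret names and bucket mirror the env vars and `aws s3 sync income_limits_files ...` command that appear later in this thread. This is a sketch, not the final implementation.

```yaml
      # Sketch only: scripts/ves-export.js is hypothetical; secrets mirror names shown later in this thread.
      - name: Export VES tables to CSV
        run: node scripts/ves-export.js   # hypothetical script using node-oracledb
        env:
          VA_INCOME_LIMITS_VES_DB_USERNAME: ${{ secrets.VA_INCOME_LIMITS_VES_DB_USERNAME }}
          VA_INCOME_LIMITS_VES_DB_PASSWORD: ${{ secrets.VA_INCOME_LIMITS_VES_DB_PASSWORD }}
          VA_INCOME_LIMITS_VES_DB_SID: ${{ secrets.VA_INCOME_LIMITS_VES_DB_SID }}
          VA_INCOME_LIMITS_VES_DB_SERVER: ${{ secrets.VA_INCOME_LIMITS_VES_DB_SERVER }}
          VA_INCOME_LIMITS_VES_DB_PORT: ${{ secrets.VA_INCOME_LIMITS_VES_DB_PORT }}
      - name: Upload CSVs to S3
        run: aws s3 sync income_limits_files "s3://$BUCKET_NAME"
        env:
          BUCKET_NAME: sitewide-public-websites-income-limits-data
          AWS_DEFAULT_REGION: us-gov-west-1
```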

Analytics considerations

Quality / testing notes

We need to consider monitoring / alarms for failed exports.
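
As one possible approach to alerting on failed exports (the webhook secret and destination are assumptions, not an agreed design), a final workflow step gated on `if: failure()` could post a link to the failed run:

```yaml
      - name: Alert on failed export
        if: failure()   # runs only when an earlier step in the job failed
        run: |
          curl -sf -X POST -H 'Content-type: application/json' \
            --data "{\"text\":\"Income Limits Data Sync failed: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}\"}" \
            "$SLACK_WEBHOOK_URL"
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}   # hypothetical secret
```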

Acceptance criteria

FranECross commented 7 months ago

@dsasser I took a swing at editing the other ticket #15203, as well as creating this one. Please feel free to edit/alter as needed. Thanks! cc @jilladams

dsasser commented 6 months ago

Status Update 12/7/2023

Running into a blocking issue:

Executing the new workflow on a self-hosted runner (i.e., behind the TIC) throws an error when trying to download the Node dependency node-oracledb:

From this workflow run:

(node:1408632) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.

I also tested a ruby-based action, and received a similar error, ostensibly having the same root cause:

From this workflow run:

Error: Error: write EPROTO 00D8C2A99C7F0000:error:0A000152:SSL routines:final_renegotiate:unsafe legacy renegotiation disabled:../deps/openssl/openssl/ssl/statem/extensions.c:921:

Eric ran into this recently with AP.

Next steps appear to be escalating to VA Gateway Operations.
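
For context, the `unsafe legacy renegotiation disabled` error is OpenSSL 3's default policy kicking in when the far side (here, the TIC) only offers legacy TLS renegotiation. A commonly documented local workaround, shown below purely as a sketch, is to point the process at an OpenSSL config that re-enables it; whether that is acceptable on the VA network is a question for VA Gateway Operations, and escalation remained the plan of record here.

```sh
# Sketch only: re-enabling legacy renegotiation weakens TLS and is not necessarily
# what was done for this ticket.
cat > /tmp/openssl-legacy.cnf <<'EOF'
openssl_conf = openssl_init

[openssl_init]
ssl_conf = ssl_sect

[ssl_sect]
system_default = system_default_sect

[system_default_sect]
Options = UnsafeLegacyRenegotiation
EOF

export OPENSSL_CONF=/tmp/openssl-legacy.cnf   # read by OpenSSL-linked tools (and by Node when --openssl-config is not given)
```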

jilladams commented 6 months ago

From scrum:

The TIC (owned by VA Gateway Operations) doesn't allow the Node / Ruby versions that GitHub Actions uses by default, which blocks the HTTPS TLS handshake required to talk to Oracle (the database VES runs on).

An older Node version might work; an older Ruby version might work. This is experimental. It won't affect the versions used anywhere else in the codebase; the versions in question are isolated to the runner that executes the GitHub Actions job.
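
A sketch of what pinning older runtimes on the runner might look like; the specific versions below are placeholders, not tested recommendations:

```yaml
      - name: Setup Node (older version, experimental)
        uses: actions/setup-node@8f152de45cc393bb48ce5d89d36b731f54556e65 # v4.0.0
        with:
          node-version: '16'    # placeholder: an older release to test against the TIC
      - name: Setup Ruby (older version, experimental)
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '2.7'   # placeholder
```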

If this doesn't work, we're back to not being able to use a GitHub Action. In that case we would be blocked until VA Gateway Operations can change the TIC rules to allow legacy TLS operations, OR we would need to use a different system than GitHub Actions.

Possibility in that case: we could potentially run this code in Drupal (like the Forms DB migration, or as Police data theoretically will someday), but only to fetch the data and send it to S3. We would not want to store these 200,000 records in Drupal. If we get blocked on GitHub Actions this sprint, we can talk about this idea in more detail during the code freeze sprint. It would almost certainly require CMS Collab Cycle review.

dsasser commented 6 months ago

Status Update 12/12/23

Eric via Slack:

The unofficial date for VA Networking operations to ultimately fix this on their end is end of January.

While VA is working on the issue, Eric has been trying alternatives such as using the RBEV installer (which failed).

Next we will try setting up the runner to execute inside a container.

Update:

Running inside a container was successful!

Next we need to resolve how secrets are stored/used and permissions for the S3 bucket, but at this moment we are unblocked.
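
For reference, running the job's steps inside a container is a built-in GitHub Actions feature on self-hosted runners (the host just needs Docker). A minimal sketch, with the image choice and secret wiring as assumptions, looks like:

```yaml
jobs:
  sync:
    runs-on: self-hosted
    container:
      image: ubuntu:22.04   # assumption: any image carrying the toolchain the job needs
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      - name: Export and upload
        run: echo "steps execute inside the container; secrets arrive as env vars"
        env:
          VA_INCOME_LIMITS_VES_DB_PASSWORD: ${{ secrets.VA_INCOME_LIMITS_VES_DB_PASSWORD }}
```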

dsasser commented 6 months ago

Mid-Sprint Update 12/13/23

🟡 This ticket is at risk of not being completed by sprint close. As mentioned in the comments above, we had several issues getting the self-hosted runner unblocked for installing Ruby/Node. Eric solved that for us yesterday, thankfully, so we are moving forward. There is a fair bit of work remaining, including:

dsasser commented 6 months ago

Update 12/14/2023

We have successfully connected to the VES database within the workflow, and queried all 5 tables! Eric worked his magic and we are unblocked and rolling again on this work.

What remains:

The above represents between 3 and 5 points of work.

olivereri commented 6 months ago

@dsasser

What remains:

  • Getting the aws cli (AWS command line tool) installed in the runner

I checked on a few other GHA runners and it looks like the base image already comes with the AWS CLI installed:

```sh
$ sudo su - runner
$ whoami
runner
$ aws --version
aws-cli/1.22.34 Python/3.10.12 Linux/6.2.0-1016-aws botocore/1.23.34
```
  • Wiring up the already present authentication to the aws cli

The aws-actions/configure-aws-credentials action claims to make the credentials available to CLI calls: https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#configure-aws-credentials-for-github-actions

  • Copying files to s3

aws s3 cp or aws s3api should pick up the credentials that the above action stores in environment variables.
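
A minimal sketch tying those two pieces together (secret names mirror the env vars shown later in this thread; the action version tag is an assumption):

```yaml
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-gov-west-1
      - name: Copy CSVs to S3
        run: aws s3 cp income_limits_files "s3://sitewide-public-websites-income-limits-data" --recursive
```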

If you're hitting some errors using AWS CLI, I can take a look.

Edit: Ok I see where this is breaking down:

(log excerpt from https://github.com/department-of-veterans-affairs/vets-api/actions/runs/7215037538/job/19658468262)

```
Run aws s3 sync income_limits_files s3://sitewide-public-websites-income-limits-data
  aws s3 sync income_limits_files s3://sitewide-public-websites-income-limits-data
  shell: sh -e {0}
  env:
    BUCKET_NAME: sitewide-public-websites-income-limits-data
    AWS_REGION: us-gov-west-1
    TEMP_FOLDER: income_limits_files
    AWS_DEFAULT_REGION: us-gov-west-1
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
    AWS_SESSION_TOKEN: ***
    VA_INCOME_LIMITS_VES_DB_USERNAME: ***
    VA_INCOME_LIMITS_VES_DB_PASSWORD: ***
    VA_INCOME_LIMITS_VES_DB_SID: ***
    VA_INCOME_LIMITS_VES_DB_SERVER: ***
    VA_INCOME_LIMITS_VES_DB_PORT: ***
/__w/_temp/46aee20b-c1ef-48d1-86fc-30b825c6c4e5.sh: 1: aws: not found
Error: Process completed with exit code 127.
```

Using the absolute path /usr/bin/aws might work.
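
If the absolute path also fails, a quick check from inside the job can confirm whether the CLI exists in the job's filesystem at all (as opposed to only on the runner host), for example:

```sh
# Debugging sketch: look for the aws binary in the usual install locations
command -v aws || echo "aws not on PATH"
ls -l /usr/bin/aws /usr/local/bin/aws 2>/dev/null || echo "aws CLI not found in common locations"
```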

dsasser commented 6 months ago

@olivereri I tried setting the path to the AWS CLI to /usr/bin/aws but I'm getting: /__w/_temp/bab6df03-4a76-4980-8aee-e0e24596d01e.sh: 1: /usr/bin/aws: not found

I'm not sure if this runner just doesn't have the CLI installed, or it is located somewhere mysterious, but I'm going to move forward by installing the CLI manually, which I have previously tested successfully.
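
A sketch of installing the CLI manually as a workflow step, using AWS's documented v2 installer; the apt-get line assumes a Debian/Ubuntu-based container image:

```yaml
      - name: Install AWS CLI v2
        run: |
          apt-get update && apt-get install -y curl unzip   # assumption: Debian/Ubuntu-based image
          curl -sSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
          unzip -q awscliv2.zip
          ./aws/install
          aws --version
```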

dsasser commented 6 months ago

Update 12/18/23

We are successfully uploading CSVs to S3! Here is what remains:

dsasser commented 6 months ago

End of Sprint Update 12/19/23

This issue is likely to roll over the sprint boundary. The work is nearly complete, but most likely will not be merged in time. My previous update captures the work that is not yet complete.

Update

Shape of the data looks good, with one change that will need to be addressed in the Income Limits API code: the date format changed from 6/23/2004 9:43:48.000000 AM to 2004-06-23 09:43:48 +0000. This should be a relatively light lift to add to the final cutover ticket.

@FranECross some AC notes

https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_county_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_gmtthresholds_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_incomethreshold_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_state_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_zipcode_temp.csv

jilladams commented 6 months ago

Based on Daniel's notes about final status / ACs, I'm marking this done & closing.

@dsasser you referenced a final cutover ticket and I didn't see that that exists already, so I stubbed out what I think you're saying. It could use a look when you get a chance, from you first, and then from @FranECross : https://github.com/department-of-veterans-affairs/va.gov-cms/issues/16512