Closed FranECross closed 6 months ago
@dsasser I took a swing at editing the other ticket #15203 , as well as creating this one. Please feel free to edit/alter as needed. Thanks! cc @jilladams
Running into a blocking issue:
Executing the new workflow in a self-hosted runner (i.e., behind the TIC) throws an error when trying to download the node dependency node-oracledb:
From this workflow run:
(node:1408632) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
I also tested a ruby-based action, and received a similar error, ostensibly having the same root cause:
From this workflow run:
Error: Error: write EPROTO 00D8C2A99C7F0000:error:0A000152:SSL routines:final_renegotiate:unsafe legacy renegotiation disabled:../deps/openssl/openssl/ssl/statem/extensions.c:921:
Eric ran into this recently with AP.
Next steps appear to be escalating to VA Gateway Operations.
From scrum:
The TIC (owned by VA Gateway Operations) doesn't allow the Node / Ruby versions that GitHub Actions uses by default, which blocks the HTTPS TLS handshake required to talk to Oracle (where the VES DB lives).
An older Node version might work; an older Ruby version might work. This is experimental. It won't affect the versions used anywhere in other code: the versions in question are isolated to the runner executing the GitHub Actions job.
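A sketch of what pinning older toolchain versions in the workflow might look like (the specific versions below are illustrative guesses for the experiment, not known-good values):

```yaml
steps:
  # Pin an older Node line than the runner default, to test against the TIC
  - uses: actions/setup-node@v4
    with:
      node-version: '16'
  # Likewise pin an older Ruby for the ruby-based action
  - uses: ruby/setup-ruby@v1
    with:
      ruby-version: '3.0'
```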
If this doesn't work, we're back to not being able to use a GitHub Action. In that case, we would be blocked until network changes can be made to the TIC rules by VA Gateway Operations to allow legacy TLS operations, OR we would need to use a different system than GitHub Actions.
Possibility in that case: we could potentially run this code in Drupal (like the Forms DB migration, or as Police data theoretically will someday), but just to fetch the data and send it to S3. We would not want to store these 200,000 records in Drupal. If we get blocked on GitHub Actions this sprint, we can talk about this idea in more detail during the code freeze sprint. It would almost certainly require CMS Collab Cycle review.
Eric via Slack:
The unofficial date for VA Networking operations to ultimately fix this on their end is end of January.
While VA is working on the issue, Eric has been trying alternatives such as using the RBEV installer (which failed).
Next we will try setting up the runner to execute inside a container.
**Update:**
Running inside a container was successful!
Next we need to resolve how secrets are stored/used, and permission for the S3 bucket, but at this moment we are unblocked.
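For reference, running the job inside a container comes down to adding a `container:` key to the job; a minimal sketch (the image name here is illustrative, not necessarily the one actually used):

```yaml
jobs:
  sync:
    name: Sync Data from VES to S3
    runs-on: self-hosted
    # All job steps execute inside this container instead of directly
    # on the self-hosted runner host
    container:
      image: ubuntu:22.04
```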
🟡 This ticket is at risk of not being completed by sprint close. As mentioned in the above comments, we have had several issues getting unblocked with the self-hosted runner installing Ruby/Node. Eric solved that for us yesterday, thankfully, so we are moving forward. There is a fair bit of work remaining, including:
We have successfully connected to the VES database within the workflow, and queried all 5 tables! Eric worked his magic and we are unblocked and rolling again on this work.
What remains:
The above represents 3–5 points of work.
@dsasser
What remains:
- Getting the aws cli (AWS command line tool) installed in the runner
I checked a few other GHA runners and it looks like the base image already comes with the AWS CLI installed:
```shell
$ sudo su - runner
$ whoami
runner
$ aws --version
aws-cli/1.22.34 Python/3.10.12 Linux/6.2.0-1016-aws botocore/1.23.34
```
- Wiring up the already present authentication to the aws cli
The aws/configure-aws-credentials action claims to make the credentials available to CLI calls: https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#configure-aws-credentials-for-github-actions
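A sketch of wiring that action into the workflow (the secret names below are placeholders; the actual secret names in the repo may differ):

```yaml
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws-region: us-gov-west-1
```

The action exports the resolved credentials as `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` / `AWS_SESSION_TOKEN` environment variables, which subsequent CLI steps pick up automatically.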
- Copying files to s3
`aws s3 cp` or `aws s3api` should pick up the credentials that the above action stores in environment variables.
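For instance, a step along these lines (bucket name taken from the workflow's env values; the step name and `--recursive` flag are illustrative) would rely entirely on those environment variables, with no explicit credential arguments:

```yaml
- name: Copy CSVs to S3
  env:
    BUCKET_NAME: sitewide-public-websites-income-limits-data
  # Credentials come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY /
  # AWS_SESSION_TOKEN set by configure-aws-credentials in an earlier step
  run: aws s3 cp income_limits_files "s3://$BUCKET_NAME/" --recursive
```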
If you're hitting some errors using AWS CLI, I can take a look.
Edit: Ok I see where this is breaking down:
```
Run aws s3 sync income_limits_files s3://sitewide-public-websites-income-limits-data
  aws s3 sync income_limits_files s3://sitewide-public-websites-income-limits-data
  shell: sh -e {0}
  env:
    BUCKET_NAME: sitewide-public-websites-income-limits-data
    AWS_REGION: us-gov-west-1
    TEMP_FOLDER: income_limits_files
    AWS_DEFAULT_REGION: us-gov-west-1
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
    AWS_SESSION_TOKEN: ***
    VA_INCOME_LIMITS_VES_DB_USERNAME: ***
    VA_INCOME_LIMITS_VES_DB_PASSWORD: ***
    VA_INCOME_LIMITS_VES_DB_SID: ***
    VA_INCOME_LIMITS_VES_DB_SERVER: ***
    VA_INCOME_LIMITS_VES_DB_PORT: ***
/__w/_temp/46aee20b-c1ef-48d1-86fc-30b825c6c4e5.sh: 1: aws: not found
Error: Process completed with exit code 127.
```
Using the absolute path might work: `/usr/bin/aws`
@olivereri I tried setting the path to the AWS CLI to `/usr/bin/aws` but I'm getting: `/__w/_temp/bab6df03-4a76-4980-8aee-e0e24596d01e.sh: 1: /usr/bin/aws: not found`
I'm not sure if this runner just doesn't have the CLI installed, or it is located somewhere mysterious, but I'm going to move forward by installing the CLI manually, which I have previously tested successfully.
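A sketch of a manual-install step using AWS's official CLI v2 bundle (the install prefix flags are illustrative; inside a root container `sudo` isn't needed, but it may be on the runner host):

```yaml
- name: Install AWS CLI v2
  run: |
    curl -sSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
    unzip -q awscliv2.zip
    ./aws/install -i /usr/local/aws-cli -b /usr/local/bin
    # Sanity check that the binary is on PATH for subsequent steps
    aws --version
```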
We have successful uploading of CSVs to S3! Here is what remains:
This issue is likely to roll over the sprint boundary. The work is nearly complete, but most likely will not be merged in time. My previous update captures the work that is yet to be completed.
Shape of the data looks good, with one change that will need to be addressed in the Income Limits API code: the date format changed from `6/23/2004 9:43:48.000000 AM` to `2004-06-23 09:43:48 +0000`. This should be a relatively light lift to add to the final cutover ticket.
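Since the Income Limits API lives in vets-api (Ruby), handling the new timestamp format should be straightforward with the stdlib `Time` class. A sketch, with the variable names being illustrative (not the actual API code):

```ruby
require 'time'

# New format written by the workflow's CSV export (includes a UTC offset),
# so Time.parse handles it directly:
new_value = '2004-06-23 09:43:48 +0000'
parsed = Time.parse(new_value)

# Old Oracle-style format previously in the CSVs, in case both formats
# need to be accepted during the cutover window:
old_value = '6/23/2004 9:43:48.000000 AM'
old_parsed = Time.strptime(old_value, '%m/%d/%Y %I:%M:%S.%N %p')
```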
@FranECross some AC notes
[ ] Place new files in a subfolder in the S3 bucket
This didn't end up needing a subfolder. Instead, the files were uploaded with slightly different filenames: I added `_temp` to each filename so that, for instance, the result is `std_state_temp.csv`.
[ ] Point reviewer and confirm they look good

New, temporary CSVs are here:
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_county_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_gmtthresholds_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_incomethreshold_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_state_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_zipcode_temp.csv
Based on Daniel's notes about final status / ACs, I'm marking this done & closing.
@dsasser you referenced a final cutover ticket and I didn't see that one exists already, so I stubbed out what I think you're describing. It could use a look when you get a chance, from you first, and then from @FranECross: https://github.com/department-of-veterans-affairs/va.gov-cms/issues/16512
Description
Finalize implementation of VES data exports to S3 by leveraging work on the linked ticket.
User story
AS AN Income Limits owner I WANT the app to be updated automatically and regularly with new VES zipcode data SO THAT Veterans receive the most up-to-date information in the UI.
Engineering notes / background
S3 file locations:

- https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_zipcode.csv
- https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_state.csv
- https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_incomethreshold.csv
- https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_gmtthresholds.csv
- https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_county.csv
Starter Github Workflow:
```yaml
name: Income Limits Data Sync
on: workflow_dispatch
env:
  NODE_EXTRA_CA_CERTS: /etc/ssl/certs/ca-certificates.crt
jobs:
  sync:
    name: Sync Data from VES to S3
    runs-on: self-hosted
    steps:
      - name: Install Actions
        uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      - name: Install Java
        uses: actions/setup-java@0ab4596768b603586c0de567f2430c30f5b0d2b0 # v3.13.0
        with:
          distribution: 'zulu'
          java-version: '8'
      - name: Test Java
        run: java -version
      - name: Setup the Node environment
        uses: actions/setup-node@8f152de45cc393bb48ce5d89d36b731f54556e65 # v4.0.0
        with:
          node-version: '18.13'
      - name: Test Node
        run: npm --version
```

Analytics considerations
Quality / testing notes
We need to consider monitoring / alarms for failed exports.
Acceptance criteria