cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Bug: Cannot pull Littlepay data for Monterey-Salinas Transit #3458

Closed by ohrite 2 months ago

ohrite commented 2 months ago

Describe the bug

The sync_littlepay job in Composer has been failing since the Sunday 11pm Pacific run (2024-09-16T06:00:00Z). The failing task is mst; in this case, MST is Monterey-Salinas Transit.

The following log entries are visible:

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: User: arn:aws:iam::xyz:user/system/mst-default is not authorized to perform: s3:ListBucket on resource: "arn:aws:s3:::littlepay-prod-mst-datafeed" with an explicit deny in an identity-based policy
[2024-09-16, 06:00:37 UTC] {taskinstance.py:1328} INFO - Marking task as FAILED. dag_id=sync_littlepay, task_id=mst, execution_date=20240916T050000, start_date=20240916T060034, end_date=20240916T060037
[2024-09-16, 06:00:37 UTC] {standard_task_runner.py:100} ERROR - Failed to execute job 1780217 for task mst (An error occurred (AccessDenied) when calling the ListObjectsV2 operation: User: arn:aws:iam::xyz:user/system/mst-default is not authorized to perform: s3:ListBucket on resource: "arn:aws:s3:::littlepay-prod-mst-datafeed" with an explicit deny in an identity-based policy; 488753)

To Reproduce

After configuring the AWS CLI with the key ID/secret pair from Google Secret Manager, running the following command reproduces the error:

$ aws s3 ls s3://littlepay-prod-mst-datafeed --profile mst
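
For anyone who would rather reproduce this from Python than from the CLI, here is a minimal sketch. It assumes boto3 is installed and that the `mst` profile is configured as described above. `make_s3_client` and `check_bucket_access` are hypothetical helper names used for illustration, not part of the data-infra codebase.

```python
def make_s3_client(profile_name):
    """Build an S3 client from a named AWS CLI profile (requires boto3)."""
    import boto3  # imported lazily so check_bucket_access has no hard dependency

    return boto3.Session(profile_name=profile_name).client("s3")


def check_bucket_access(s3_client, bucket):
    """Return None if the bucket can be listed, else the AWS error code.

    Issues the same ListObjectsV2 call that fails in the sync_littlepay logs.
    """
    try:
        s3_client.list_objects_v2(Bucket=bucket, MaxKeys=1)
    except Exception as exc:
        # botocore's ClientError carries the parsed error in `.response`.
        response = getattr(exc, "response", None)
        if not isinstance(response, dict):
            raise  # not an AWS API error (for example, a network problem)
        return response.get("Error", {}).get("Code", "Unknown")
    return None
```

While the problem is occurring, `check_bucket_access(make_s3_client("mst"), "littlepay-prod-mst-datafeed")` would be expected to return `"AccessDenied"`, matching the log entries above.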

Expected behavior

The error does not appear in the logs, and synchronization finishes for Monterey-Salinas Transit.

Additional context

The following runbook applies to this situation: https://github.com/cal-itp/data-infra/blob/main/runbooks/workflow/creating-maintaining-littlepay-data-syncs.md

ohrite commented 2 months ago

While attempting to rotate keys using the steps in the runbook, this command failed:

$ aws iam create-access-key --user-name cal-itp-default --profile cal-itp

An error occurred (AccessDenied) when calling the CreateAccessKey operation: User: arn:aws:iam::xxx:user/system/cal-itp-default is not authorized to perform: iam:CreateAccessKey on resource: user cal-itp-default with an explicit deny in an identity-based policy

ohrite commented 2 months ago

Symptoms

On 2024-09-16 at 06:00 UTC (Sunday 9/15 at 11pm Pacific), the sync_littlepay job failed to copy Littlepay data from S3, and it has continued to fail since.

Root Cause

Littlepay recommends rotating agency-specific AWS credentials every 90 days. The Littlepay AWS credentials currently in use were created before April. After discussion with @vevetron, we suspect the credentials have expired, and we can no longer generate new access key ID/secret pairs for rotation.

Frequency

The credentials will continue to expire at some point after each 90-day window, as designed.
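
To catch this before a sync fails, key age could be checked against the 90-day window. The following is a rough sketch only: it assumes a boto3 IAM client and that the Littlepay-issued user is allowed to call `iam:ListAccessKeys`, which has not been confirmed. `keys_due_for_rotation` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(iam_client, user_name, max_age_days=90, now=None):
    """Return the IDs of the user's access keys older than max_age_days.

    `iam_client` is expected to behave like boto3.client("iam"), whose
    list_access_keys response includes a timezone-aware CreateDate per key.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    response = iam_client.list_access_keys(UserName=user_name)
    return [
        key["AccessKeyId"]
        for key in response["AccessKeyMetadata"]
        if key["CreateDate"] < cutoff
    ]
```

A non-empty result would indicate that the credentials are past the recommended rotation window.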

Mitigations

  1. @vevetron has emailed Littlepay to ask for a new set of Key ID/Secret pairs, which we can use to bootstrap a new set of credentials.
  2. We can develop a script to automate the task of regenerating and rotating credentials.
  3. We can discuss strategies for scheduling automated runs of that script.
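
As a starting point for item 2, a rotation script might look roughly like the sketch below. This is not the runbook's procedure. It assumes a boto3 IAM client, and `store_secret` is a hypothetical callback that would write the new pair to wherever the sync job reads it from (for example, the secret store mentioned above). It also would not have helped in this incident, since `iam:CreateAccessKey` itself was denied.

```python
def rotate_access_key(iam_client, user_name, store_secret):
    """Create a new access key, store it, then deactivate the old key(s).

    `iam_client` is expected to behave like boto3.client("iam").
    `store_secret(access_key_id, secret_access_key)` is a hypothetical
    callback supplied by the caller. Returns the new access key ID.
    """
    existing = iam_client.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    # IAM allows at most two access keys per user, so a third cannot be created.
    if len(existing) >= 2:
        raise RuntimeError(
            f"{user_name} already has two access keys; remove one before rotating"
        )

    new_key = iam_client.create_access_key(UserName=user_name)["AccessKey"]
    # Persist the new pair before touching the old key, so a failure here
    # leaves the currently working credentials active.
    store_secret(new_key["AccessKeyId"], new_key["SecretAccessKey"])

    # Deactivate rather than delete, so the old key can be re-enabled if needed.
    for key in existing:
        iam_client.update_access_key(
            UserName=user_name, AccessKeyId=key["AccessKeyId"], Status="Inactive"
        )
    return new_key["AccessKeyId"]
```

Deactivating instead of deleting is a design choice made here for safety. A later cleanup step would still need to delete the inactive key, or the next rotation would hit the two-key limit.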

ohrite commented 2 months ago

The feeds have since begun working again. The following email was sent by Littlepay in response:

Please can I ask you to try to access data feeds again? This is an issue that appears to have affected multiple merchants, and a fix has been launched. If you could please let me know if this fix is working for you.

evansiroky commented 2 months ago

The Airflow jobs are no longer failing, so this can be marked completed.