catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Nightly Build Failure 2024-02-06 #3364

Closed: zaneselvans closed this issue 4 months ago

zaneselvans commented 4 months ago

Overview

It looks like the failure was an error pulling data from GCS, caught by the Google Cloud Storage client via an MD5 checksum mismatch.
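For context, the MD5 value GCS reports in the `X-Goog-Hash` header is the base64-encoded raw MD5 digest of the object's bytes, so a locally cached copy can be checked against the "indicated" value from the logs with a few lines of Python (a generic sketch, not PUDL code; the file path is a placeholder):

```python
import base64
import hashlib
from pathlib import Path


def gcs_md5(path: Path) -> str:
    """Compute the base64-encoded MD5 digest, as GCS reports it in X-Goog-Hash."""
    digest = hashlib.md5(path.read_bytes()).digest()
    return base64.b64encode(digest).decode("ascii")


# Compare against the "indicated" checksum from the error message, e.g.:
# gcs_md5(Path("eia923-2003.zip")) == "S9fhAlyRwAtQts74fLm/rQ=="
```

If the cached zip's digest matches neither the indicated nor the actual value from the error, the corruption may be on the local/transfer side rather than in the cache object itself.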

Next steps

What do we need to do to understand or remediate the issue?

Verify that everything is fixed!

Once you've applied any necessary fixes, make sure that the nightly build outputs are all in their right places.

- [ ] [S3 distribution bucket](https://s3.console.aws.amazon.com/s3/buckets/pudl.catalyst.coop?region=us-west-2&bucketType=general&prefix=nightly/&showversions=false) was updated at the expected time
- [ ] [GCP distribution bucket](https://console.cloud.google.com/storage/browser/pudl.catalyst.coop/nightly;tab=objects?project=catalyst-cooperative-pudl) was updated at the expected time
- [ ] [GCP internal bucket](https://console.cloud.google.com/storage/browser/builds.catalyst.coop) was updated at the expected time
- [ ] [Datasette PUDL version](https://data.catalyst.coop/pudl/core_pudl__codes_datasources) points at the same hash as [nightly](https://github.com/catalyst-cooperative/pudl/tree/nightly)
- [ ] [Zenodo sandbox record](https://sandbox.zenodo.org/doi/10.5072/zenodo.5563) was updated to the record number in the logs (search for `zenodo_data_release.py` and `Draft` in the logs, to see what the new record number should be!)

Relevant logs

[link to build logs from internal distribution bucket]( PLEASE FIND THE ACTUAL LINK AND FILL IN HERE )

2024-02-06 07:01:42 +0000 - dagster - ERROR - etl_job - eb4af868-6b94-4a47-b66a-13810f197ec1 - 1104 - RUN_FAILURE - Execution of run for "etl_job" failed. Steps failed: ['raw_eia923__all_dfs.extract_single_eia923_year[2003]'].
Traceback (most recent call last):
  File "/home/mambauser/env/bin/pudl_etl", line 8, in <module>
    sys.exit(pudl_etl())
             ^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/etl/cli.py", line 184, in pudl_etl
    raise Exception(event.event_specific_data.error)
Exception: dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "raw_eia923__all_dfs.extract_single_eia923_year":

Stack Trace:
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/execute_plan.py", line 286, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/execute_step.py", line 487, in core_dagster_event_sequence_for_step
    for user_event in _step_output_error_checked_user_event_sequence(
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/execute_step.py", line 169, in _step_output_error_checked_user_event_sequence
    for user_event in user_event_sequence:
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/execute_step.py", line 95, in _process_asset_results_to_events
    for user_event in user_event_sequence:
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/compute.py", line 212, in execute_core_compute
    for step_output in _yield_compute_results(step_context, inputs, compute_fn, compute_context):
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/compute.py", line 181, in _yield_compute_results
    for event in iterate_with_context(
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_utils/__init__.py", line 465, in iterate_with_context
    with context_fn():
  File "/home/mambauser/env/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/utils.py", line 84, in op_execution_error_boundary
    raise error_cls(

The above exception was caused by the following exception:
google.resumable_media.common.DataCorruption: Checksum mismatch while downloading:

  https://storage.googleapis.com/download/storage/v1/b/internal-zenodo-cache.catalyst.coop/o/eia923%2F10.5281-zenodo.10067550%2Feia923-2003.zip?alt=media&userProject=catalyst-cooperative-pudl

The X-Goog-Hash header indicated an MD5 checksum of:

  S9fhAlyRwAtQts74fLm/rQ==

but the actual MD5 checksum of the downloaded contents was:

  68rwjEX+3bRiffrmBpVvqw==

Stack Trace:
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary
    yield
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_utils/__init__.py", line 467, in iterate_with_context
    next_output = next(iterator)
                  ^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/compute_generator.py", line 131, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
             ^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/dagster/_core/execution/plan/compute_generator.py", line 125, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/extract/excel.py", line 466, in extract_single_year
    return extractor_cls(ds).extract(year=[year])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/extract/excel.py", line 302, in extract
    self.load_excel_file(page, **partition),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/extract/excel.py", line 376, in load_excel_file
    zf = self.ds.get_zipfile_resource(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/workspace/datastore.py", line 413, in get_zipfile_resource
    return zipfile.ZipFile(io.BytesIO(self.get_unique_resource(dataset, **filters)))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/workspace/datastore.py", line 402, in get_unique_resource
    _, content = next(res)
                 ^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/workspace/datastore.py", line 382, in get_resources
    contents = self._cache.get(res)
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/workspace/resource_cache.py", line 207, in get
    return cache.get(resource)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/workspace/resource_cache.py", line 152, in get
    return self._blob(resource).download_as_bytes(retry=gcs_retry)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/google/cloud/storage/blob.py", line 1405, in download_as_bytes
    self._prep_and_do_download(
  File "/home/mambauser/env/lib/python3.11/site-packages/google/cloud/storage/blob.py", line 4332, in _prep_and_do_download
    self._do_download(
  File "/home/mambauser/env/lib/python3.11/site-packages/google/cloud/storage/blob.py", line 987, in _do_download
    response = download.consume(transport, timeout=timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/google/resumable_media/requests/download.py", line 237, in consume
    return _request_helpers.wait_and_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/google/resumable_media/requests/_request_helpers.py", line 155, in wait_and_retry
    response = func()
               ^^^^^^
  File "/home/mambauser/env/lib/python3.11/site-packages/google/resumable_media/requests/download.py", line 233, in retriable_request
    self._write_to_stream(result)
  File "/home/mambauser/env/lib/python3.11/site-packages/google/resumable_media/requests/download.py", line 141, in _write_to_stream
    raise common.DataCorruption(response, msg)

The above exception occurred during handling of the following exception:
KeyError: "No resources found for eia923: {'name': 'f906920_2003.xls'}"

Stack Trace:
  File "/home/mambauser/pudl/src/pudl/extract/excel.py", line 371, in load_excel_file
    res = self.ds.get_unique_resource(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/src/pudl/workspace/datastore.py", line 404, in get_unique_resource
    raise KeyError(f"No resources found for {dataset}: {filters}") from err

The above exception was caused by the following exception:
StopIteration

Stack Trace:
  File "/home/mambauser/pudl/src/pudl/workspace/datastore.py", line 402, in get_unique_resource
    _, content = next(res)
                 ^^^^^^^^^

Copying outputs to GCP bucket gs://builds.catalyst.coop/2024-02-06-0602-a1c96992c-main
Copying file:///home/mambauser/pudl_work/output/ferc2_xbrl_taxonomy_metadata.json [Content-Type=application/json]...
/ [0/22 files][    0.0 B/  7.1 GiB]   0% Done                                   
Copying file:///home/mambauser/pudl_work/output/ferc2_xbrl_datapackage.json [Content-Type=application/json]...
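The tail of the log shows the secondary failure mode: once the corrupted download fails, the resource generator yields nothing, `next()` raises `StopIteration`, and `get_unique_resource` chains that into the `KeyError` seen above. A minimal sketch of that pattern (a simplification for illustration, not the actual datastore code):

```python
def get_unique_resource(resources):
    """Return the contents of the first matching resource.

    Mirrors the pattern in pudl.workspace.datastore: an exhausted
    generator raises StopIteration, which is chained into a KeyError.
    """
    res = iter(resources)
    try:
        _, content = next(res)
    except StopIteration as err:
        raise KeyError("No resources found for the given filters") from err
    return content
```

So the `KeyError: "No resources found for eia923..."` is a downstream symptom; the root cause is the `DataCorruption` raised while filling the cache.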
jdangerx commented 4 months ago

Sounds like we should re-run that workflow and see if it succeeds - I can do that.

bendnorman commented 4 months ago

Looks like the action rerun failed because we've already created a tag for nightly-2024-02-06. I'm trying to rerun it by creating a duplicate of last night's Batch job.

bendnorman commented 4 months ago

Ah that didn't work. All of the credentials still need to get passed to the VM from the action.

bendnorman commented 4 months ago

Was able to rerun the workflow using the nightly-2024-02-06 tag.

jdangerx commented 4 months ago

Ah! Cool, thanks for digging in! I wonder if the trigger will still count as "scheduled"? And if not, I don't think our distribution code will run - let's see what happens!