catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Nightly Build Failure 2024-04-11 #3563

Closed zaneselvans closed 2 months ago

zaneselvans commented 2 months ago

Overview

Everything succeeded except the creation of the Zenodo sandbox data release archive. That step fails with a Pydantic validation error when attempting to create a _NewRecord from the JSON in a Zenodo response. The response is an HTTP 500 server error rather than the expected record, so my guess is that the problem originates with Zenodo.

This error first appeared on 2024-04-11, and recurred the next time the nightly builds ran, on 2024-04-14.
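
The failure mode described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in zenodo_data_release.py: `check_record_payload` and `ZenodoServerError` are hypothetical names, the only fields assumed to be required are `id` and `files` (the two named in the traceback), and the error payload is a placeholder. Checking the HTTP status before validation would make the server's own message visible instead of a "Field required" error.

```python
class ZenodoServerError(RuntimeError):
    """Hypothetical error for a non-2xx response from Zenodo."""


# The traceback reports these two fields as required by _NewRecord.
REQUIRED_FIELDS = ("id", "files")


def check_record_payload(status_code: int, payload: dict) -> dict:
    """Reject error responses before any model validation happens."""
    if status_code >= 400:
        # Surface the server's message rather than a missing-field error.
        message = payload.get("message", "<no message>")
        raise ZenodoServerError(f"Zenodo returned HTTP {status_code}: {message}")
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"Record payload is missing fields: {missing}")
    return payload


# Placeholder payload shaped like the sandbox error response.
error_payload = {"error_id": "placeholder", "message": "internal error", "status": 500}
try:
    check_record_payload(500, error_payload)
except ZenodoServerError as err:
    print(err)
```

With a real `requests` response, calling `response.raise_for_status()` before parsing the JSON would serve a similar purpose.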

Next steps

import requests
import os
import json

token = os.environ["ZENODO_SANDBOX_TOKEN_PUBLISH"]
dude = requests.request(
    method="GET",
    url="https://sandbox.zenodo.org/api/records/5563",
    headers={"Authorization": f"Bearer {token}"},
    timeout=5,
)
print(json.dumps(dude.json(), indent=4))
{
    "error_id": "35b0d52720334872a037327c26439814",
    "message": "The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.",
    "status": 500
}

However, the same request against the production server works fine, so this appears to be an issue with the Zenodo sandbox server.

import requests
import os
import json

#token = os.environ["ZENODO_SANDBOX_TOKEN_PUBLISH"]
#record_id = 5563
#base_url = "https://sandbox.zenodo.org/api"

token = os.environ["ZENODO_TOKEN_PUBLISH"]
record_id = 3653158
base_url = "https://zenodo.org/api"

dude = requests.request(
    method="GET",
    url=f"{base_url}/records/{record_id}",
    headers={"Authorization": f"Bearer {token}"},
    timeout=5,
)
print(json.dumps(dude.json(), indent=4))
{
    "created": "2024-02-26T17:59:12.569661+00:00",
    "modified": "2024-02-26T17:59:13.853168+00:00",
    "id": 10708669,
    "conceptrecid": "3653158",
    "doi": "10.5281/zenodo.10708669",
    "conceptdoi": "10.5281/zenodo.3653158",
    "doi_url": "https://doi.org/10.5281/zenodo.10708669",
    "metadata": {
        "title": "Public Utility Data Liberation Project (PUDL) Data Release",
        "doi": "10.5281/zenodo.10708669",
        "publication_date": "2024-02-26",
        "description": "<h1>PUDL v2024.2.6 Data Release</h1>\n<p>The main impetus behind this release is the quarterly update of some of ourcore datasets with preliminary data for 2023Q4. The <a href=\"https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia860.html\"><span>EIA Form 860 &ndash; Annual Electric Generator Report</span></a>, <a href=\"https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/epacems.html\"><span>EPA Hourly Continuous Emission Monitoring System (CEMS)</span></a>, and bulk EIA API data are all up to date through the end of 2023, while the <a href=\"https://catalystcoop-pudl.readthedocs.io/en/nightly/data_sources/eia923.html\"><span>EIA Form 923 &ndash; Power Plant Operations Report</span></a> lags a month behind and is currently only available through November, 2023. We also addressed several issues we found in our initial release automation process that will make it easier for us to do more frequent releases, like this one!</p>\n<p>We&rsquo;re also for the first time publishing the full historical time series of of generator data available in the EIA860M, rather than just using the most recent release to update the EIA860 outputs. This enables tracking of how planned fossil plant retirement dates have evolved over time.</p>\n<p>There are also updates to our data validation system, a new version of Pandas, and experimental Parquet outputs. See below for the details.</p>\n\n\n<h3>New Data Coverage</h3>\n\n\n\n<ul>\n<li>Add EIA860M data through December 2023 <a href=\"https://github.com/catalyst-cooperative/pudl/issues/3313\">#3313</a>, <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3367\">#3367</a>.</li>\n<li>Add 2023 Q4 of CEMS data. 
See <a href=\"https://github.com/catalyst-cooperative/pudl/issues/3315\">#3315</a>, <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3379\">#3379</a>.</li>\n<li>Add EIA923 monthly data through November 2023 <a href=\"https://github.com/catalyst-cooperative/pudl/issues/3314\">#3314</a>, <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3398\">#3398</a>, <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3422\">#3422</a>.</li>\n<li>Create a new table <a href=\"https://catalystcoop-pudl.readthedocs.io/en/nightly/data_dictionaries/pudl_db.html#core-eia860m-changelog-generators\"><span>core_eia860m__changelog_generators</span></a> which tracks the evolution of all generator data reported in the EIA860M, in particular the stated retirement dates. see issue <a href=\"https://github.com/catalyst-cooperative/pudl/issues/3330\">#3330</a> and PR <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3331\">#3331</a>. Previously only the most recent month of reported EIA860M data was available within the PUDL DB.</li>\n</ul>\n\n\n\n<h3>Release Infrastructure</h3>\n\n\n\n<ul>\n<li>Use the same logic to merge version tags into the <code><span>stable</span></code> branch as we are using to merge the nightly build tags into the <code><span>nightly</span></code> branch. See PR <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3347\">#3347</a></li>\n<li>Automatically place a <a href=\"https://cloud.google.com/storage/docs/holding-objects#use-object-holds\">temporary object hold</a> on all versioned data releases that we publish to GCS, to ensure that they can&rsquo;t be accidentally deleted. 
See issue <a href=\"https://github.com/catalyst-cooperative/pudl/issues/3400\">#3400</a> and PR <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3421\">#3421</a>.</li>\n</ul>\n\n\n\n<h3>Schema Changes</h3>\n\n\n\n<p>Restored the individual FERC Form 1 plant output tables, providing direct access to denormalized versions of the specific plant types via:</p>\n<ul>\n<li><a href=\"https://catalystcoop-pudl.readthedocs.io/en/nightly/data_dictionaries/pudl_db.html#out-ferc1-yearly-steam-plants-sched402\"><span>out_ferc1__yearly_steam_plants_sched402</span></a></li>\n<li><a href=\"https://catalystcoop-pudl.readthedocs.io/en/nightly/data_dictionaries/pudl_db.html#out-ferc1-yearly-small-plants-sched410\"><span>out_ferc1__yearly_small_plants_sched410</span></a></li>\n<li><a href=\"https://catalystcoop-pudl.readthedocs.io/en/nightly/data_dictionaries/pudl_db.html#out-ferc1-yearly-hydroelectric-plants-sched406\"><span>out_ferc1__yearly_hydroelectric_plants_sched406</span></a></li>\n<li><a href=\"https://catalystcoop-pudl.readthedocs.io/en/nightly/data_dictionaries/pudl_db.html#out-ferc1-yearly-pumped-storage-plants-sched408\"><span>out_ferc1__yearly_pumped_storage_plants_sched408</span></a></li>\n</ul>\n<p>See issue <a href=\"https://github.com/catalyst-cooperative/pudl/issues/3416\">#3416</a> &amp; PR <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3417\">#3417</a></p>\n\n\n\n<h3>Data Validation with Pandera</h3>\n\n\n\n<p>We&rsquo;ve started integrating <code><span>pandera</span></code> dataframe schemas and checks with <code><span>dagster</span></code> <a href=\"https://docs.dagster.io/concepts/assets/asset-checks\">asset checks</a> to validate data while our ETL pipeline is running instead of only after all the data has been produced. 
Initially we are using the various database schema checks that are generated by our metadata, but the goal is to migrate all of our data validation tests into this framework over time, and to start using it to encode any new data validations immediately. See issues <a href=\"https://github.com/catalyst-cooperative/pudl/issues/941\">#941</a>, <a href=\"https://github.com/catalyst-cooperative/pudl/issues/1572\">#1572</a>, <a href=\"https://github.com/catalyst-cooperative/pudl/issues/3318\">#3318</a>, <a href=\"https://github.com/catalyst-cooperative/pudl/issues/3412\">#3412</a> and PR <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3282\">#3282</a>.</p>\n\n\n\n<h3>Pandas 2.2</h3>\n\n\n\n<p>We&rsquo;ve updated to Pandas 2.2, which has a number of changes and deprecations. See PRs <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3272\">#3272</a>, <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3410\">#3410</a>.</p>\n<ul>\n<li>Changes in <a href=\"https://pandas.pydata.org/pandas-docs/stable/whatsnew/v2.2.0.html#merge-and-dataframe-join-now-consistently-follow-documented-sort-behavior\">how merge results are sorted</a> impacted the assignment of <code><span>unit_id_pudl</span></code> values, so any hard-coded values that dependent on the previous assignments will likely be incorrect now. We had to update a number of tests and FERC1-EIA record linkage training data to account for this change.</li>\n<li>Pandas is also deprecating the use of the <code><span>AS</span></code> frequency alias, in favor of <code><span>YS</span></code>, so many references to the old alias have been updated.</li>\n<li>We&rsquo;ve switched to using the <code><span>calamine</span></code> engine for reading Excel files, which is much faster than the old <code><span>openpyxl</span></code> library.</li>\n</ul>\n\n\n\n<h3>Parquet Outputs</h3>\n\n<p>The ETL now outputs PyArrow Parquet files for all tables that are written to the PUDL DB. 
The Parquet outputs are used as the interim storage for the ETL, rather than reading all tables out of the SQLite DB. We aren&rsquo;t publicly distributing the Parquet outputs yet, but are giving them a test run with some existing users. See <a href=\"https://github.com/catalyst-cooperative/pudl/issues/3102\">#3102</a> <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3296\">#3296</a>, <a href=\"https://github.com/catalyst-cooperative/pudl/pull/3399\">#3399</a>.</p>\n<h2>Other PUDL v2024.2.6 Resources</h2>\n<ul>\n<li><a href=\"https://catalystcoop-pudl.readthedocs.io/en/v2024.2.6/data_dictionaries/pudl_db.html\">PUDL v2024.2.6 Data Dictionary</a></li>\n<li><a href=\"https://catalystcoop-pudl.readthedocs.io/en/v2024.2.6/\">PUDL v2024.2.6 Documentation</a></li>\n<li><a href=\"https://registry.opendata.aws/catalyst-cooperative-pudl/\">PUDL in the AWS Open Data Registry</a></li>\n<li>PUDL v2024.2.6 in a free, public AWS S3 bucket: s3://pudl.catalyst.coop/v2024.2.6/</li>\n<li>PUDL v2024.2.6 in a requester-pays GCS bucket: gs://pudl.catalyst.coop/v2024.2.6/</li>\n<li><a href=\"https://doi.org/10.5281/zenodo.10703402\">Zenodo archive of the PUDL GitHub repo for this release</a></li>\n<li><a href=\"https://github.com/catalyst-cooperative/pudl/releases/tag/v2024.2.6\">PUDL v2024.2.6 release on GitHub</a></li>\n<li><a href=\"https://pypi.org/project/catalystcoop.pudl/2024.2.6/\">PUDL v2024.2.6 package in the Python Package Index (PyPI)</a></li>\n</ul>\n<h2>Contact Us</h2>\n<p><strong>If you're using PUDL, we would love to hear from you!</strong> Even if it's just a note to let us know that you exist, and how you're using the software or data. 
Here's a bunch of different ways to get in touch:</p>\n<ul>\n<li><a href=\"https://github.com/catalyst-cooperative\">Follow us on GitHub</a></li>\n<li>Use the <a href=\"https://github.com/catalyst-cooperative/pudl/issues\">PUDL Github issue tracker</a> to let us know about any bugs or data issues you encounter</li>\n<li><a href=\"https://github.com/orgs/catalyst-cooperative/discussions\">GitHub Discussions</a> is where we provide user support.</li>\n<li>Watch our <a href=\"https://github.com/orgs/catalyst-cooperative/projects/9\">GitHub Project</a> to see what we're working on.</li>\n<li>Email us at <a href=\"mailto:hello@catalyst.coop\">hello@catalyst.coop</a> for private communications.</li>\n<li>On Mastodon: <a href=\"https://mastodon.energy/@catalystcoop\">@CatalystCoop@mastodon.energy</a></li>\n<li>On BlueSky: <a href=\"https://bsky.app/profile/catalyst.coop\">@catalyst.coop</a></li>\n<li>On Twitter: <a href=\"https://twitter.com/CatalystCoop\">@CatalystCoop</a></li>\n<li>Play with our data and notebooks <a href=\"https://www.kaggle.com/catalystcooperative\">on Kaggle</a></li>\n<li>Combine our data with ML models <a href=\"https://huggingface.co/catalystcooperative\">on HuggingFace</a></li>\n<li>Learn more about us on our website: <a href=\"https://catalyst.coop\">https://catalyst.coop</a></li>\n<li>Subscribe to our announcements list for <a href=\"https://catalyst.coop/updates\">email updates</a>.</li>\n</ul>",
        "access_right": "open",
        "creators": [
            {
                "name": "Selvans, Zane A.",
                "affiliation": "Catalyst Cooperative",
                "orcid": "0000-0002-9961-7208"
            },
            {
                "name": "Gosnell, Christina M.",
                "affiliation": "Catalyst Cooperative",
                "orcid": "0009-0004-2979-6142"
            },
            {
                "name": "Sharpe, Austen",
                "affiliation": "Catalyst Cooperative"
            },
            {
                "name": "Norman, Bennett",
                "affiliation": "Catalyst Cooperative"
            },
            {
                "name": "Bush, Trenton",
                "affiliation": "Catalyst Cooperative"
            },
            {
                "name": "Schira, Zach",
                "affiliation": "Catalyst Cooperative"
            },
            {
                "name": "Lamb, Katherine",
                "affiliation": "Catalyst Cooperative"
            },
            {
                "name": "Xia, Dazhong",
                "affiliation": "Catalyst Cooperative"
            },
            {
                "name": "Belfer, Ella",
                "affiliation": "Catalyst Cooperative"
            }
        ],
        "keywords": [
            "utility",
            "energy",
            "climate",
            "sqlite",
            "parquet",
            "electricity",
            "emissions",
            "epa",
            "eia",
            "ferc",
            "coal",
            "natural gas",
            "regulation",
            "policy",
            "data",
            "eia923",
            "eia860",
            "eia861",
            "ferc form 1"
        ],
        "version": "v2024.2.6",
        "language": "eng",
        "resource_type": {
            "title": "Dataset",
            "type": "dataset"
        },
        "license": {
            "id": "cc-by-4.0"
        },
        "communities": [
            {
                "id": "catalyst-cooperative"
            }
        ],
        "relations": {
            "version": [
                {
                    "index": 8,
                    "is_last": true,
                    "parent": {
                        "pid_type": "recid",
                        "pid_value": "3653158"
                    }
                }
            ]
        }
    },
    "title": "Public Utility Data Liberation Project (PUDL) Data Release",
    "links": {
        "self": "https://zenodo.org/api/records/10708669",
        "self_html": "https://zenodo.org/records/10708669",
        "self_doi": "https://zenodo.org/doi/10.5281/zenodo.10708669",
        "doi": "https://doi.org/10.5281/zenodo.10708669",
        "parent": "https://zenodo.org/api/records/3653158",
        "parent_html": "https://zenodo.org/records/3653158",
        "parent_doi": "https://zenodo.org/doi/10.5281/zenodo.3653158",
        "self_iiif_manifest": "https://zenodo.org/api/iiif/record:10708669/manifest",
        "self_iiif_sequence": "https://zenodo.org/api/iiif/record:10708669/sequence/default",
        "files": "https://zenodo.org/api/records/10708669/files",
        "media_files": "https://zenodo.org/api/records/10708669/media-files",
        "archive": "https://zenodo.org/api/records/10708669/files-archive",
        "archive_media": "https://zenodo.org/api/records/10708669/media-files-archive",
        "latest": "https://zenodo.org/api/records/10708669/versions/latest",
        "latest_html": "https://zenodo.org/records/10708669/latest",
        "draft": "https://zenodo.org/api/records/10708669/draft",
        "versions": "https://zenodo.org/api/records/10708669/versions",
        "access_links": "https://zenodo.org/api/records/10708669/access/links",
        "access_users": "https://zenodo.org/api/records/10708669/access/users",
        "access_request": "https://zenodo.org/api/records/10708669/access/request",
        "access": "https://zenodo.org/api/records/10708669/access",
        "reserve_doi": "https://zenodo.org/api/records/10708669/draft/pids/doi",
        "communities": "https://zenodo.org/api/records/10708669/communities",
        "communities-suggestions": "https://zenodo.org/api/records/10708669/communities-suggestions",
        "requests": "https://zenodo.org/api/records/10708669/requests"
    },
    "updated": "2024-02-26T17:59:13.853168+00:00",
    "recid": "10708669",
    "revision": 4,
    "files": [
        {
            "id": "66aad64a-6960-44bd-9813-c7fa642665f0",
            "key": "ferc714_xbrl_taxonomy_metadata.json",
            "size": 192370,
            "checksum": "md5:9fef935f9a970839319ada082d6c9672",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc714_xbrl_taxonomy_metadata.json/content"
            }
        },
        {
            "id": "dc7dc9b7-610f-4c17-914d-8595ea676e97",
            "key": "ferc714_xbrl_datapackage.json",
            "size": 59809,
            "checksum": "md5:698167bc11fb964f472be9759ab4b7ea",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc714_xbrl_datapackage.json/content"
            }
        },
        {
            "id": "fae3beb1-809d-4ec7-b1fc-b94c77ffa9ad",
            "key": "ferc2_xbrl.sqlite.gz",
            "size": 13769037,
            "checksum": "md5:b28739737424b6beb040daffd1636f26",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc2_xbrl.sqlite.gz/content"
            }
        },
        {
            "id": "d940d97b-02da-4f4b-bcb2-5b59a1fff9e5",
            "key": "ferc714_xbrl.sqlite.gz",
            "size": 102148923,
            "checksum": "md5:6d9a7e9f62a4c931db99c48b3a59990c",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc714_xbrl.sqlite.gz/content"
            }
        },
        {
            "id": "464c4e51-8816-46b4-87d5-6f483f0e3dcf",
            "key": "ferc6_dbf.sqlite.gz",
            "size": 43933844,
            "checksum": "md5:f38244976b151849f5cc124d70ed303f",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc6_dbf.sqlite.gz/content"
            }
        },
        {
            "id": "e50f96f8-f8a4-41d8-9944-bcc0e10082b7",
            "key": "ferc2_xbrl_datapackage.json",
            "size": 1976390,
            "checksum": "md5:7b98f5e4773179265771c0149fff5186",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc2_xbrl_datapackage.json/content"
            }
        },
        {
            "id": "432ebe57-4efb-4c36-95d1-8f76d929ecb3",
            "key": "ferc6_xbrl_taxonomy_metadata.json",
            "size": 2874928,
            "checksum": "md5:2c85dae41667448ea32066726f76ae5e",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc6_xbrl_taxonomy_metadata.json/content"
            }
        },
        {
            "id": "98eb5984-e9ce-48ba-a3f2-f1d1596ccd3e",
            "key": "2024-02-26-0437-a9b4e3659-v2024.2.6-pudl-etl.log",
            "size": 5966490,
            "checksum": "md5:5aa7d02b91599dc71c3bac68ebaab961",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/2024-02-26-0437-a9b4e3659-v2024.2.6-pudl-etl.log/content"
            }
        },
        {
            "id": "ea5ea289-75e2-4b22-b647-343162adb821",
            "key": "ferc1_xbrl_taxonomy_metadata.json",
            "size": 7277778,
            "checksum": "md5:026ad62c418e5e8aab4a85e6a68d628a",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc1_xbrl_taxonomy_metadata.json/content"
            }
        },
        {
            "id": "32e9b53f-bc95-494b-8ccc-5d7133ddeea7",
            "key": "ferc2_dbf.sqlite.gz",
            "size": 74501858,
            "checksum": "md5:b2d6613b96d7491ac5a26c8368256f4b",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc2_dbf.sqlite.gz/content"
            }
        },
        {
            "id": "09f22c0c-518b-4d70-8749-e7946d9cd6ad",
            "key": "core_epacems__hourly_emissions.parquet",
            "size": 5411628414,
            "checksum": "md5:04044a97321ca020a0859a5973ff4827",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/core_epacems__hourly_emissions.parquet/content"
            }
        },
        {
            "id": "1bb2a3db-7c6e-4cd7-86b8-3afcbdc690c2",
            "key": "pudl.sqlite.gz",
            "size": 2774764088,
            "checksum": "md5:e4f52aa31f5296d24e05f4aaf5f47246",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/pudl.sqlite.gz/content"
            }
        },
        {
            "id": "0528c1d2-43fb-4a51-817f-e2ec0b6bea33",
            "key": "ferc6_xbrl_datapackage.json",
            "size": 1068696,
            "checksum": "md5:f2a77356d745ab2213c721496c82f9e3",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc6_xbrl_datapackage.json/content"
            }
        },
        {
            "id": "cc059e03-26dd-4c0e-b5ca-361c30e505f0",
            "key": "ferc60_xbrl.sqlite.gz",
            "size": 2301917,
            "checksum": "md5:412a72389b3ab2861b2e055a38aab12c",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc60_xbrl.sqlite.gz/content"
            }
        },
        {
            "id": "ab63b205-8150-4598-86f9-01c3f1485122",
            "key": "ferc1_dbf.sqlite.gz",
            "size": 275514824,
            "checksum": "md5:191824673c778ce8a5c4e051577ad556",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc1_dbf.sqlite.gz/content"
            }
        },
        {
            "id": "5596d31c-b7ab-438c-bbfe-e28054279675",
            "key": "ferc1_xbrl.sqlite.gz",
            "size": 97227925,
            "checksum": "md5:d4642e586d0a2ed855ee9481e2607523",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc1_xbrl.sqlite.gz/content"
            }
        },
        {
            "id": "02415c77-1ce4-4337-852f-3af1eca9a778",
            "key": "ferc60_dbf.sqlite.gz",
            "size": 2882516,
            "checksum": "md5:360fafcc84d8e19acf188f7baba7b5aa",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc60_dbf.sqlite.gz/content"
            }
        },
        {
            "id": "51cadedf-9c0f-4d51-822c-68211c0d8bd0",
            "key": "ferc1_xbrl_datapackage.json",
            "size": 1727987,
            "checksum": "md5:82064ef14c0d6da8b87e0360493bec94",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc1_xbrl_datapackage.json/content"
            }
        },
        {
            "id": "a96f1be1-d5d3-4321-9cea-9520baa9b7f6",
            "key": "ferc60_xbrl_taxonomy_metadata.json",
            "size": 1856822,
            "checksum": "md5:0cc4fca785314082b8f514bae888600e",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc60_xbrl_taxonomy_metadata.json/content"
            }
        },
        {
            "id": "03c86613-a4dd-4b1d-8061-ad4f44a8fc58",
            "key": "ferc6_xbrl.sqlite.gz",
            "size": 10588865,
            "checksum": "md5:5643d56b2fa8cff5e8e888fc4a483da9",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc6_xbrl.sqlite.gz/content"
            }
        },
        {
            "id": "f33f680f-4f71-4dfb-bbe3-567d64c9c3da",
            "key": "censusdp1tract.sqlite.gz",
            "size": 506670400,
            "checksum": "md5:12d5709d09e9020ad59c88d94709fb29",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/censusdp1tract.sqlite.gz/content"
            }
        },
        {
            "id": "ffc35da9-e182-44e3-9c1c-dd44ecc427d1",
            "key": "ferc2_xbrl_taxonomy_metadata.json",
            "size": 7105520,
            "checksum": "md5:fbbd750509118029d7d675e573f6ad5a",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc2_xbrl_taxonomy_metadata.json/content"
            }
        },
        {
            "id": "0691988c-0dd5-4174-8dc9-c10f8b73f886",
            "key": "ferc60_xbrl_datapackage.json",
            "size": 748860,
            "checksum": "md5:d2a6dec21e482fa4bde44b930e774975",
            "links": {
                "self": "https://zenodo.org/api/records/10708669/files/ferc60_xbrl_datapackage.json/content"
            }
        }
    ],
    "owners": [
        {
            "id": 90379
        }
    ],
    "status": "published",
    "stats": {
        "downloads": 2515,
        "unique_downloads": 1904,
        "views": 7966,
        "unique_views": 6996,
        "version_downloads": 90,
        "version_unique_downloads": 89,
        "version_unique_views": 102,
        "version_views": 112
    },
    "state": "done",
    "submitted": true
}

Verify that everything is fixed!

Once you've applied any necessary fixes, make sure that the nightly build outputs are all in their right places.

- [x] [S3 distribution bucket](https://s3.console.aws.amazon.com/s3/buckets/pudl.catalyst.coop?region=us-west-2&bucketType=general&prefix=nightly/&showversions=false) was updated at the expected time
- [x] [GCP distribution bucket](https://console.cloud.google.com/storage/browser/pudl.catalyst.coop/nightly;tab=objects?project=catalyst-cooperative-pudl) was updated at the expected time
- [x] [GCP internal bucket](https://console.cloud.google.com/storage/browser/builds.catalyst.coop) was updated at the expected time
- [x] [Datasette PUDL version](https://data.catalyst.coop/pudl/core_pudl__codes_datasources) points at the same hash as [nightly](https://github.com/catalyst-cooperative/pudl/tree/nightly)

Relevant logs

gsutil cp gs://builds.catalyst.coop/2024-04-11-0602-d37d630ee-main/2024-04-11-0602-d37d630ee-main-pudl-etl.log .
Creating a new PUDL data release on Zenodo.
2024-04-11 14:52:16,848: INFO - Using Zenodo token: XXXXXXXXXX (zenodo_data_release.py:88)
2024-04-11 14:52:16,848: INFO - Getting new version for 5563 (zenodo_data_release.py:240)
Traceback (most recent call last):
  File "/home/mambauser/pudl/devtools/zenodo/zenodo_data_release.py", line 408, in <module>
    pudl_zenodo_data_release()
  File "/home/mambauser/env/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/devtools/zenodo/zenodo_data_release.py", line 395, in pudl_zenodo_data_release
    .get_empty_draft()
     ^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/devtools/zenodo/zenodo_data_release.py", line 241, in get_empty_draft
    latest_record = self.zenodo_client.get_record(self.record_id)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/pudl/devtools/zenodo/zenodo_data_release.py", line 135, in get_record
    return _NewRecord(**response.json())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mambauser/env/lib/python3.12/site-packages/pydantic/main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 2 validation errors for _NewRecord
id
  Field required [type=missing, input_value={'error_id': '2a8f64a27fa...cation.', 'status': 500}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
files
  Field required [type=missing, input_value={'error_id': '2a8f64a27fa...cation.', 'status': 500}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing

I get the same behavior very quickly when running the data release script locally:

./zenodo_data_release.py --env sandbox --publish --source-dir gs://pudl.catalyst.coop/nightly/

2024-04-14 10:31:35,667: INFO - Using Zenodo token: XXXXXXXXXXX (zenodo_data_release.py:88)
2024-04-14 10:31:35,667: INFO - Getting new version for 5563 (zenodo_data_release.py:240)
Traceback (most recent call last):
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 408, in <module>
    pudl_zenodo_data_release()
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 395, in pudl_zenodo_data_release
    .get_empty_draft()
     ^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 241, in get_empty_draft
    latest_record = self.zenodo_client.get_record(self.record_id)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/code/catalyst/pudl/devtools/zenodo/./zenodo_data_release.py", line 135, in get_record
    return _NewRecord(**response.json())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/pydantic/main.py", line 175, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 2 validation errors for _NewRecord
id
  Field required [type=missing, input_value={'error_id': '45cff1fda13...cation.', 'status': 500}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing
files
  Field required [type=missing, input_value={'error_id': '45cff1fda13...cation.', 'status': 500}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing
[1]    80374 exit 1     ./zenodo_data_release.py --env sandbox --publish --source-dir

jdangerx commented 2 months ago

In our meeting today we agreed that the first step is to stop "successfully publishes to the Zenodo sandbox" from blocking "we mark a nightly build as successful." All the data should still get published; we just won't be able to test the Zenodo interaction until we try a manual data release. That's frustrating and fragile, but not the end of the world.
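
One way to picture the non-blocking arrangement is sketched below. It is only an illustration under assumptions: how the nightly build is actually orchestrated is not shown in this issue, and `run_build_steps` and the step names are hypothetical. Required steps fail the build by raising; optional steps (such as the sandbox release) are attempted, and their failures are collected as warnings.

```python
from typing import Callable

Step = tuple[str, Callable[[], None]]


def run_build_steps(required: list[Step], optional: list[Step]) -> list[str]:
    """Run required steps strictly and optional steps leniently.

    Returns the warnings collected from optional steps that failed.
    """
    for _name, step in required:
        step()  # Any exception here propagates and fails the build.

    warnings: list[str] = []
    for name, step in optional:
        try:
            step()
        except Exception as err:  # Broad on purpose: nothing here may block.
            warnings.append(f"{name} failed: {err}")
    return warnings


def publish_to_buckets() -> None:
    """Hypothetical required step that succeeds."""


def publish_to_zenodo_sandbox() -> None:
    """Hypothetical optional step that fails like the sandbox did."""
    raise RuntimeError("HTTP 500 from sandbox")


warnings = run_build_steps(
    required=[("distribution buckets", publish_to_buckets)],
    optional=[("zenodo sandbox", publish_to_zenodo_sandbox)],
)
print(warnings)
```

The trade-off is the one noted above: a sandbox failure becomes a warning that someone has to read, rather than a red build.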

Then we will, at some point, think about how to make our data publication process more robust and less dependent on the vagaries of Zenodo's sandbox environment.
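
As one possible ingredient of a more robust process, transient server errors could be retried with exponential backoff. The sketch below is an assumption-laden illustration, not existing PUDL code: `fetch_with_retry` is a hypothetical helper, and `fetch` stands in for any callable that returns a status code and a decoded JSON payload. Retries would not have helped with a multi-day outage like this one; they only smooth over brief 5xx blips.

```python
import time
from typing import Callable

FetchResult = tuple[int, dict]


def fetch_with_retry(
    fetch: Callable[[], FetchResult],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> FetchResult:
    """Call fetch() until it returns a non-5xx status or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, payload = fetch()
        if status < 500:
            return status, payload
        if attempt < attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * 2**attempt)
    # Every attempt returned a server error; hand back the last one.
    return status, payload


# Simulated server: two errors, then success. No network access involved.
responses = iter([(500, {}), (500, {}), (200, {"id": 1, "files": []})])
status, payload = fetch_with_retry(lambda: next(responses), sleep=lambda _s: None)
print(status, payload)
```

Passing `sleep` as a parameter keeps the helper testable without real delays.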

zaneselvans commented 2 months ago

@jdangerx who did we deputize to ping the Zenodo folks about this new and exciting behavior?

jdangerx commented 2 months ago

Nobody yet. Are you game to explain the specifics of the failure to them?

If not, I'm happy to dig through the logs to provide a good bug report too.

zaneselvans commented 2 months ago

All that's required to reproduce the failure is:

import requests
import os
import json

token = os.environ["ZENODO_SANDBOX_TOKEN_PUBLISH"]
dude = requests.request(
    method="GET",
    url="https://sandbox.zenodo.org/api/records/5563",
    headers={"Authorization": f"Bearer {token}"},
    timeout=5,
)
print(json.dumps(dude.json(), indent=4))

But trying it again now... it works! I guess they heard I was going to send an email and fixed it.

I'll close this issue in the morning if the nightly builds pass.

zaneselvans commented 2 months ago

Zenodo appears to have fixed this problem without intervention from us, as the nightly builds succeeded tonight, so I'll close the issue.