catalyst-cooperative / pudl-catalog

An Intake catalog for distributing open energy system data liberated by Catalyst Cooperative.
https://catalyst.coop/pudl/
MIT License

Skip gcs integration tests to avoid egress fees #81

Closed bendnorman closed 1 year ago

bendnorman commented 1 year ago

We got hit with some big egress fees for the gs://intake.catalyst.coop bucket in the last two days. We are pretty sure this is because the CI runs the integration tests 8 times for each push to a PR. $0.12/GB * 8 runs * 4 downloads of the epacems data * 5 GB (size of EPA CEMS data) = ~$20 per push!! That's no good.
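For reference, the estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the constants are the figures quoted in this comment, and the function name `egress_per_push` is made up for illustration.

```python
# Back-of-the-envelope egress estimate using the figures quoted in this comment.
PRICE_PER_GB_USD = 0.12   # egress price per GB, as quoted
CI_RUNS_PER_PUSH = 8      # integration test runs triggered by each push
DOWNLOADS_PER_RUN = 4     # epacems downloads in each run
EPACEMS_SIZE_GB = 5       # approximate size of the EPA CEMS data


def egress_per_push(runs: int, downloads: int, size_gb: float, price: float):
    """Return (total GB transferred, cost in USD) for a single push."""
    total_gb = runs * downloads * size_gb
    return total_gb, total_gb * price


total_gb, cost_usd = egress_per_push(
    CI_RUNS_PER_PUSH, DOWNLOADS_PER_RUN, EPACEMS_SIZE_GB, PRICE_PER_GB_USD
)
print(f"{total_gb} GB per push, about ${cost_usd:.2f}")
```

With these inputs the total comes to 160 GB per push, or a little over $19.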

This PR skips the GCS tests until we figure out how to download partitions of the data using intake. We didn't disable the s3 tests because AWS covers the egress fees and we'd still like to test our catalogs! We need to figure out how to pull partitions so our CI isn't downloading 160 GB of CEMS data every time there is a push.
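One way to express "skip GCS, keep S3" is to decide from the URL scheme of the catalog path. The sketch below is not the code in this PR: `SKIPPED_SCHEMES`, `should_skip_integration_test`, and the S3 URL are hypothetical names chosen for illustration, and it uses only the standard library.

```python
from urllib.parse import urlparse

# Assumption: GCS locations are identified by these URL schemes.
SKIPPED_SCHEMES = {"gs", "gcs"}


def should_skip_integration_test(catalog_url: str) -> bool:
    """Return True when the catalog URL points at a bucket we pay egress for."""
    return urlparse(catalog_url).scheme in SKIPPED_SCHEMES


# The gs:// bucket is the one named in this PR; the s3:// URL is a placeholder.
for url in ("gs://intake.catalyst.coop", "s3://example-bucket/catalog"):
    action = "skip" if should_skip_integration_test(url) else "run"
    print(f"{url}: {action}")
```

In a pytest suite, the boolean could be passed as the condition to `pytest.mark.skipif(..., reason="avoid GCS egress fees")` on the parametrized GCS cases, so the S3 cases keep running.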

I'm still not sure why we weren’t hit with big egress fees earlier in the catalog development.

codecov[bot] commented 1 year ago

Codecov Report

Base: 100.0% // Head: 100.0% // No change to project coverage :thumbsup:

Coverage data is based on head (338323b) compared to base (e4ad1ad). Patch has no changes to coverable lines.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##              dev      #81   +/-   ##
=======================================
  Coverage   100.0%   100.0%
=======================================
  Files           2        2
  Lines          44       44
=======================================
  Hits           44       44
```


bendnorman commented 1 year ago

I reran one of the 3.11 tox actions and the bucket still sent 1-4 GB. I went over our tests a few times and I don't see where we are requesting the GCS bucket!?! The default PUDL_INTAKE_PATH is S3 and it's skipping the GCS tests...

bendnorman commented 1 year ago

The CI is no longer making requests to the GCS bucket so we won't be getting hit with large egress fees whenever we want to run the CI.

I'm still not sure why the CI wasn't pulling dozens of GBs of data in the past, but these are my ideas:

zaneselvans commented 1 year ago

I looked at the tox-pytest workflow and it wasn't doing GitHub runner caching, so that isn't it. So weird.