catalyst-cooperative / pudl-catalog

An Intake catalog for distributing open energy system data liberated by Catalyst Cooperative.
https://catalyst.coop/pudl/
MIT License

Improve integration tests. Add SQLite & requester pays docs #32

Closed zaneselvans closed 2 years ago

zaneselvans commented 2 years ago

Several changes in this PR:

Closes #25 Closes #29 Closes #30

codecov[bot] commented 2 years ago

Codecov Report

Merging #32 (8a9e3b1) into main (ddf73a3) will not change coverage. The diff coverage is 100.0%.

@@           Coverage Diff           @@
##             main      #32   +/-   ##
=======================================
  Coverage   100.0%   100.0%           
=======================================
  Files           2        2           
  Lines          42       44    +2     
=======================================
+ Hits           42       44    +2     
Impacted Files                  Coverage Δ
src/pudl_catalog/__init__.py    100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update fbe9ab9...8a9e3b1.

katie-lamb commented 2 years ago

I might have a unique situation because of my previous use of GCP but this is the problem I ran into.

First I tried to follow the docs in order and clicked on Create Project:

I then tried to create a new billing account that isn't associated with DBCP:

Finally I tried going to the IAM admin page:

This seems like a classic GCP issue. I think to make a project I need to be given permissions to create billing accounts in the Catalyst org.

From a docs perspective, the biggest discrepancy is probably that, according to the docs, the Create Project page only requires you to enter a project name, whereas I also had to add billing account info. Not sure why this is. It seems hard to write docs that account for every scenario, but we should probably try to sort this out a little. Maybe the section on creating a billing account should come before project creation?

Again, my situation might be unusual because I have interacted with GCP before, and I guess someone with a little GCP experience probably has more know-how about creating projects, so the Catalyst instructions are less essential for them. Maybe someone with a personal email that has never interacted with GCP should try this and see what happens.

zaneselvans commented 2 years ago

Hey @katie-lamb, would you be willing to try working through the instructions with your own Gmail account rather than your Catalyst one, since that's the way it'll probably work for an outside person?

katie-lamb commented 2 years ago

Okay so I've now tried to do this from another non-Catalyst email. It's my alumni email that has never interacted with GCP before. It shouldn't matter that it's not a gmail account I think. Here are some discrepancies from the docs:

I can also try from my personal Gmail, which has interacted with GCP before. Or I can create a fresh Gmail account just for this, but I don't think it should matter whether it's a Gmail account or one from another org.

zaneselvans commented 2 years ago

And your Stanford Alumni account allowed you to create projects & billing accounts? That seems a little bit surprising to me.

I was not sure what order the gcloud init and gcloud auth login stuff was supposed to happen in.

At the end it's complaining because it doesn't know where to bill the access costs to. So either it doesn't know what project to bill to, or the project it's trying to bill to has no billing account. Does it work if you explicitly pass it the billing project like this:

gsutil -u your_new_project_name ls gs://intake.catalyst.coop

If it does, then the project has billing associated with it, but that project isn't being used by default. If it doesn't work, then that project doesn't have billing (or you don't have permission to spend).
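
The same check can be scripted. This is only a rough sketch of the reasoning above, not anything that ships with pudl-catalog: the helper names `build_ls_command`, `diagnose`, and `check_billing` are made up for illustration, and `check_billing` assumes `gsutil` is installed and already authenticated. The only real piece is the top-level `gsutil -u` option, which names the project to bill.

```python
import subprocess


def build_ls_command(bucket_url, billing_project=None):
    """Build a gsutil listing command, optionally naming a billing project."""
    cmd = ["gsutil"]
    if billing_project:
        # -u tells gsutil which project to bill for requester pays access.
        cmd += ["-u", billing_project]
    return cmd + ["ls", bucket_url]


def diagnose(explicit_project_worked):
    """Map the outcome of the explicit -u attempt to the likely cause."""
    if explicit_project_worked:
        return "Project has billing, but it is not being used by default."
    return "Project has no billing account, or you lack permission to spend."


def check_billing(bucket_url, billing_project):
    """Run the explicit-project listing and interpret its exit status.

    Hypothetical helper: needs gsutil on the PATH and valid credentials.
    """
    result = subprocess.run(
        build_ls_command(bucket_url, billing_project),
        capture_output=True,
        text=True,
    )
    return diagnose(result.returncode == 0)
```

Calling `check_billing("gs://intake.catalyst.coop", "your_new_project_name")` would run the command shown above and return one of the two explanations.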

zaneselvans commented 2 years ago

Hey, yes: only the database / parquet file that's being accessed gets downloaded, one at a time. It won't grab the ferc1 DB until you touch it, and it won't grab the pudl DB until you touch it, so I think that's the expected behavior.
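
To make that behavior concrete, here is a toy illustration of download-on-first-access caching. It is not the catalog's actual caching code: `LazyFileCache` and `fake_fetch` are hypothetical, the file names are placeholders, and the "download" is simulated so the sketch runs without network access or billing.

```python
class LazyFileCache:
    """Fetch each named file only the first time it is requested."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable: file name -> bytes (stand-in downloader)
        self._local = {}       # files already fetched, keyed by name
        self.downloads = []    # record of which files were actually fetched

    def get(self, name):
        if name not in self._local:
            # First touch: this is the only point where a download happens.
            self.downloads.append(name)
            self._local[name] = self._fetch(name)
        return self._local[name]


def fake_fetch(name):
    """Simulated download; a real one would read from remote storage."""
    return f"contents of {name}".encode()


cache = LazyFileCache(fake_fetch)
cache.get("ferc1.sqlite")   # placeholder name; fetched now
cache.get("ferc1.sqlite")   # served from the local copy, no second fetch
print(cache.downloads)      # ['ferc1.sqlite']
```

Nothing for the second database is fetched until `cache.get` is called with its name, which matches the one-by-one behavior described above.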