catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Update/unify getting started docs #262

Closed gschivley closed 5 years ago

gschivley commented 5 years ago

I installed PUDL and built a postgres database from scratch on a Mac. Along the way I encountered a few issues (see below). The Mac/Linux/Windows docs aren't always consistent with each other, which should also be fixed.

PostgreSQL

After installing PostgreSQL, open the application.

The docs should probably also say to click Initialize.

Other issues

zaneselvans commented 5 years ago

Would you be willing to just go ahead and update the docs and make a PR? I'm on Linux, and we start over from scratch rarely... so this kind of blank slate revision is very useful.

gschivley commented 5 years ago

Sure. What is the expected way to define the data to be downloaded vs. initialized in the database? Is it arguments and settings.py? Having all of CEMS downloaded by default seems like it might catch users by surprise. I modified settings.py before running init_pudl.py and assumed that it would also be used for the data download.

karldw commented 5 years ago

@gschivley, my understanding is that postgres.app defaults to trusting any connection, so there's no need for a .pgpass file

zaneselvans commented 5 years ago

Right now the datastore and the DB initialization are managed completely separately, though I could see update_datastore.py reading the same settings.py file to figure out what it ought to download. But at the moment it's the command-line arguments to update_datastore.py that determine what gets downloaded, and the years listed for each data source in settings.py that determine what gets pulled into the DB.

gschivley commented 5 years ago

> @gschivley, my understanding is that postgres.app defaults to trusting any connection, so there's no need for a .pgpass file

Ok, interesting. I couldn't add/drop tables as the catalyst user (with the reset_db.sh file). Had to start psql and do it as the default user (Greg). Is that related?

zaneselvans commented 5 years ago

@gschivley Maybe it depends which user created / owned the database? When I was on a Mac I also recall never having to deal with any permissions stuff.

gschivley commented 5 years ago

> @gschivley Maybe it depends which user created / owned the database? When I was on a Mac I also recall never having to deal with any permissions stuff.

No idea. Maybe I'll add a quick note about it in the getting started.

zaneselvans commented 5 years ago

@gschivley Did you feel like the updated getting started document was satisfactory?

gschivley commented 5 years ago

Much better. A note about updating multiple data sources with different calls to update_datastore.py might be nice. Sounds simple but seeing the CI code pulling in one source at a time was helpful.

python update_datastore.py -s eia923 -y 2017
python update_datastore.py -s eia860 -y 2017
python update_datastore.py -s epaipm
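
For anyone who wants to script the one-source-per-call pattern above, here is a minimal sketch. It only assumes the `-s` and `-y` flags shown in the commands above. The `SOURCES` mapping and the `build_commands` helper are hypothetical names made up for illustration and are not part of PUDL. It prints the commands instead of running them, so it is safe to try without a datastore.

```python
# Hypothetical mapping of data source -> year to download.
# None means "call update_datastore.py without a -y flag", as in the epaipm line above.
SOURCES = {
    "eia923": 2017,
    "eia860": 2017,
    "epaipm": None,
}


def build_commands(sources, script="update_datastore.py"):
    """Return one argument list per data source (one call per source)."""
    commands = []
    for source, year in sources.items():
        cmd = ["python", script, "-s", source]
        if year is not None:
            # Only add -y when a year was given for this source.
            cmd += ["-y", str(year)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Dry run: print each command rather than executing it.
    for cmd in build_commands(SOURCES):
        print(" ".join(cmd))
```

To actually run the downloads, each list could be passed to `subprocess.run(cmd, check=True)` from the directory containing update_datastore.py.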