Closed briannacote closed 4 years ago
Hey there, yes a couple of the names of the attributes in the settings file have changed. If you replace your top level pkg_*
items with datapkg_*
it should work. You can also see the example file that is distributed with the new release, which should be using the correct settings.
Ahh, cool. I just did a pip install catalystcoop.pudl --upgrade.
Are there other suggested steps to getting the latest example files as well? I just want to make sure I update things appropriately for future changes. For example, this was the next thing I ran into around the ferc settings which might also be in the updated example file:
That table has been removed for the moment since with the inclusion of the older years, the row numbers changed, making the multi-year DB invalid -- we have a way to fix it, but need to map all those old rows before it'll be correct. You can remove it from the list of tables that's loaded into the PUDL DB for FERC Form 1 -- run pudl_setup
and check out the new distributed example settings file as a basis for the current ETL process -- I'm not sure if there might be other things that have changed in there... not thinking of any off the top of my head though.
I think Greg just asked the same question. What's the best way to update to the latest version and make sure the latest is being used in all ways? That's what I'm also curious about.
I'll try out the FERC Form 1. But I'm not sure how to check out the latest example settings file. I'm not sure if you mean through GitHub?
Oh sorry, no I didn't mean that kind of "check out" -- I just meant take a look at.
There's a script included with the package called pudl_setup
If you run it, it deploys a few files that are part of the package, including an example settings file. That's what I was suggesting you look at.
(though you can certainly also look at the one that's part of the repo, it's in src/pudl/package_data/settings/
)
Hi Zane,
Thanks for the link to the settings files. That helped me get through the FERC issue I shared above. I tried this again and now ran into this issue:
Is there any information I can provide to help troubleshoot?
We usually see this error when EIA changes the headers / tabs or other formatting in the spreadsheets.
You might want to use the pudl_data
script to pull down a fresh copy of the EIA 923 data, at least for 2018 (there was a late addition of some tabs to it even after the supposedly "final" release). That would look something like:
pudl_data --clobber -s eia923 -y 2018
Also, when you use screenshots to post information here, it's not searchable, so anybody else on the internet who is attempting to figure out the same problem won't come across this post. You can cut-and-paste text here and have it show up exactly as it does in the terminal by enclosing it in triple backticks. If it's a language that Github knows about, you can also tell it how the code should be highlighted. Some examples here..
Sure thing. Here's that last one written out.
Traceback (most recent call last):
File "/Users/briannacote/anaconda3/envs/pudl2/bin/pudl_etl", line 10, in <module>
sys.exit(main())
File "/Users/briannacote/anaconda3/envs/pudl2/lib/python3.7/site-packages/pudl/cli.py", line 110, in main
clobber=args.clobber)
File "/Users/briannacote/anaconda3/envs/pudl2/lib/python3.7/site-packages/pudl/etl.py", line 866, in generate_datapkg_bundle
datapkg_resources = etl(datapkg_settings, output_dir, pudl_settings)
File "/Users/briannacote/anaconda3/envs/pudl2/lib/python3.7/site-packages/pudl/etl.py", line 804, in etl
dataset_dict[dataset], output_dir, pudl_settings)
File "/Users/briannacote/anaconda3/envs/pudl2/lib/python3.7/site-packages/pudl/etl.py", line 205, in _etl_eia
eia923_years=eia923_years, data_dir=pudl_settings["data_dir"])
File "/Users/briannacote/anaconda3/envs/pudl2/lib/python3.7/site-packages/pudl/extract/eia923.py", line 241, in extract
years=eia923_years)
File "/Users/briannacote/anaconda3/envs/pudl2/lib/python3.7/site-packages/pudl/extract/eia923.py", line 161, in get_eia923_page
to_drop = [c for c in newdata.columns if c[:8] == 'reserved']
File "/Users/briannacote/anaconda3/envs/pudl2/lib/python3.7/site-packages/pudl/extract/eia923.py", line 161, in <listcomp>
to_drop = [c for c in newdata.columns if c[:8] == 'reserved']
TypeError: 'float' object is not subscriptable
Did pulling a new version of the EIA 923 for 2018 fix this?
Almost! It finished the ETL process. It looks like it created a new directory called datapkg. Before this directory was datapackage. I have python scripts that hook up to the sqlite setup. I have the following:
pudl_settings = pudl.workspace.setup.get_defaults()
print("Attaching to database:",pudl_settings["pudl_db"])
pudl_engine = sa.create_engine(pudl_settings["pudl_db"])
It looks like that still directs to the old datapackage directory because when I update my scripts to pull data for 2018, it doesn't happen. Is there a way to update this side as well?
It's super close.
Hmm. That suggests to me that there's another older version of PUDL floating around somewhere that's being used to generate the paths. How are you controlling the installed versions in your other scriptyland?
(v0.3.1 is the most recent release.)
Hang tight,...it might have just been me being silly. I created the datapackage, but not the sqlite database. I'm running that now.
Yeaaaaaah, I know it's all chopped up into different steps. The data release includes some scripts that just do all the parts. We want to integrate it into a single thing.
Ok, so that was definitely that last error I ran into. I just needed to follow directions 😄
Here's the next one. Did some table names change? I know you guys provided at some point a quick way to go through an sqlite file with either a Jupyter notebook or something like that, but I can't seem to find it. Am I remembering correctly?
Traceback (most recent call last):
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
cursor, statement, parameters, context
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
cursor.execute(statement, parameters)
sqlite3.OperationalError: no such column: p.plant_name
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./pull_pudl_data_sqlite.py", line 97, in <module>
df_860 = pd.read_sql(statement_860, pudl_engine)
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/pandas/io/sql.py", line 436, in read_sql
chunksize=chunksize,
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/pandas/io/sql.py", line 1218, in read_query
result = self.execute(*args)
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/pandas/io/sql.py", line 1087, in execute
return self.connectable.execute(*args, **kwargs)
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2169, in execute
return connection.execute(statement, *multiparams, **params)
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 982, in execute
return self._execute_text(object_, multiparams, params)
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1155, in _execute_text
parameters,
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
e, statement, parameters, cursor, context
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 152, in reraise
raise value.with_traceback(tb)
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
cursor, statement, parameters, context
File "/Users/briannacote/anaconda3/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: p.plant_name
[SQL: SELECT
g.plant_id_eia,
g.generator_id,
g.technology_description,
p.plant_name,
p.county,
p.state,
p.latitude,
p.longitude,
p.balancing_authority_code,
g.capacity_mw,
g.nameplate_power_factor,
g.summer_capacity_mw,
g.winter_capacity_mw,
g.minimum_load_mw,
pm.operating_date,
g.retirement_date,
g.time_cold_shutdown_full_load_code,
pm.prime_mover_code,
g.energy_source_code_1
from generators_eia860 g
inner join plants_entity_eia p on g.plant_id_eia =p.plant_id_eia
join generators_entity_eia pm on (g.plant_id_eia = pm.plant_id_eia and g.generator_id =pm.generator_id)
where
g.report_date = '2017-01-01'
and (p.state is null or p.state not in ('AK', 'HI'))
and g.operational_status_code in ('existing','OP','OS','SB','OA')
order by g.plant_id_eia, g.capacity_mw;
]
(Background on this error at: http://sqlalche.me/e/e3q8)
Yes, the column names and coding are definitely still evolving. One thing that changed is because we're preserving plant names from both FERC and EIA and their different we wanted to make it explicit which one was being referred to -- so instead of plant_name
now you'll get plant_name_eia
or plant_name_ferc1
depending where you're selecting. Not sure if the operational_status_code
values changed, but possibly.
There's the pudl_intro.ipynb
notebook which pulls some tables as an example and might be helpful. It should get deployed into the notebooks
directory in your PUDL data management workspace. Is that what you're thinking of?
We also need to make an easy to read version of all the table metadata and integrate it into the documentation on Read The Docs.
That is exactly the notebook, thank you! It had been a while since I used it and couldn't remember where it was. I got the new column name set. I'll check out the operational_status_code as I have some work by hand I did with the 2018 data that I can back check with.
2018 is a go! Thank you for the help, Zane!
Describe the bug
This may not be a bug but a user error. It looks like from a recent update to catalystcoop.pudl I can now perform the normal ETL process for 2018 for EIA 860 and EIA 923?
I did the following within the environment I have setup for pudl:
I think #396 is what I was looking for help on.
Perhaps there is something else I also need to update?
Thank you! Brianna