grgmiller closed 4 years ago
Hi Greg, this input file looks like it's from the 0.2.0 version of pudl... are you sure you're using 0.3.0? The only reason I know is that the first two elements will need to be datapkg... rather than pkg...
I've seen this box thing or something like it come up before, but it's only ever been a warning in the past, not something that actually crashed the process. I don't suppose you've upgraded to pandas 1.0, have you? They deprecated a bunch of things and we haven't changed our process to work with it yet.
This is a change in pandas -- the box argument was deprecated before, and is removed in pandas v1.0.
Quoting the release notes:
Removed the previously deprecated keyword “box” from to_datetime() and to_timedelta(); in addition these now always returns DatetimeIndex, TimedeltaIndex, Index, Series, or DataFrame (GH24486)
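For anyone hitting this on pandas 1.0 or later, here is a minimal sketch of what the removal means in practice. It is not taken from the PUDL code base; the sample dates and the accepts_box helper are made up for illustration. The only assumption about the old behaviour is the documented one: box=False used to return a NumPy array instead of an index object.

```python
import pandas as pd

# Assumes pandas >= 1.0, where the `box` keyword no longer exists.
# Both functions now return index objects for list-like input.
dates = pd.to_datetime(["2015-12-01", "2015-12-31"])
deltas = pd.to_timedelta([1, 2], unit="h")

# What box=False used to give you: ask for the NumPy array explicitly.
date_array = dates.to_numpy()
delta_array = deltas.to_numpy()


def accepts_box() -> bool:
    """Hypothetical helper: report whether this pandas still takes box=."""
    try:
        pd.to_timedelta([1], unit="h", box=False)
    except TypeError:
        # pandas >= 1.0 rejects the removed keyword outright.
        return False
    return True


print(type(dates).__name__, type(deltas).__name__)
print("box keyword accepted:", accepts_box())
```

On pandas 0.25 the same call with box=False should only emit a deprecation warning, which matches the "it's only ever been a warning" observation earlier in the thread.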
Arrrgh, dangit, I commented out the version pinning in setup.py to allow pandas 1.0 as a test -- just to see how it would break -- and forgot to uncomment it before the 0.3.0 release. Maybe I should do a 0.3.1 to fix that.
Oh wait a minute, no I did not -- I was thinking of accidentally allowing it to install on Python 3.8 in the setup.py. So... it shouldn't have allowed you to try and use pandas 1.0 alongside PUDL 0.3.0.
Ah thanks for the thoughts here. When I check my pudl environment in anaconda navigator, it looks like pandas 1.0.0 is installed, but pandas is not an updatable package, so it looks like I cannot roll it back to 0.25.3 ... not sure why it allowed me to update to 1.0.0 - I don't even have 1.0 on my base environment.
It looks like setup.py does include 'pandas>=0.25,<1.0'.
Do you think I can fix this by just updating pudl using the following?
conda update conda
conda env update pudl
Or will I need to uninstall and reinstall pudl completely?
On a related note - will the datapackages that I created with the previous version of pudl be the same as the datapackages created with 0.3.0, or would you recommend re-ETLing each datapackage using 0.3.0?
Actually, digging into this deeper, I also want to confirm that I updated pudl to 0.3.0 correctly. I had 0.2.0 installed, and to update to 0.3.0 I just opened anaconda prompt and ran:
conda update conda
conda env update pudl
Is this all I had to do, or did I miss a critical step here? It looks like my environment.yml file might not have been updated by this command. Currently, it contains:
name: pudl
channels:
  - conda-forge
  - defaults
dependencies:
  - catalystcoop.pudl
  - dask
  - jupyter
  - jupyterlab
  - pip
  - python>=3.7
I have a similar question to what Greg just posted. What's the best way to update everything completely for the new release?
I think that what @grgmiller did should work, but to be totally sure I would wipe the old conda environment, and re-create it like...
conda env remove --name pudl
conda env create --name pudl --file environment.yml
or something like that. You could also explicitly set catalystcoop.pudl=0.3.0 inside the environment file if you wanted to. To check and see what version of everything you have installed within the environment you can do conda list with the environment activated, and it'll show you all the packages installed there and their versions.
The environment.yml file won't get updated (unless you go in and change it) -- it says which packages to install, and may or may not specify their versions. Though when you run conda env update pudl it should try to upgrade the packages in there to the most recent compatible versions.
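The same check can be done from Python inside the activated environment. The sketch below is only an illustration: the function names are made up, the version parsing is deliberately simplified (it is not a full PEP 440 implementation), and the bounds are the 'pandas>=0.25,<1.0' pin quoted from setup.py earlier in the thread.

```python
from importlib import metadata  # standard library on Python 3.8+


def release_tuple(version: str) -> tuple:
    """Turn '0.25.3' into (0, 25, 3), dropping any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies_pin(version: str, lower=(0, 25), upper=(1, 0)) -> bool:
    """Check lower <= version < upper, mirroring 'pandas>=0.25,<1.0'."""
    return lower <= release_tuple(version) < upper


def installed_version(package: str):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pandas")
    if found is None:
        print("pandas is not installed in this environment")
    else:
        print(f"pandas {found}; within the PUDL 0.3.0 pin: {satisfies_pin(found)}")
```

With the versions mentioned in this thread, satisfies_pin("0.25.3") is True and satisfies_pin("1.0.0") is False.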
Thank you @zaneselvans. I ran conda env remove --name pudl, added catalystcoop.pudl=0.3.0 to my environment.yml file and then imported that yml file as a new environment in anaconda navigator. The new pudl environment now has pandas set to 0.25.3. I'll try re-running my ETL and see what happens.
One note is that following this process did not actually update any of the files in my pudl workspace. So for me, the etl_example.yml was not updated to the newer version with datapkg... instead of pkg..., and none of the example notebooks in my notebooks folder were updated. How would we actually go about updating the files in our workspace?
Yes, if you want it to overwrite your existing files, you'll need to use the --clobber flag -- and it'll wipe them all out, which you might not want to do if you've been editing them. But you can also run pudl_setup in another directory and it should create new copies of the settings files, notebooks, etc. there.
This issue was resolved when I reinstalled PUDL v0.3.0 and made sure that my pudl environment was using pandas v0.25.3 instead of v1.0.0.
Describe the bug
When running pudl_etl on 2015 data, I get the following error:
It appears that box is a to_datetime argument, not a to_timedelta argument.
Bug Severity
How badly is this bug affecting you?
To Reproduce
Curiously, the ETL works for the first 11 months of AL-2015 data, but this error popped up when working on AL-2015-12
Software Environment?