Open denimalpaca opened 1 month ago
Using miniforge to install mamba did not result in a conda environment with python=3.12, which is needed for certain PUDL packages
Are you referring here to the base environment that's created when you install conda or mamba, or to the pudl-dev environment? It's fine if the base environment isn't python 3.12 (currently mine is 3.10.13). mamba should manage the python version within other environments it creates, and ought to have installed 3.12 in pudl-dev. Can you share what command(s) you used to create the pudl-dev environment? From the errors you're noting it sounds like maybe you didn't use make install-pudl.
How did you run the ferc_to_sqlite_fast DAG?
Was there already a ferc1_xbrl.sqlite in your pudl_output directory? If so, do you know how it was created?
Do you get the same test failures if you run make pytest-integration?
I'm also getting the error about existing databases and clobber being False when I try to run the ferc_to_sqlite DAG from within the Dagster UI.
@bendnorman @jdangerx It seems odd and new that it would not be possible to run the FERC to SQLite DAGs if the databases already exist. The last time I ran those was mid-March. Has something changed since then? Is there a way to set clobber=True from within the Dagster UI?
make install-pudl
I definitely ran make install-pudl; I checked my shell logs and it was there. I went through the doc line by line. I realized my mamba init only added the appropriate conda initialization to my .bash_profile, and not to zsh. So I just fixed up my zshrc and got the pudl-dev env actually working, and it's the correct python version. It could be helpful to add a few lines to the doc showing what the shell config should look like afterwards and how to fix this for zsh. As someone who hadn't used conda at all before, I got pretty lost.
I ran the ferc_to_sqlite_fast DAG via the dagster UI at http://127.0.0.1:3000/. I just found the DAG and did a "run now". I did this just after creating the input and output directories, so there wasn't anything in them. I can try deleting everything in the output directory and re-running.
EDIT: Got this task to complete ok. I ran it twice before I posted this. The first error was during the general DAG run, and there was an ssl timeout. When I re-ran it, the sqlite file must already have been created, because then I got that error where clobber wasn't set to true. When I deleted the ferc1_xbrl.sqlite file in my outputs directory and re-ran, it was successful.
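The manual workaround above (delete the leftover database, then re-run) can be sketched as a small helper. This is only an illustration: remove_stale_db is a hypothetical name, not part of PUDL, and the output directory is passed in explicitly because its location depends on your setup.

```python
from pathlib import Path


def remove_stale_db(output_dir: Path, db_name: str = "ferc1_xbrl.sqlite") -> bool:
    """Delete output_dir/db_name if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    db_path = Path(output_dir) / db_name
    if db_path.is_file():
        db_path.unlink()
        return True
    return False


# Example (the path is a placeholder for your own output directory):
# remove_stale_db(Path("/path/to/pudl_output"))
```

After the file is gone, re-running the DAG from the Dagster UI should no longer trip over the existing database.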
Will try the make pytest-integration command now.
EDIT: This command produced 85 passes and 6 xfailed; not sure what xfailed is.
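On xfailed: pytest reports a test as xfailed when it is marked as expected to fail (pytest.mark.xfail) and it does fail, so those are not a problem. xpassed is the reverse: a test marked as expected to fail that passed. Only failed and error counts indicate real trouble. A rough sketch of reading a summary line that way follows; the helper names are invented for illustration.

```python
import re

# Outcomes in a pytest summary line that mean something actually went wrong.
PROBLEM_OUTCOMES = ("failed", "error", "errors")


def parse_summary(line: str) -> dict:
    """Extract {outcome: count} pairs such as {'passed': 23, 'xfailed': 7}."""
    return {name: int(count) for count, name in re.findall(r"(\d+) ([a-z]+)", line)}


def is_clean(counts: dict) -> bool:
    """True when the run has no failed tests and no errors."""
    return not any(counts.get(name, 0) for name in PROBLEM_OUTCOMES)


if __name__ == "__main__":
    # Summary line quoted from the original report in this issue.
    line = (
        "23 passed, 4 skipped, 7 xfailed, 1 xpassed, 71 warnings, "
        "62 errors in 2984.83s (0:49:44)"
    )
    counts = parse_summary(line)
    print(counts)
    print("clean run" if is_clean(counts) else "needs attention")
```

By this reading, a run with only passes and xfails is a healthy result, while the earlier run with 62 errors was not.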
If you ran make install-pudl then I'm confused as to why you wouldn't have gotten a good Python 3.12 environment out of it. Maybe it's related to the shell init / conda setup issue? The provided shell commands for appending the conda stuff to your shell initialization files are too cryptic. We should explain that more.
We talked about the clobber thing a little internally this morning, and I think the simplest solution is to just have it always clobber. It only takes ~10min to regenerate all of the FERC DBs locally and we don't tend to run it very often, and the other solutions (manually deleting the files or futzing with the run configuration through the Dagster UI) both seem brittle / flaky.
After doing make install-pudl, the command output told me to run mamba activate pudl-dev. I was having trouble with that latter command because of the shell setup, so I was only on the base env. Once I was able to activate pudl-dev, I did get the correct environment.
Ahh, okay okay. So the shell setup stuff really was the disconnect.
Describe the issues
Here's a list of issues I had setting up the development environment from this guide, and why I think I had them:
In Running the ETL Pipeline doc:
When I ran the ferc_to_sqlite_fast DAG, I got the following task failure/error:
This was the only error in that run, and re-running the task didn't fix it.
Running the integration tests in the PUDL repo locally resulted in:
============================================== 23 passed, 4 skipped, 7 xfailed, 1 xpassed, 71 warnings, 62 errors in 2984.83s (0:49:44) ===============================================
Using the command:
pytest test/integration/
A short snippet of the errors:
Expected these all to pass (I think? I don't actually know if they were supposed to). Seems like it might just be an issue with how I materialized the data? Not really sure. I only ran the fast ETL, so maybe I need the full one?
Expected behavior
Software Environment?
git clone
what branch are you using: forked from main