Replace pudl_data script with datastore

ptvirgo commented 4 years ago

Several places in the documentation and tests still expect a pudl_data script rather than the new datastore. Bring those references up to date.

ptvirgo commented 4 years ago

- References to pudl_data have been removed from the code and documentation.
- The documentation correctly describes how to perform ETL with the current codebase.
- The datastore's command-line form is added to pudl's setup.py as pudl_datastore. (The pudl prefix keeps the namespace tab-complete ready.)

ptvirgo commented 3 years ago

Todo:

[ ] Run the etl process via README.rst
[x] Datastore script --help should list datastore key names.
[ ] Run the etl process as outlined in docs/usage.rst
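
For the `--help` item above, one way to surface the key names is to put them in the parser's epilog and in the argument's `choices`. This is only a sketch: the dataset names, the `--dataset` flag, and `build_parser` are hypothetical placeholders, not the actual pudl_datastore interface.

```python
import argparse

# Hypothetical key names, for illustration only; the real datastore
# defines its own set of keys.
KNOWN_DATASETS = ("eia860", "eia923", "ferc1")


def build_parser(datasets=KNOWN_DATASETS):
    """Build a parser whose --help output lists the valid dataset keys."""
    names = sorted(datasets)
    parser = argparse.ArgumentParser(
        prog="pudl_datastore",
        description="Download and validate raw input data.",
        # The epilog is printed at the end of --help.
        epilog="Available datasets: " + ", ".join(names),
    )
    parser.add_argument(
        "--dataset",  # hypothetical flag name
        choices=names,
        help="Datastore key to operate on.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.dataset)
```

Because the names also go into `choices`, argparse rejects an unknown key with a message that lists the valid ones.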

ptvirgo commented 3 years ago

Bug ...

From README.rst

$ ferc1_to_sqlite pudl-work/settings/ferc1_to_sqlite_example.yml produces:

` =================================== FAILURES ===================================
__ test_mcoe ___

cat_id = 41696

def grab_fuel_state_monthly(cat_id):
    """
    Grab an API response for monthly fuel costs for one fuel category.

    The data we want from EIA is in monthly, state-level series for each fuel
    type. For each fuel category, there are at least 51 embeded child series.
    This function compiles one fuel type's child categories into one request.
    The resulting api response should contain a list of series responses from
    each state which we can convert into a pandas.DataFrame using
    convert_cost_json_to_df.

    Args:
        cat_id (int): category id for one fuel type. Known to be
    """
    # we are going to compile a string of series ids to put into one request
    series_all = ""
    fuel_level_cat = get_response(make_url_cat_eiaapi(cat_id))
    try:
>       for child in fuel_level_cat.json()['category']['childseries']:
E       KeyError: 'category'

.env_pudl/lib/python3.8/site-packages/pudl/output/eia923.py:614: KeyError

During handling of the above exception, another exception occurred:

fast_out = <pudl.output.pudltabl.PudlTabl object at 0x7f8a58852d30>

def test_mcoe(fast_out):
    """Calculate MCOE."""
    logger.info("Calculating MCOE.")
>   mcoe_df = fast_out.mcoe()

test/fast_output_test.py:62:

.env_pudl/lib/python3.8/site-packages/pudl/output/pudltabl.py:613: in mcoe
    self._dfs['mcoe'] = pudl.analysis.mcoe.mcoe(
.env_pudl/lib/python3.8/site-packages/pudl/analysis/mcoe.py:351: in mcoe
    drop_cols = [x for x in pudl_out.gens_eia860().columns
.env_pudl/lib/python3.8/site-packages/pudl/analysis/mcoe.py:352: in <listcomp>
    if x in pudl_out.fuel_cost().columns and x not in merge_cols]
.env_pudl/lib/python3.8/site-packages/pudl/output/pudltabl.py:556: in fuel_cost
    self._dfs['fuel_cost'] = pudl.analysis.mcoe.fuel_cost(self)
.env_pudl/lib/python3.8/site-packages/pudl/analysis/mcoe.py:174: in fuel_cost
    pudl_out.frc_eia923()[['plant_id_eia',
.env_pudl/lib/python3.8/site-packages/pudl/output/pudltabl.py:299: in frc_eia923
    pudl.output.eia923.fuel_receipts_costs_eia923(
.env_pudl/lib/python3.8/site-packages/pudl/output/eia923.py:238: in fuel_receipts_costs_eia923
    fuel_costs_avg_eiaapi = get_fuel_cost_avg_eiaapi(
.env_pudl/lib/python3.8/site-packages/pudl/output/eia923.py:686: in get_fuel_cost_avg_eiaapi
    grab_fuel_state_monthly(fuel_cat_id)))

cat_id = 41696

def grab_fuel_state_monthly(cat_id):
    """
    Grab an API response for monthly fuel costs for one fuel category.

    The data we want from EIA is in monthly, state-level series for each fuel
    type. For each fuel category, there are at least 51 embeded child series.
    This function compiles one fuel type's child categories into one request.
    The resulting api response should contain a list of series responses from
    each state which we can convert into a pandas.DataFrame using
    convert_cost_json_to_df.

    Args:
        cat_id (int): category id for one fuel type. Known to be
    """
    # we are going to compile a string of series ids to put into one request
    series_all = ""
    fuel_level_cat = get_response(make_url_cat_eiaapi(cat_id))
    try:
        for child in fuel_level_cat.json()['category']['childseries']:
            # get only the monthly... the f in the childseries seems to refer
            # the recporting to frequency
            if child['f'] == 'M':
                logger.debug(f"    {child['series_id']}")
                series_all = series_all + ";" + str(child['series_id'])

    except KeyError:
>       raise AssertionError(
            f"Error in Response: {fuel_level_cat.json()['data']['error']}")
E       AssertionError: Error in Response: invalid or missing api_key. For key registration, documentation, and examples see https://www.eia.gov/developer/

.env_pudl/lib/python3.8/site-packages/pudl/output/eia923.py:622: AssertionError `

I think the default settings file is not in line with the required key.
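
The traceback shows two response shapes: a success payload with `['category']['childseries']` and a failure payload with `['data']['error']`. Checking for the error first avoids the chained `KeyError`/`AssertionError`. A rough sketch, assuming only those two shapes; `monthly_series_ids` is a hypothetical helper, not a function in pudl:

```python
def monthly_series_ids(payload):
    """Return the monthly child series IDs from a category response dict.

    Hypothetical helper: it assumes the payload is either an error of the
    form {"data": {"error": ...}} or a category of the form
    {"category": {"childseries": [...]}}, as seen in the traceback.
    """
    error = payload.get("data", {}).get("error")
    if error is not None:
        # Report the API's own message instead of a bare KeyError.
        raise ValueError(f"EIA API returned an error: {error}")
    children = payload["category"]["childseries"]
    # 'f' appears to hold the reporting frequency; keep monthly series only.
    return [str(child["series_id"]) for child in children if child.get("f") == "M"]
```

Called as `monthly_series_ids(fuel_level_cat.json())`, this would raise one exception carrying the "invalid or missing api_key" text.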

ptvirgo commented 3 years ago

$ pudl_etl /vagrant/pudl_work/settings/etl_example.yml 
2020-07-06 21:09:51 [    INFO] pudl:84 verifying that the data we need exists in the data store
Traceback (most recent call last):
  File "/home/vagrant/miniconda3/envs/pudl-dev/bin/pudl_etl", line 33, in <module>
    sys.exit(load_entry_point('catalystcoop.pudl', 'console_scripts', 'pudl_etl')())
  File "/home/vagrant/git/pudl/src/pudl/cli.py", line 87, in main
    pudl.helpers.verify_input_files(ferc1_years=flattened_params_dict['ferc1_years'],
AttributeError: module 'pudl.helpers' has no attribute 'verify_input_files'

zaneselvans commented 3 years ago

Hmm, do you have a valid EIA API key stored in the API_KEY_EIA environment variable? We should add a check with some useful error message in there.
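
A minimal sketch of such a check, assuming the key is read only from the `API_KEY_EIA` environment variable. `MissingApiKeyError` and `require_eia_api_key` are hypothetical names, not existing pudl functions:

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised when the EIA API key is not available in the environment."""


def require_eia_api_key(environ=None):
    """Return the EIA API key, or raise with an actionable message.

    ``environ`` defaults to ``os.environ``; passing a plain dict makes the
    check easy to test.
    """
    env = os.environ if environ is None else environ
    key = env.get("API_KEY_EIA", "").strip()
    if not key:
        raise MissingApiKeyError(
            "The API_KEY_EIA environment variable is not set or is empty. "
            "Register for a key at https://www.eia.gov/developer/ and export "
            "it before running the ETL or the tests."
        )
    return key
```

Calling this before the first API request would fail fast with a clear message. It only confirms that a key is present, not that EIA accepts it.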

catalyst-cooperative / pudl

Replace pudl_data script with datastore #658