NREL / celavi

Codebase for the Circular Economy Lifecycle Assessment and VIsualization (CELAVI) modeling framework.
https://nrel.github.io/celavi/
GNU General Public License v3.0

Bug: KeyError during routes preprocessing of national wind blade datasets #198

Closed rjhanes closed 4 months ago

rjhanes commented 4 months ago

Error output:

10100
10120
AR
finding routes
0
Traceback (most recent call last):
  File "C:\Users\rhanes\AppData\Local\anaconda3\envs\celavisandbox3\lib\site-packages\pandas\core\indexes\range.py", line 345, in get_loc
    return self._range.index(new_key)
ValueError: 122978 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\rhanes\AppData\Local\anaconda3\envs\celavisandbox3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\rhanes\AppData\Local\anaconda3\envs\celavisandbox3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\rhanes\GitHub\celavi\celavi\__main__.py", line 25, in <module>
    Scenario(parser=PARSER)
  File "C:\Users\rhanes\GitHub\celavi\celavi\scenario.py", line 98, in __init__
    self.preprocess()
  File "C:\Users\rhanes\GitHub\celavi\celavi\scenario.py", line 217, in preprocess
    Router.get_all_routes(
  File "C:\Users\rhanes\GitHub\celavi\celavi\routing.py", line 310, in get_all_routes
    _vkmt_by_county = router.get_route(
  File "C:\Users\rhanes\AppData\Local\anaconda3\envs\celavisandbox3\lib\site-packages\joblib\memory.py", line 655, in __call__
    return self._cached_call(args, kwargs)[0]
  File "C:\Users\rhanes\AppData\Local\anaconda3\envs\celavisandbox3\lib\site-packages\joblib\memory.py", line 598, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "C:\Users\rhanes\AppData\Local\anaconda3\envs\celavisandbox3\lib\site-packages\joblib\memory.py", line 856, in call
    output = self.func(*args, **kwargs)
  File "C:\Users\rhanes\GitHub\celavi\celavi\routing.py", line 101, in get_route
    to_node = self.node_map.loc[_end_point_idx, "node_id"]
  File "C:\Users\rhanes\AppData\Local\anaconda3\envs\celavisandbox3\lib\site-packages\pandas\core\indexing.py", line 1096, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File "C:\Users\rhanes\AppData\Local\anaconda3\envs\celavisandbox3\lib\site-packages\pandas\core\frame.py", line 3877, in _get_value
    row = self.index.get_loc(index)
  File "C:\Users\rhanes\AppData\Local\anaconda3\envs\celavisandbox3\lib\site-packages\pandas\core\indexes\range.py", line 347, in get_loc
    raise KeyError(key) from err
KeyError: 122978

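Specifically, the problem appears to originate in the router.get_route call at line 310 of routing.py (inside get_all_routes), which gets VKMT by county for all routes between facilities. The KeyError itself is raised at line 101 of routing.py, in get_route, on the node_map lookup.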

This comes from running master on the national dataset files packaged with release v1.3.1 (and v1.3.2 - identical datasets).

There is a node with ID 122978 at long = -82.505, lat = 35.5868 (see node_locations.csv row 122376). However, this node does not show up in the locations_computed file produced by the run that threw the error. It is unclear whether it had not been processed yet or whether its absence is the problem.

Plan for debugging:

Set up issue-198 branch to investigate and implement fix.

Re-run with a set_trace in get_route to catch the KeyError before it stops the code, then investigate further.
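One way to catch the KeyError without editing get_route itself is to wrap the call and open the debugger only when the exception fires. This is a minimal sketch of a variant of the plan above: it uses pdb.post_mortem rather than a set_trace, and the helper name call_with_post_mortem and its debugger parameter are hypothetical, not part of CELAVI.

```python
import pdb


def call_with_post_mortem(func, *args, debugger=pdb.post_mortem, **kwargs):
    """Call func; on KeyError, hand the traceback to a debugger, then re-raise.

    `debugger` is a hypothetical hook that defaults to pdb.post_mortem so the
    failing frame can be inspected interactively.
    """
    try:
        return func(*args, **kwargs)
    except KeyError as err:
        debugger(err.__traceback__)
        raise


# Hypothetical usage (the arguments to get_route are not shown in this issue):
# call_with_post_mortem(router.get_route, ...)
```

Passing a different callable as debugger (for example, one that only logs) keeps the wrapper usable in non-interactive runs.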

rjhanes commented 4 months ago

Investigation:

Reproduce error by running on issue-198 (identical to master) and the national datasets, with the following debugging code:

image

Result:

10100
10120
AR
finding routes
0
> c:\users\rhanes\github\celavi\celavi\routing.py(325)get_all_routes()
-> _vkmt_by_county["source_long"] = _routes.source_long.iloc[i]
(Pdb) i
0
(Pdb) _routes
      source_long  source_lat  destination_long  destination_lat
1144   -92.867156   34.608767               NaN              NaN
2289   -94.416600   33.693600               NaN              NaN
3035          NaN         NaN        -92.149700        34.837100
3040          NaN         NaN        -94.370400        35.292600
3052          NaN         NaN        -92.302410        34.651290
...           ...         ...               ...              ...
1141   -92.867156   34.608767        -86.475468        40.519850
1142   -92.867156   34.608767        -68.994546        45.032014
1143   -92.867156   34.608767        -84.343820        36.123894
3535          NaN         NaN       -100.419370        32.451670
3536          NaN         NaN        -93.041458        41.715004

[1170 rows x 4 columns]
(Pdb) _routes.source_long.iloc[i]
-92.867156
(Pdb) _routes.source_lat.iloc[i]
34.6087675
(Pdb) _routes.destination_long.iloc[i]
nan
(Pdb) start=(
*** SyntaxError: unexpected EOF while parsing
(Pdb)                                     _routes.source_long.iloc[i],
(-92.867156,)
(Pdb)                                     _routes.source_lat.iloc[i],
(34.6087675,)
(Pdb)                                 )
*** SyntaxError: unmatched ')'
(Pdb) start=(
*** SyntaxError: unexpected EOF while parsing
(Pdb)                                     _routes.source_long.iloc[i],
(-92.867156,)
(Pdb)                                     _routes.source_lat.iloc[i],
(34.6087675,)
(Pdb) start=(_routes.source_long.iloc[i],_routes.source_lat.iloc[i])
(Pdb) end=(_routes.destination_long.iloc[i],_routes.destination_lat.iloc[i])
(Pdb) _start_point = np.array(start)
(Pdb) _start_point_idx = self._btree.query(_start_point, k=1)[1]
*** NameError: name 'self' is not defined
(Pdb) _start_point_idx = router._btree.query(_start_point, k=1)[1]
(Pdb) from_node = router.node_map.loc[_start_point_idx, "node_id"]
(Pdb) _end_point = np.array(end)
(Pdb) _end_point_idx = router._btree.query(_end_point, k=1)[1]
(Pdb) to_node = router.node_map.loc[_end_point_idx, "node_id"]
*** KeyError: 122978
(Pdb) _end_point_idx
122978
(Pdb) _end_point
array([nan, nan])
(Pdb)
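The session above shows an all-NaN end point producing an index (122978) that node_map's index does not contain. A guard that screens out non-finite coordinates before any nearest-node lookup would surface the bad rows earlier. The following is a minimal sketch under that assumption; the function names has_valid_coords and split_routes are hypothetical, and this is not the fix adopted in this issue.

```python
import math


def has_valid_coords(point):
    """Return True only if every coordinate in the point is a finite number."""
    return all(math.isfinite(c) for c in point)


def split_routes(routes):
    """Partition (start, end) coordinate pairs into routable and skipped lists."""
    routable, skipped = [], []
    for start, end in routes:
        if has_valid_coords(start) and has_valid_coords(end):
            routable.append((start, end))
        else:
            skipped.append((start, end))
    return routable, skipped


# Two routes modeled on the values printed in the pdb session above.
routes = [
    ((-92.867156, 34.6087675), (float("nan"), float("nan"))),
    ((-92.867156, 34.6087675), (-86.475468, 40.519850)),
]
routable, skipped = split_routes(routes)
```

Here the first route lands in skipped and the second in routable, so the NaN destination is reported instead of reaching the node lookup.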

Looks like the AR-originating routes with NANs for destination lat/longs are all going to future-deployment power plants (the destination_facility_id values indicate so):

image

The future-deployment plant in AR likewise has NAN lat/longs, as per route_list.

So: The future-deployment plants have assumed locations at the

... This is probably related to f9fb9c8

The main fix there was to specify that only lat/long columns were being averaged, and I added a filter that only power plant locations should be included in that average. Therefore, states with no existing wind plants would have NAN lat/longs for future deployment plants (no existing lat/longs to average --> NAN output).
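The effect described above can be reproduced on a toy table. This is a minimal sketch, not the code in compute_locations.py: the state column, the sample rows, and the left merge used to attach averaged coordinates to future plants are all assumptions. Only the facility_type == 'power plant' filter and the group-then-average pattern come from this issue.

```python
import pandas as pd

# Hypothetical facility table: AR has facilities but no existing power plant.
locs = pd.DataFrame(
    {
        "state": ["TX", "TX", "AR", "AR"],
        "facility_type": ["power plant", "landfill", "landfill", "recycling"],
        "lat": [32.0, 30.0, 34.5, 35.5],
        "long": [-100.0, -96.0, -93.0, -92.0],
    }
)

# Average restricted to existing power plants: AR gets no row at all.
plants_only = (
    locs.loc[locs.facility_type == "power plant"]
    .groupby("state")[["lat", "long"]]
    .mean()
)

# Average over every facility in the state: AR now has coordinates.
all_facilities = locs.groupby("state")[["lat", "long"]].mean()

# Attach an assumed location to one future plant per state (assumed join).
future = pd.DataFrame({"state": ["TX", "AR"]})
with_filter = future.merge(
    plants_only, left_on="state", right_index=True, how="left"
)
without_filter = future.merge(
    all_facilities, left_on="state", right_index=True, how="left"
)
```

With the filter, the AR row of with_filter has missing lat/long values; without it, every row has coordinates, at the cost of averaging over non-plant facilities.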

Test fix: Change line 469 of compute_locations.py from

self.locs.loc[self.locs.facility_type == 'power plant'].groupby(

to

self.locs.groupby(

(i.e., reverting only the change made to that line in f9fb9c8).

After re-running, the code proceeds past the AR error and no NANs are present in the route_list file.
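A check like the one just described can be scripted. The sketch below assumes the route_list file is a CSV whose coordinate columns use the same names as the _routes DataFrame printed earlier (source_long, source_lat, destination_long, destination_lat); the file layout and the helper name rows_with_missing_coords are assumptions.

```python
import csv
import io
import math

COORD_COLUMNS = (
    "source_long",
    "source_lat",
    "destination_long",
    "destination_lat",
)


def rows_with_missing_coords(csv_file, columns=COORD_COLUMNS):
    """Return 1-based data-row numbers whose coordinate fields are empty or NaN."""
    bad_rows = []
    for row_number, row in enumerate(csv.DictReader(csv_file), start=1):
        for col in columns:
            value = (row.get(col) or "").strip()
            # An empty field or a literal "nan" both count as missing.
            if value == "" or math.isnan(float(value)):
                bad_rows.append(row_number)
                break
    return bad_rows


# In-memory stand-in for a route_list file with one incomplete row.
sample = io.StringIO(
    "source_long,source_lat,destination_long,destination_lat\n"
    "-92.867156,34.608767,,\n"
    "-92.867156,34.608767,-86.475468,40.519850\n"
)
missing = rows_with_missing_coords(sample)
```

An empty result would correspond to the "no NANs" outcome reported here. For a real file, pass an open file handle instead of the StringIO object.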

I'm not going to test all the way through to the end as it'll take too long and eat up processing power, but this particular bug should be resolved with the next couple of commits.

rjhanes commented 4 months ago

Closed in release v1.3.3.