chris1610 / pbpython

Code, Notebooks and Examples from Practical Business Python
https://pbpython.com
BSD 3-Clause "New" or "Revised" License

KeyError: "['QN_9' 'RF_TU' 'eor'] not found in axis" in execution of "3-dwd_konverter_build_df.ipynb" #27

Open slowtoaccept opened 3 years ago

slowtoaccept commented 3 years ago

As instructed, I ran all the .ipynb files in sequence.

'Finished file: import\produkt_tu_stunde_20190409_20201231_00096.txt' 'This is file 10' 'Shape of the main_df is: (851261, 1)'

KeyError Traceback (most recent call last)

     25 df = pd.read_csv(file, delimiter=";")
     26 # Prepare the df befor merging (Drop obsolete, convert to datetime, filter to date, set index)
---> 27 df.drop(columns=obsolete_columns, inplace=True)
     28 df["MESS_DATUM"] = pd.to_datetime(df["MESS_DATUM"], format="%Y%m%d%H")
     29 df = df[df['MESS_DATUM']>= "2007-01-01"]

~\Anaconda3\envs\tide\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   4306 weight 1.0 0.8
   4307 """
-> 4308 return super().drop(
   4309 labels=labels,
   4310 axis=axis,

~\Anaconda3\envs\tide\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   4151 for axis, labels in axes.items():
   4152 if labels is not None:
-> 4153 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   4154
   4155 if inplace:

~\Anaconda3\envs\tide\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
   4186 new_axis = axis.drop(labels, level=level, errors=errors)
   4187 else:
-> 4188 new_axis = axis.drop(labels, errors=errors)
   4189 result = self.reindex(**{axis_name: new_axis})
   4190

~\Anaconda3\envs\tide\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
   5589 if mask.any():
   5590 if errors != "ignore":
-> 5591 raise KeyError(f"{labels[mask]} not found in axis")
   5592 indexer = indexer[~mask]
   5593 return self.delete(indexer)

KeyError: "['QN_9' 'RF_TU' 'eor'] not found in axis"
chris1610 commented 3 years ago

It looks like those columns are not in your data set. Since you're trying to drop them, it shouldn't matter.

You could try replacing the drop code with this:

df.drop(columns=obsolete_columns, inplace=True, errors='ignore')

This tells pandas to skip any of the listed columns that are not in the DataFrame instead of raising a KeyError.
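A minimal sketch of the difference, assuming obsolete_columns holds the three labels named in the error message and using a small hypothetical DataFrame that lacks them: the list comprehension shows which labels drop() would complain about, the default call raises, and errors="ignore" skips the absent labels.

```python
import pandas as pd

# Assumption: obsolete_columns contains the three labels named in the KeyError.
obsolete_columns = ["QN_9", "RF_TU", "eor"]

# Hypothetical frame that contains none of those labels.
df = pd.DataFrame(
    {
        "STATIONS_ID": [3, 3],
        "MESS_DATUM": [1950040101, 1950040102],
        "TT_TU": [5.7, 5.6],
    }
)

# Labels that drop() would report as "not found in axis".
missing = [col for col in obsolete_columns if col not in df.columns]
print("Missing columns:", missing)

# Default errors="raise": any absent label triggers a KeyError.
try:
    df.drop(columns=obsolete_columns)
except KeyError as err:
    print("drop() raised:", err)

# errors="ignore": absent labels are skipped; labels that do exist are still dropped.
df.drop(columns=obsolete_columns, inplace=True, errors="ignore")
print(list(df.columns))
```

Note that errors="ignore" only suppresses the symptom: if a file has none of the expected columns, later lines such as df["MESS_DATUM"] will still fail, which matches the second error reported below.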

slowtoaccept commented 3 years ago

Hi Chris, I ran the example code as provided, without modifications. I then applied your suggested change (line 28) and got another error, shown below. I'm not an experienced "Pandite" and am relying only on the provided code. Thanks for your help.

'Finished file: import\produkt_tu_stunde_20190409_20201231_00096.txt' 'This is file 10' 'Shape of the main_df is: (851261, 1)'

KeyError Traceback (most recent call last)
~\Anaconda3\envs\tide\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079 try:
-> 3080 return self._engine.get_loc(casted_key)
   3081 except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'MESS_DATUM'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
     27 # df.drop(columns=obsolete_columns, inplace=True)
     28 df.drop(columns=obsolete_columns, inplace=True, errors='ignore')
---> 29 df["MESS_DATUM"] = pd.to_datetime(df["MESS_DATUM"], format="%Y%m%d%H")
     30 df = df[df['MESS_DATUM']>= "2007-01-01"]
     31 df.set_index(['MESS_DATUM', 'STATIONS_ID'], inplace=True)

~\Anaconda3\envs\tide\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3022 if self.columns.nlevels > 1:
   3023 return self._getitem_multilevel(key)
-> 3024 indexer = self.columns.get_loc(key)
   3025 if is_integer(indexer):
   3026 indexer = [indexer]

~\Anaconda3\envs\tide\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080 return self._engine.get_loc(casted_key)
   3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
   3083
   3084 if tolerance is not None:

KeyError: 'MESS_DATUM'
chris1610 commented 3 years ago

Hmm. I'm not sure what's going on. It's likely there's an error earlier in the script and the files aren't downloaded or processed properly. You should look at the downloaded files and make sure they are placed in the correct directories and have the right content.
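One way to do that check without opening every file by hand is a short script that lists each file with its size and first line. This is a minimal sketch, assuming the notebooks extract the station files into a folder named import (as the path in the log suggests); the function name describe_import_dir is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: the station files are extracted into "import", as the log path
# import\produkt_tu_stunde_20190409_20201231_00096.txt suggests. Adjust if needed.
IMPORT_DIR = Path("import")


def describe_import_dir(import_dir: Path) -> list[tuple[str, int, str]]:
    """Return (file name, size in bytes, first line) for every file in import_dir."""
    report = []
    for path in sorted(import_dir.iterdir()):
        if not path.is_file():
            continue  # skip sub-directories
        # latin-1 can decode any byte sequence, so an unexpected file will not raise here.
        with path.open(encoding="latin-1") as fh:
            first_line = fh.readline().strip()
        report.append((path.name, path.stat().st_size, first_line))
    return report


if __name__ == "__main__":
    if IMPORT_DIR.is_dir():
        for name, size, header in describe_import_dir(IMPORT_DIR):
            print(f"{name}  {size:,} bytes  header: {header}")
    else:
        print(f"{IMPORT_DIR.resolve()} does not exist")
```

A zero-byte file, or a first line that is not a semicolon-separated list of column names, is a good candidate for the file that breaks the loop.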

I realize that's a little vague for a new user, but I think it's likely something changed and the files are stored differently.

slowtoaccept commented 3 years ago

Hi Chris, here's a snippet from the imported file list. All the files have a MESS_DATUM column. Is the MESS_DATUM format the problem? It is rejected by df["MESS_DATUM"] = pd.to_datetime(df["MESS_DATUM"], format="%Y%m%d%H").

17 2 Dir(s) 434,812,313,600 bytes...

   STATIONS_ID  MESS_DATUM  QN_9  TT_TU  RF_TU  eor
0            3  1950040101     5    5.7   83.0  eor
1            3  1950040102     5    5.6   83.0  eor
2            3  1950040103     5    5.5   83.0  eor
3            3  1950040104     5    5.5   83.0  eor
4            3  1950040105     5    5.8   85.0  eor

chris1610 commented 3 years ago

I re-ran this on my local machine and the file I see looks like yours, so I think the date format is OK.
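The format string can be checked in isolation. Here is a minimal sketch using three MESS_DATUM values from the snippet in the previous comment. If these parse, the problem is not the format: the second traceback fails inside the df["MESS_DATUM"] lookup (DataFrame.__getitem__) before pd.to_datetime runs, so the KeyError means the column is missing from that particular df.

```python
import pandas as pd

# Sample MESS_DATUM values from the snippet (YYYYMMDDHH: year, month, day, hour).
mess_datum = pd.Series([1950040101, 1950040102, 1950040105])

# Same format string as the notebook; astype(str) makes the integer-to-text step explicit.
parsed = pd.to_datetime(mess_datum.astype(str), format="%Y%m%d%H")
print(parsed.tolist())
```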

Is it possible that there is an extra file in your import directory? Look at each of the files and make sure they are all formatted the same.
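To compare the files, you can read only the header row of each one with the same delimiter the notebook uses and flag any file that lacks the columns the loop needs. This is a sketch under stated assumptions: the import folder name, the REQUIRED set, the latin-1 encoding, and the helper find_odd_files are all hypothetical choices, not part of the notebooks.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd

# Assumptions: files sit in "import" and are read with delimiter=";" as in the
# notebook. REQUIRED is a hypothetical minimum set taken from the failing lines.
IMPORT_DIR = Path("import")
REQUIRED = {"STATIONS_ID", "MESS_DATUM"}


def find_odd_files(import_dir: Path, pattern: str = "*") -> dict[str, list[str]]:
    """Map each file whose header lacks a REQUIRED column to what pandas parsed."""
    odd: dict[str, list[str]] = {}
    for path in sorted(import_dir.glob(pattern)):
        if not path.is_file():
            continue
        try:
            # nrows=0 parses only the header row, so large files stay cheap to check.
            # latin-1 is an assumption that keeps non-UTF-8 files from raising.
            header = pd.read_csv(path, delimiter=";", nrows=0, encoding="latin-1")
        except (pd.errors.EmptyDataError, pd.errors.ParserError) as err:
            odd[path.name] = [f"<unreadable: {err}>"]
            continue
        # Keep the raw names: the notebook also looks them up unstripped.
        names = [str(col) for col in header.columns]
        if not REQUIRED.issubset(names):
            odd[path.name] = names
    return odd


if __name__ == "__main__":
    # repr() makes stray spaces or a wrong delimiter visible in the output.
    for name, names in find_odd_files(IMPORT_DIR).items():
        print(f"{name}: {names!r}")
```

A file that comes back with a single long column name, or with an unreadable marker, does not match the format the loop expects and would explain both KeyErrors.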