IntelPython / sdc

Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler
https://intelpython.github.io/sdc-doc/
BSD 2-Clause "Simplified" License

Machine Learning NYSC example error #828

Closed abhi-84 closed 3 years ago

abhi-84 commented 4 years ago

Hi. In the nysc_predict.py example, when I remove @njit(parallel=True), it throws an error. Why? How can I use SDC for machine learning examples? The error it generates is below.

KeyError Traceback (most recent call last)

in
     41 
     42 #t_start = time.time()
---> 43 data2012, coc12, data2012_low2high, x, y = preprocess_data()
     44 
     45 x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False)

in preprocess_data()
     20 
     21     # Remove unused columns
---> 22     df_prices_intc = df_prices_intc.drop(columns=('symbol', 'volume'))
     23 
     24     # The year of interest is 2012

~/anaconda3/envs/sdc_env/lib/python3.7/site-packages/pandas/core/frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   4115             level=level,
   4116             inplace=inplace,
-> 4117             errors=errors,
   4118         )
   4119 

~/anaconda3/envs/sdc_env/lib/python3.7/site-packages/pandas/core/generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3912         for axis, labels in axes.items():
   3913             if labels is not None:
-> 3914                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3915 
   3916         if inplace:

~/anaconda3/envs/sdc_env/lib/python3.7/site-packages/pandas/core/generic.py in _drop_axis(self, labels, axis, level, errors)
   3944                 new_axis = axis.drop(labels, level=level, errors=errors)
   3945             else:
-> 3946                 new_axis = axis.drop(labels, errors=errors)
   3947             result = self.reindex(**{axis_name: new_axis})
   3948 

~/anaconda3/envs/sdc_env/lib/python3.7/site-packages/pandas/core/indexes/base.py in drop(self, labels, errors)
   5338         if mask.any():
   5339             if errors != "ignore":
-> 5340                 raise KeyError("{} not found in axis".format(labels[mask]))
   5341             indexer = indexer[~mask]
   5342         return self.delete(indexer)

KeyError: "[('symbol', 'volume')] not found in axis"

densmirn commented 4 years ago

Hello @abhi-84, this exception was raised because DataFrame.drop is used slightly differently in Pandas and SDC, due to a limitation on the columns parameter. Pandas expects columns to be a list: it treats a tuple as a single label, which is why the KeyError reports [('symbol', 'volume')] as not found in the axis. SDC, on the other hand, requires a tuple (see DataFrame.drop limitations).

So if you replace the tuple of columns

df_prices_intc = df_prices_intc.drop(columns=('symbol', 'volume'))

with a list of columns

df_prices_intc = df_prices_intc.drop(columns=['symbol', 'volume'])

when the njit decorator is removed, the issue will disappear.
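
For code that runs as plain Pandas (without the njit decorator), the difference can be sketched as follows. This is only an illustration: as_label_list and demo are hypothetical names, not part of Pandas or SDC, and the sample data is made up. Only the column names come from the example above.

```python
def as_label_list(columns):
    """Return column labels as a list so Pandas treats each one separately.

    Hypothetical helper for illustration; not part of Pandas or SDC.
    """
    if isinstance(columns, str):
        return [columns]   # a single column name
    return list(columns)   # a tuple (or other iterable) of column names


def demo():
    # Pandas is imported here so the helper above has no dependencies.
    import pandas as pd

    # Made-up sample data; only the column names mirror the example.
    df = pd.DataFrame({"symbol": ["INTC"], "volume": [100], "close": [25.0]})

    try:
        # Plain Pandas looks up the whole tuple as ONE label and fails.
        df.drop(columns=("symbol", "volume"))
    except KeyError as exc:
        print("tuple failed:", exc)

    # Converting to a list drops both columns as intended.
    trimmed = df.drop(columns=as_label_list(("symbol", "volume")))
    return list(trimmed.columns)
```

Calling demo() in an environment with Pandas installed should print the KeyError message and return the remaining column names. Note that this conversion only applies to the non-compiled path; inside an njit-compiled function, SDC still expects the tuple form, as described above.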