emanuel-metzenthin / Lime-For-Time

Application of the LIME algorithm by Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin to the domain of time series classification

Example code not working - ValueError: Expected 2D array, got 1D array instead #1

Closed marmg closed 4 years ago

marmg commented 5 years ago

Hi, I am trying to use the library with the coffee dataset example and I am getting the following error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in
      1 explainer = lime_timeseries.LimeTimeSeriesExplanation(class_names=['0', '1'], feature_selection='auto')
      2 exp = explainer.explain_instance(series, knn.predict_proba, num_features=num_features, num_samples=5000, num_slices=num_slices,
----> 3 replacement_method='total_mean', training_set=coffee_train_x)
      4 exp.as_list()

~/projects/p34-xai/lime_timeseries.py in explain_instance(self, timeseries, classifier_fn, training_set, num_slices, labels, top_labels, num_features, num_samples, distance_metric, model_regressor, replacement_method)
     66 """
     67 domain_mapper = explanation.DomainMapper()
---> 68 data, yss, distances = self.__data_labels_distances(timeseries, classifier_fn, num_samples, num_slices, training_set, replacement_method)
     69 if self.class_names is None:
     70 self.class_names = [str(x) for x in range(yss[0].shape[0])]

~/projects/p34-xai/lime_timeseries.py in __data_labels_distances(cls, time_series, classifier_fn, num_samples, num_slices, training_set, replacement_method)
    142 inverse_data.append(tmp_series)
    143 labels = classifier_fn(inverse_data)
--> 144 distances = distance_fn(data)
    145 return data, labels, distances

~/projects/p34-xai/lime_timeseries.py in distance_fn(x)
    112 def distance_fn(x):
    113 return sklearn.metrics.pairwise.pairwise_distances(
--> 114 x, x[0], metric='cosine').ravel() * 100
    115
    116 # split time_series into slices

~/projects/p34-xai/.conda/envs/p34-xai-env-xai-k1/lib/python3.6/site-packages/sklearn/metrics/pairwise.py in pairwise_distances(X, Y, metric, n_jobs, **kwds)
   1400 func = partial(distance.cdist, metric=metric, **kwds)
   1401
-> 1402 return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
   1403
   1404

~/projects/p34-xai/.conda/envs/p34-xai-env-xai-k1/lib/python3.6/site-packages/sklearn/metrics/pairwise.py in _parallel_pairwise(X, Y, func, n_jobs, **kwds)
   1067
   1068 if effective_n_jobs(n_jobs) == 1:
-> 1069 return func(X, Y, **kwds)
   1070
   1071 # TODO: in some cases, backend='threading' may be appropriate

~/projects/p34-xai/.conda/envs/p34-xai-env-xai-k1/lib/python3.6/site-packages/sklearn/metrics/pairwise.py in cosine_distances(X, Y)
    550 """
    551 # 1.0 - cosine_similarity(X, Y) without copy
--> 552 S = cosine_similarity(X, Y)
    553 S *= -1
    554 S += 1

~/projects/p34-xai/.conda/envs/p34-xai-env-xai-k1/lib/python3.6/site-packages/sklearn/metrics/pairwise.py in cosine_similarity(X, Y, dense_output)
    896 # to avoid recursive import
    897
--> 898 X, Y = check_pairwise_arrays(X, Y)
    899
    900 X_normalized = normalize(X, copy=True)

~/projects/p34-xai/.conda/envs/p34-xai-env-xai-k1/lib/python3.6/site-packages/sklearn/metrics/pairwise.py in check_pairwise_arrays(X, Y, precomputed, dtype)
    111 warn_on_dtype=warn_on_dtype, estimator=estimator)
    112 Y = check_array(Y, accept_sparse='csr', dtype=dtype,
--> 113 warn_on_dtype=warn_on_dtype, estimator=estimator)
    114
    115 if precomputed:

~/projects/p34-xai/.conda/envs/p34-xai-env-xai-k1/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    545 "Reshape your data either using array.reshape(-1, 1) if "
    546 "your data has a single feature or array.reshape(1, -1) "
--> 547 "if it contains a single sample.".format(array))
    548
    549 # in the future np.flexible dtypes will be handled like object dtypes

ValueError: Expected 2D array, got 1D array instead:
array=[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
```

I hope you can fix it soon; it looks awesome! Thanks.

angellr commented 5 years ago

I am getting the same error. It started when scikit-learn moved to version 0.20. Any thoughts as to what needs to change? Any help would be greatly appreciated.

fennovj commented 5 years ago

I played around with the notebook a little and managed to fix it. There are 2 issues, both in the 'lime_timeseries.py' file:

First of all, on line 114, `x[0]` is a 1D array, but it needs to be a 2D array with a single row (not sure why it doesn't work by default). The line needs to become this: `x, x[0].reshape(1, -1), metric='cosine').ravel() * 100`
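
For anyone who wants to check the first change in isolation, here is a minimal sketch of the patched `distance_fn`. The function body follows the traceback above; the small `data` matrix is made up for illustration and is not taken from the coffee dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances


def distance_fn(x):
    """Cosine distance from every row of x to its first row, scaled by 100."""
    x = np.asarray(x, dtype=float)
    # x[0] has shape (n_features,); reshape(1, -1) turns it into a
    # (1, n_features) matrix, the 2D shape that pairwise_distances expects.
    reference = x[0].reshape(1, -1)
    return pairwise_distances(x, reference, metric='cosine').ravel() * 100


# Made-up perturbation matrix: one row per sample, one column per slice.
data = np.array([
    [1.0, 1.0, 1.0, 1.0],  # the unperturbed instance
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
])
distances = distance_fn(data)
print(distances.shape)  # one distance per row of data
```

The first entry is the distance of the instance to itself, so it should come out as (approximately) zero.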

Second of all, I'm not sure whether this applies to all versions of lime, but it is an issue with the latest version. Line 76 needs to change into: `ret_exp.score, _) = self.base.explain_instance_with_data(data, yss, distances, label, num_features,`

I already made an issue about this in the LIME project: https://github.com/marcotcr/lime/issues/292

After those 2 changes, it worked for me.
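
The line 76 snippet above is only a fragment, so the following is a sketch of the idea it appears to describe, not the actual patch: the newer lime call seems to return one more value than the old code unpacked, and the extra value is discarded with `_`. Both `old_style_result` and `new_style_result` are hypothetical stand-ins with made-up values; they are not lime functions. Star-unpacking is used here instead of a single `_` so the same line tolerates either return length.

```python
def old_style_result():
    # Hypothetical stand-in: three values (intercept, local explanation, score).
    return 0.5, [(0, 0.1), (1, -0.2)], 0.93


def new_style_result():
    # Hypothetical stand-in: the same three values plus one extra trailing value.
    return 0.5, [(0, 0.1), (1, -0.2)], 0.93, [0.48]


def unpack(result):
    # Keep the first three values and ignore any trailing extras.
    intercept, local_exp, score, *_ = result
    return intercept, local_exp, score


old = unpack(old_style_result())
new = unpack(new_style_result())
print(old == new)
```

With this shape of unpacking, the caller gets the same three values regardless of which variant it is running against.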

kingspp commented 5 years ago

Fixes - https://github.com/emanuel-metzenthin/Lime-For-Time/pull/2