cesium-ml / cesium

Machine Learning Time-Series Platform

Can't get build_model_from_featureset to work #250

Closed zardaloop closed 7 years ago

zardaloop commented 7 years ago

Hi, I am struggling to understand how exactly Cesium works, so I tried to put together a very simple example based on the one you provide, but it doesn't work.

I have also tried loading plot_EEG_Example.ipynb in Jupyter and going through it step by step, but that doesn't fully work either, and I am not sure why. I guess it has to do with version inconsistencies: some of the functions have probably changed by now.

Here is the very simple example I have, which doesn't work. I would highly appreciate it if you could point out what I am doing wrong.

When I run this code I get

TypeError: unhashable type: 'DataArray' on line 49

Many thanks in advance.

from cesium import datasets, featurize
import numpy as np
import pywt
import seaborn; seaborn.set()

from cesium.build_model import build_model_from_featureset
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from cesium.predict import model_predictions

eeg = datasets.fetch_andrzejak()

# Group together classes (Z, O), (N, F), (S) as normal, interictal, ictal
eeg["classes"] = eeg["classes"].astype('U16') #  allocate memory for longer class names
eeg["classes"][np.logical_or(eeg["classes"]=="Z", eeg["classes"]=="O")] = "Normal"
eeg["classes"][np.logical_or(eeg["classes"]=="N", eeg["classes"]=="F")] = "Interictal"
eeg["classes"][eeg["classes"]=="S"] = "Ictal"

features_to_use = ["amplitude",
                   "percent_beyond_1_std",
                   "maximum",
                   "max_slope",
                   "median",
                   "median_absolute_deviation",
                   "percent_close_to_median",
                   "minimum",
                   "skew",
                   "std",
                   "weighted_average"]

dataset = featurize.featurize_time_series(times=eeg["times"],
                                          values=eeg["measurements"],
                                          errors=None,
                                          features_to_use=features_to_use,
                                          targets=eeg["classes"])

train, test = train_test_split(np.arange(len(eeg["classes"])), random_state=0)

knn_param_grid = {'n_neighbors': [1, 2, 3, 4]}

model = build_model_from_featureset(dataset.isel(name=train),
                                    KNeighborsClassifier(),
                                    params_to_optimize=knn_param_grid)

prediction = model_predictions(dataset.isel(name=test), model, return_probs=False)

print("training accuracy={:.2%}, test accuracy={:.2%}".format(
          accuracy_score(prediction.prediction.values[train], eeg["classes"][train]),
          accuracy_score(prediction.prediction.values[test], eeg["classes"][test])))

bnaul commented 7 years ago

I agree that this is due to version inconsistencies; if you upgrade cesium to the current 0.9.2 and use the example code at http://cesium-ml.org/docs/auto_examples/plot_EEG_Example.html everything should be fine. If that doesn't help let me know and I'll re-open.
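
To confirm whether a version mismatch is in play before upgrading, something like the following could help. It is only a minimal sketch using the standard library: the helper names (`parse_version`, `installed_version`, `needs_upgrade`) are made up for illustration and are not part of cesium, and it assumes the package is installed under the distribution name `cesium`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a dotted version such as '0.9.2' into a tuple of ints.

    Parsing stops at the first non-numeric piece, so '0.9.2.dev0'
    is treated as (0, 9, 2). This is a simplification, not a full
    implementation of Python's version ordering rules.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def needs_upgrade(installed, minimum="0.9.2"):
    """True if nothing is installed or the installed version is older."""
    if installed is None:
        return True
    return parse_version(installed) < parse_version(minimum)


if __name__ == "__main__":
    # "cesium" is assumed to be the installed distribution name.
    found = installed_version("cesium")
    print("installed cesium version:", found)
    print("upgrade suggested:", needs_upgrade(found))
```

If this reports that an upgrade is suggested, `pip install --upgrade cesium` should bring the package up to the current release, after which the example code from the docs is the one to follow.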