adriangb / scikeras

Scikit-Learn API wrapper for Keras.
https://www.adriangb.com/scikeras/
MIT License

can't pickle module objects #272

Closed: jmolero52 closed this issue 2 years ago

jmolero52 commented 2 years ago

Hi,

When I call the "fit" method of the RandomizedSearchCV class, I get the error "TypeError: can't pickle module objects".

In the "fit" call, "X" is a numpy.ndarray with shape (16710, 335) and "y" is a pandas.core.series.Series with shape (16710,) containing numeric values.
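A minimal sketch of synthetic stand-in data with the shapes described above can make a report like this runnable without the real files. Only the shapes and the four classes (taken from the model's `Dense(4, ...)` output layer) come from the report; the vocabulary size `NUM_WORDS` and the random values are hypothetical.

```python
import numpy as np

# Shapes come from the report; NUM_WORDS is an assumed vocabulary size.
N_SAMPLES, MAX_LENGTH = 16710, 335
NUM_WORDS = 10_000  # hypothetical, not stated in the report
N_CLASSES = 4       # matches the Dense(4, ...) output layer

rng = np.random.default_rng(seed=0)

# Padded integer token sequences, one row per sample.
x_train_padded = rng.integers(0, NUM_WORDS, size=(N_SAMPLES, MAX_LENGTH), dtype=np.int32)

# Integer class labels; the reporter held these in a pandas Series of the same length.
y_train = rng.integers(0, N_CLASSES, size=N_SAMPLES)
```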

I am using Colab with the following versions:

Keras 2.8.0, scikeras 0.7.0, scikit-learn 1.0, Python 3.7

This is the error log:

TypeError                                 Traceback (most recent call last)

<ipython-input-69-52840b1b7975> in <module>()
      4 
      5 
----> 6 grid_result = grid.fit(x_train_padded, y_train)
      7 
      8 test_accuracy = grid.score(x_test_padded, y_test)

[3 frames collapsed in Colab]

/usr/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
    167                     reductor = getattr(x, "__reduce_ex__", None)
    168                     if reductor:
--> 169                         rv = reductor(4)
    170                     else:
    171                         reductor = getattr(x, "__reduce__", None)

TypeError: can't pickle module objects
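The traceback ends inside `copy.deepcopy`, which falls back to the pickling protocol (`__reduce_ex__(4)`) for objects it has no special handling for, and module objects cannot be pickled. As a hedged illustration, not a diagnosis of this notebook (its intermediate frames are collapsed), the stdlib-only sketch below reproduces the same kind of error by deep-copying an object that stores a module as an attribute. `Holder` and `deepcopy_error` are hypothetical names. For context, scikit-learn's `clone`, which search estimators use to copy the estimator, falls back to `copy.deepcopy` for parameters that are not estimators, so a wrapper parameter that references a module would plausibly fail the same way.

```python
import copy
import math


class Holder:
    """Hypothetical object that keeps a module as an attribute."""

    def __init__(self, value):
        self.value = value


def deepcopy_error(obj):
    """Return the TypeError message from copy.deepcopy(obj), or None if the copy succeeds."""
    try:
        copy.deepcopy(obj)
    except TypeError as exc:
        return str(exc)
    return None


# A module attribute fails; the exact wording depends on the Python version.
print(deepcopy_error(Holder(math)))
# A plain float attribute deep-copies without error, so this prints None.
print(deepcopy_error(Holder(math.pi)))
```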

My code is:

from sklearn.model_selection import RandomizedSearchCV
from scikeras.wrappers import KerasClassifier
from tensorflow.keras import activations
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, BatchNormalization, Bidirectional, LSTM, Dense

# EPOCHS, class_weight, num_words, max_length and the padded train/test
# arrays are defined earlier in the notebook.

# Create model. SciKeras passes in the arguments whose names match
# parameters set on the KerasClassifier.
def create_model(num_words, max_length, units, dropout, embedding_dim):
    model = Sequential()
    model.add(Embedding(num_words, embedding_dim, input_length=max_length))
    model.add(BatchNormalization())
    model.add(Bidirectional(LSTM(units, dropout=dropout)))
    model.add(Dense(4, activation=activations.softmax))
    # Returned uncompiled so that SciKeras compiles it with the loss,
    # optimizer and metrics set on the wrapper (including the tuned optimizer).
    return model

# Create param grid
param_grid = dict(num_words=[num_words],
                  max_length=[max_length],
                  optimizer=["adam", "rmsprop"],
                  optimizer__learning_rate=[2e-3, 3e-3, 4e-3],
                  fit__batch_size=[512, 1024, 2048],
                  units=[16, 32, 64, 128, 256, 512],
                  dropout=[0.1, 0.25, 0.5],
                  embedding_dim=[16, 32, 64, 128, 256, 512])

# Create model instance. Model-function arguments that are tuned
# (units, dropout, embedding_dim, ...) need a default here.
model = KerasClassifier(model=create_model,
                        loss="sparse_categorical_crossentropy",
                        optimizer="adam",
                        metrics=["accuracy"],
                        epochs=EPOCHS,
                        class_weight=class_weight,
                        units=16,
                        dropout=0.25,
                        embedding_dim=16,
                        num_words=num_words,
                        max_length=max_length,
                        verbose=0)

# Create RandomizedSearchCV instance
grid = RandomizedSearchCV(estimator=model, param_distributions=param_grid,
                          cv=4, verbose=1, n_iter=5)

# Call fit
grid_result = grid.fit(x_train_padded, y_train)

test_accuracy = grid.score(x_test_padded, y_test)

print(f"Best accuracy: {grid_result.best_score_}")
print(f"Params: {grid_result.best_params_}")
print(f"Test accuracy: {test_accuracy}")
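For scale, the distributions above define 2 × 3 × 3 × 6 × 3 × 6 = 1,944 combinations (the single-valued `num_words` and `max_length` entries contribute a factor of 1), of which `RandomizedSearchCV` with `n_iter=5` samples only five. With `cv=4` that is 5 × 4 = 20 cross-validation fits, plus one refit on the full training set because `refit=True` is the scikit-learn default. A small stdlib sketch of that count, with the sizes copied by hand from `param_grid`:

```python
import math

# Number of candidate values per key, copied from param_grid above.
grid_sizes = {
    "num_words": 1,
    "max_length": 1,
    "optimizer": 2,
    "optimizer__learning_rate": 3,
    "fit__batch_size": 3,
    "units": 6,
    "dropout": 3,
    "embedding_dim": 6,
}

n_iter, cv = 5, 4  # RandomizedSearchCV settings from the snippet

total_combinations = math.prod(grid_sizes.values())
cv_fits = n_iter * cv

print(total_combinations)  # 1944
print(cv_fits)             # 20 (plus one refit with the default refit=True)
```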

adriangb commented 2 years ago

Can you add imports and stub out data shapes so that this is a runnable example? Thanks!

jmolero52 commented 2 years ago

Here are the links to the data. Thank you!

X: https://drive.google.com/file/d/1CNQnvnQHZc6P8tIMQbI8zmZ_2jJvvFZm/view?usp=sharing

Load with: x_train_padded = np.load(your_data_path + 'x_train_padded.npy')

y: https://drive.google.com/file/d/1-0VDskUao5JU6ccDf-Sk8TjrtoxYN-oT/view?usp=sharing

Load with: y_train = pd.read_csv(your_data_path + 'y.csv')
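Note that `pd.read_csv` always returns a `DataFrame`, while the report describes `y` as a `Series`. Assuming `y.csv` holds a single label column (not verified here), `DataFrame.squeeze("columns")` turns the one-column frame into a `Series`. The sketch below uses an in-memory CSV with a hypothetical `label` header instead of the linked file.

```python
import io

import pandas as pd

# Hypothetical stand-in for y.csv: a header row plus one column of numeric labels.
csv_text = "label\n0\n3\n1\n2\n"

y_frame = pd.read_csv(io.StringIO(csv_text))  # read_csv returns a DataFrame
y_train = y_frame.squeeze("columns")          # one-column DataFrame -> Series

print(type(y_frame).__name__, type(y_train).__name__)  # DataFrame Series
print(y_train.shape)                                   # (4,)
```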

jmolero52 commented 2 years ago

Hi Adrian,

I launched the notebook again this morning and now it seems to be working fine.

You can close this issue.

Thank you very much! :)