Closed. semprepisfo closed this issue 1 year ago.
I wasn't able to reproduce this in TensorFlow 2.9.1. What version of TensorFlow are you using?
2.12.0, using Colab. The error happens with ktrain's own model implementation (e.g., BERT loaded via ktrain), not with models from huggingface.co. I tested ALBERT, ELECTRA, and DistilBERT, and it does not happen with those; so far it only occurs with your model implementation.
The warning output:

Is Multi-Label? False
maxlen is 310
/usr/local/lib/python3.9/dist-packages/keras/initializers/initializers.py:120: UserWarning: The initializer GlorotNormal is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
Despite the warning you can go about your daily tasks, but I wonder what the impact is; the model seems to predict correctly.
Full code that leads to the warning:

(x_train, y_train), (x_test, y_test), preproc = text.texts_from_array(x_train=X_train, y_train=y_train,
                                                                      x_test=X_test, y_test=y_test,
                                                                      class_names=[0, 1], maxlen=310,
                                                                      max_features=32000,
                                                                      preprocess_mode='bert')
model = text.text_classifier('bert', train_data=(x_train, y_train), preproc=preproc)
Imports are done like so:

import tensorflow as tf
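For reference, the seed the warning asks for is an argument to the Keras initializer itself. A minimal sketch (plain Keras, outside ktrain) of what a seeded GlorotNormal looks like:

```python
import numpy as np
import tensorflow as tf

# Two separate GlorotNormal instances constructed with the same seed
# draw identical weights, which is what makes the initialization
# reproducible (and silences the "unseeded" warning).
init_a = tf.keras.initializers.GlorotNormal(seed=42)
init_b = tf.keras.initializers.GlorotNormal(seed=42)

w_a = init_a(shape=(3, 4)).numpy()
w_b = init_b(shape=(3, 4)).numpy()
assert np.allclose(w_a, w_b)

# A layer can take the seeded initializer directly:
layer = tf.keras.layers.Dense(
    4, kernel_initializer=tf.keras.initializers.GlorotNormal(seed=0)
)
```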
All transformer-based models in ktrain use Hugging Face Transformers, except when you create a BERT model with text.text_classifier('bert', train_data=(x_train, y_train), preproc=preproc). When you create a BERT model like this, ktrain currently uses the keras_bert package. (The only reason for this is that this function was added before Hugging Face Transformers supported TensorFlow.) In the future, we may replace the implementation so that transformers is used instead of keras_bert. This is what happens when you replace 'bert' with 'distilbert' in the text_classifier function, i.e., Hugging Face Transformers is used under the hood. The warning does not appear to be generated from the ktrain codebase (it seems to be coming from keras_bert). So, if you want to avoid the warning, you can use the Transformer class in ktrain, or you can simply call the text_classifier method with 'distilbert' instead of 'bert', which should yield similar performance:
# load text data
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
from sklearn.datasets import fetch_20newsgroups
train_b = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
test_b = fetch_20newsgroups(subset='test',categories=categories, shuffle=True)
(x_train, y_train) = (train_b.data, train_b.target)
(x_test, y_test) = (test_b.data, test_b.target)
# train model
import ktrain
from ktrain import text
trn, val, preproc = text.texts_from_array(x_train=x_train, y_train=y_train,
x_test=x_test, y_test=y_test,
class_names=train_b.target_names,
preprocess_mode='distilbert',
maxlen=350)
model = text.text_classifier('distilbert', train_data=trn, preproc=preproc)
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(3e-5, 1)
Thanks for letting us know about the warning.
You're welcome, best regards, and thanks for clarifying.
Using your examples of ktrain usage (e.g., IMDb movie reviews), I am getting:
model = text.text_classifier('bert', train_data=(x_train, y_train), preproc=preproc)
/usr/local/lib/python3.9/dist-packages/keras/initializers/initializers.py:120: UserWarning: The initializer GlorotNormal is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
done.
Where do I add this seed within the ktrain context, rather than separately during the model-building stage (e.g., in a Keras Sequential model)?
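As an aside, until there is a ktrain-level hook for the seed, this particular UserWarning can be silenced before building the model with Python's standard warnings filter. This is a generic stdlib sketch, not a ktrain API; in practice you would call filterwarnings once before text.text_classifier:

```python
import warnings

# Demonstration: ignore only the specific Keras initializer warning by
# matching the start of its message; unrelated UserWarnings still surface.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record everything by default
    warnings.filterwarnings(
        'ignore',
        message='The initializer GlorotNormal is unseeded.*',
        category=UserWarning,
    )
    warnings.warn(
        'The initializer GlorotNormal is unseeded and being called '
        'multiple times', UserWarning)
    warnings.warn('unrelated warning', UserWarning)

# Only the unrelated warning was recorded; the initializer one was ignored.
assert len(caught) == 1
assert 'unrelated' in str(caught[0].message)
```

Note this only hides the message; it does not seed the initializer, so the underlying keras_bert behavior is unchanged.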