Closed rcmcabral closed 4 years ago
Thanks for your comments. To re-create the Learner in order to continue training when logging back into Google Colab, you need the model and the training data at minimum, as ktrain inspects both to automate things for ease of use.
When you call predictor.save, it saves the model (in addition to the Preprocessor instance). So, assuming you saved the Predictor instance at the end of the initial training session, you can re-create the Learner instance as follows:
import ktrain
predictor = ktrain.load_predictor('/path_to_saved_predictor')
learner = ktrain.get_learner(predictor.model, train_data = (xTrain, yTrain), batch_size = 12)
# continue training here (e.g., learner.fit_onecycle)
Note that there are also the methods learner.save_model and learner.load_model, but these are intended for saving and reloading models during interactive training (so you can go back to an earlier model if you end up overfitting).
Hope this helps.
P.S. The ktrain.load_model function (as opposed to learner.load_model) is actually a reference to the load_model function in Keras (which is used internally) and will probably be removed from the ktrain namespace in future versions of ktrain.
You might also consider using DistilBert, which often has nearly the same performance but half the parameters, using either the text_classifier API or the Transformer API.
Thanks for the prompt reply @amaiya! Works like a charm! I guess I got so lost looking for a load_model function that would accept the saved tf_model.h5 file that I didn't notice predictor.model is exposed.
Also, thanks for the suggestions! Will look into them after!
Looking forward to future, leaner versions. Thank you for your work!
I am training a model using fasttext in Colab following this response but I'm getting an error. I saved the predictor like this:
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.save(path)
Loaded it again:
predictor = ktrain.load_predictor(path)
learner = ktrain.get_learner(predictor.model, train_data = train, val_data = val, batch_size = 12)
And when I tried to train another cycle it failed. More precisely, upon running learner.fit_onecycle(2e-4, 1, class_weight=class_weights) I got the following error:
begin training using onecycle policy with max lr of 0.0002...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-93fc30f9a17a> in <module>()
----> 1 learner.fit_onecycle(2e-4, 1, class_weight=class_weights)
18 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py in wrapper(*args, **kwargs)
235 except Exception as e: # pylint:disable=broad-except
236 if hasattr(e, 'ag_error_metadata'):
--> 237 raise e.ag_error_metadata.to_exception(e)
238 else:
239 raise
ValueError: in converted code:
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py:677 map_fn
batch_size=None)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py:2410 _standardize_tensors
exception_prefix='input')
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_utils.py:510 standardize_input_data
'for each key in: ' + str(names))
ValueError: No data provided for "embedding_4_input". Need data for each key in: ['embedding_4_input']
Am I failing to follow the response correctly? What could the error be? Thanks in advance!
Hi @msclar, I suspect that it is being caused by the way you're loading your data. The fasttext model is not a pretrained model like BERT or DistilBert. As a result, instead of using a preset vocabulary, it learns the vocabulary from the training data. The embedding layers of the fasttext model are configured based on this original learned vocabulary. If you reload a different training set from scratch, it will have a new vocabulary, which will confuse the embedding layer.
If you follow the same steps you provided above but continue training using the same original training set (or a training set preprocessed using the same tokenizer learned from the original training set), the error does not occur. It will also be avoided if you use a pretrained model like BERT or DistilBert.
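To make the vocabulary point concrete, here is a minimal pure-Python sketch. It does not use ktrain at all: build_vocab and encode are hypothetical helpers standing in for whatever tokenizer the preprocessor learns, and the sample sentences are made up. It only illustrates the general idea that a vocabulary rebuilt from different data assigns different ids to the same words, so a previously trained embedding layer would look up the wrong rows.

```python
def build_vocab(texts):
    """Assign integer ids to tokens in order of first appearance.

    Id 0 is reserved for tokens that are not in the vocabulary.
    """
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode(text, vocab):
    """Map each token of `text` to its id (0 if the token is unknown)."""
    return [vocab.get(token, 0) for token in text.lower().split()]


# Hypothetical data: the set used in the first session, and a different
# set loaded from scratch in a later session.
original_train = ["the movie was great", "the plot was dull"]
reloaded_train = ["dull plot", "great movie overall"]

original_vocab = build_vocab(original_train)
reloaded_vocab = build_vocab(reloaded_train)

sentence = "great plot"
ids_original = encode(sentence, original_vocab)
ids_reloaded = encode(sentence, reloaded_vocab)

# The same sentence maps to different id sequences under the two vocabularies.
print(ids_original, ids_reloaded)
```

Reusing the saved Preprocessor (and therefore the original vocabulary) to prepare any new data avoids this kind of mismatch.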
I was caught by the downsides of using a notebook!! I did load the dataset with a fixed random_state to get the same training set, but the variable train was referring to a dataset processed with DistilBERT in another part of my notebook.
Thank you for the swift reply, I only realized the bug because of your reply!
While training for NER, my system errored out at the 10th epoch and training stopped. No specific error was written to the logs. I have enabled checkpointing and have the hdf5 files written for each epoch. I tried to load the 10th-epoch file using the following line of code:
learner = ktrain.load_model('../models/checkpoints/weights-10.hdf5')
and received the following error:
Traceback (most recent call last):
File "/home/user1234/miniconda3/envs/ktrain/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 165, in load_model_from_hdf5
raise ValueError('No model found in config file.')
ValueError: No model found in config file.
While the checkpoints are written, they do not contain the "model_config", which the above code is trying to load, resulting in the error. Is there a way to resume training from where it last stopped? There is no final model file written and I don't have a .preproc file.
@sathish331977: The checkpoint_folder argument saves only the weights of the model after each epoch, so use load_weights:
# recreate model from scratch
model = txt.sequence_tagger(...
# load checkpoint weights into model
model.load_weights('../models/checkpoints/weights-10.hdf5')
# recreate learner
learner = ktrain.get_learner(model, ...
# continue training here
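If it is not obvious which checkpoint was written last, a small helper can pick the highest-numbered weights file in the checkpoint folder. This is only a sketch: latest_checkpoint is a hypothetical helper (not part of ktrain), and the filename pattern is assumed from the weights-10.hdf5 name mentioned above, with an optional suffix after the epoch number allowed as a guess. Adjust the pattern to whatever your folder actually contains.

```python
import os
import re
import tempfile

# Assumed naming scheme: weights-<epoch>.hdf5, optionally with a suffix
# after the epoch number (e.g. weights-10-0.35.hdf5).
CHECKPOINT_PATTERN = re.compile(r"^weights-(\d+)(?:-.*)?\.hdf5$")


def latest_checkpoint(folder):
    """Return (epoch, path) for the highest-numbered weights file, or None."""
    best = None
    for name in os.listdir(folder):
        match = CHECKPOINT_PATTERN.match(name)
        if match:
            epoch = int(match.group(1))
            if best is None or epoch > best[0]:
                best = (epoch, os.path.join(folder, name))
    return best


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as demo_folder:
    for epoch in (1, 2, 10):
        open(os.path.join(demo_folder, "weights-%02d.hdf5" % epoch), "w").close()
    demo_epoch, demo_path = latest_checkpoint(demo_folder)
    demo_name = os.path.basename(demo_path)
    print(demo_epoch, demo_name)
```

The returned path can then be passed to model.load_weights as in the steps above, and the epoch number tells you how many epochs have already been completed.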
@amaiya @sathish331977 Hi,
Could you please look into this issue?
learner = ktrain.get_learner(predictor.model, train_data=trn, val_data=val, batch_size=128)
learner.fit(0.005, 1, cycle_len=20, checkpoint_folder='training_data_new/after_30')
With this I am getting the following error:
`/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
971 except Exception as e: # pylint:disable=broad-except
972 if hasattr(e, "ag_error_metadata"):
--> 973 raise e.ag_error_metadata.to_exception(e)
974 else:
975 raise
AssertionError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:806 train_function *
return step_function(self, iterator)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:796 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2945 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:789 run_step **
outputs = model.train_step(data)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:747 train_step
y_pred = self(x, training=True)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985 __call__
outputs = call_fn(inputs, *args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:386 call
inputs, training=training, mask=mask)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:517 _run_internal_graph
assert x_id in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("dense_1/truediv:0", shape=(None, None, 66), dtype=float32)`
I am loading the model using ktrain.load_predictor and then passing the already-trained predictor.model to get the learner object. When I train further, I get the error above. Could you please look into this?
@sirisha-8: You haven't provided enough information, as it's not clear what model you're using, what task you're performing, or what TensorFlow version you're using, etc. I've tested this on my end with transformers-based text classification and everything works. If you still have trouble, please open a new issue with more details including a self-contained reproducible example, if possible.
@amaiya Sorry for the lack of info. I am using a BioBERT model:
model = txt.sequence_tagger('bilstm-bert', preproc, bert_model='monologg/biobert_v1.1_pubmed')
The task I am performing is NER.
The TensorFlow version is 2.1.0.
I have already trained for 30 epochs using learner.fit and also saved the predictor.
Now, after loading the model using ktrain.load_predictor and continuing training with learner.fit for a further 20 epochs, I got the above error. I'm not sure where I went wrong. Could you please look into this issue? I am happy to provide further details.
@sirisha-8 I wasn't able to reproduce this issue and everything works fine for me. You'll need to provide a self-contained, reproducible example on Google Colab.
Hi. Thanks for the awesome tool! I managed to get it to work as described in the tutorials. However, I'm using BERT with a huge dataset, so one epoch takes hours. On top of that, I'm using Google Colab, which has time limits for GPU use. Because of this, I was hoping to save the model, reload it, and then call learner.fit_onecycle again to continue training for some more epochs.
I have successfully saved the predictor files from a few epochs and I can reload them to make predictions. What I'm hoping to do now is get the learner class from it, but looking at the source code, there's no way to do this outright. I moved to trying to load the model file itself and build the learner by calling ktrain.get_learner() again, but ktrain.load_model() throws an error.
I've also thought about going through the entire process again up to building the model as prescribed, then setting the weights and getting the learner.
This feels kinda hackish though since I'm not using the saved model files. Will this have the same effect or am I missing something from the source code in building the learner from the predictor?