MajaRolevski closed this issue 4 years ago
@MajaRolevski Thanks for raising the issue. How small is your "very small dataset"? And what does the "normal" dataset that triggers this issue look like?
@tabergma Thank you for the answer. My "very small dataset" consists of 19 classes (intents); each class has 10-30 sentences, except one with 347 sentences and another with 1897. The "normal" dataset also contains 19 classes, where each class has on average 70-100 sentences, except one with 1565 and another with 41564 sentences.
@MajaRolevski It looks like your dataset contains some very long sentences (>512 tokens). We'll add a fix soon so that training doesn't break because of this, but I would also suggest you sanitize your data and clean up such sentences, since they are unusual and could add a lot of noise during training.
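One rough way to spot such sentences before training is to filter by an approximate token count. This is only a sketch: whitespace splitting underestimates the subword (WordPiece) count that BERT actually sees, so the threshold here is set conservatively below 512; the function names and the 400-token margin are illustrative, not part of Rasa.

```python
# Sketch: separate training examples likely to exceed BERT's 512-token
# limit. Whitespace splitting is only a proxy for subword tokenization
# (which usually yields MORE tokens), hence the conservative margin.
MAX_TOKENS = 400  # illustrative margin below the 512-token limit

def split_examples(sentences, max_tokens=MAX_TOKENS):
    """Partition sentences into (keep, too_long) by approximate length."""
    keep, too_long = [], []
    for s in sentences:
        approx = len(s.split())  # crude proxy for the subword token count
        (too_long if approx > max_tokens else keep).append(s)
    return keep, too_long

sentences = ["hello there", "word " * 600]
keep, too_long = split_examples(sentences)
print(len(keep), len(too_long))  # → 1 1
```

Examples landing in `too_long` can then be inspected manually; in practice many of them turn out to be noise (pasted logs, concatenated messages) rather than genuine utterances.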
We are facing the same issue. Our response selector has some rather long answers, so we run into the BERT token limit as well. When I shorten my answers, the model trains fine again. It is hard to find a good cut-off length for the answers, though, with the different tokenizers and libraries involved.
I am looking forward to a fix for this :-)
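For finding a cut-off, one option is to truncate word by word until a token counter says the text fits. The sketch below uses a whitespace stand-in for `count_tokens`; with the real pipeline you would plug in the same subword tokenizer the model uses, since whitespace counts underestimate the true length. All names here are hypothetical.

```python
# Sketch: shorten an answer until its (approximate) token count fits the
# model limit. `count_tokens` is a placeholder for the real subword
# tokenizer's counting function.
def count_tokens(text):
    return len(text.split())  # stand-in for the real subword tokenizer

def truncate_to_token_limit(text, limit=512):
    """Drop trailing whole words until the text fits within `limit` tokens."""
    words = text.split()
    while words and count_tokens(" ".join(words)) > limit:
        words.pop()
    return " ".join(words)

long_answer = "token " * 600
short = truncate_to_token_limit(long_answer, limit=512)
print(count_tokens(short))  # → 512
```

Truncating on whole-word boundaries keeps the answer readable; swapping in the actual tokenizer makes the limit exact rather than approximate.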
@dakshvar22 Thanks for fixing this. Will the fix be available for Rasa version 1.10.x, or only Rasa v2?
Any news on this? I'm facing this problem as well.
I'm facing this problem too:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[3453,0] = 1780522202 is not in [0, 120000)
[[node net_input/embedding/embedding_lookup_55 (defined at ./check_ctr_v1.py:410) ]]
Line 410 of check_ctr_v1.py is the call tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], self.saved_model_dir).
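As an aside on what that error usually means: the feature ID 1780522202 falls outside the embedding table's valid range [0, 120000), which often happens when hashed feature IDs are fed to an embedding lookup without being reduced modulo the vocabulary size. A minimal sketch of sanitizing the indices, assuming a vocabulary size of 120000 taken from the error message (the rest of the names are illustrative):

```python
# Sketch: map raw (possibly hashed) feature IDs into the valid embedding
# range before the lookup, to avoid "indices[...] is not in [0, V)".
import numpy as np

VOCAB_SIZE = 120_000  # taken from the error message's range [0, 120000)

def sanitize_indices(indices, vocab_size=VOCAB_SIZE):
    """Map raw IDs into [0, vocab_size) via modulo reduction."""
    idx = np.asarray(indices, dtype=np.int64)
    return idx % vocab_size  # alternative: np.clip(idx, 0, vocab_size - 1)

raw = [3, 1780522202, 42]
print(sanitize_indices(raw).tolist())  # → [3, 82202, 42]
```

Whether modulo or clipping is appropriate depends on how the IDs were produced; if they come from a hash function, the modulo should really be applied at feature-generation time so that training and serving agree.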
Has anyone solved this?
What could be a possible solution? Or is this just a situation to handle with a try/except?
Rasa version: 1.10.7 (also tried with 1.8.3)
Python version: 3.7
Operating system (windows, osx, ...): Linux and Windows
Issue: I have a problem with integrating BERT into the configuration of my model. At the very beginning of training it fails with the error message shown below. The config.yml file I used is also included.
Interestingly, I do not get this error when using a very small training dataset; in that case the model is able to finish training successfully.
Error (including full traceback):
Command or request that led to error:
Content of configuration file (config.yml) (if relevant):