Closed kusumlata123 closed 2 years ago
This happens when I run it on my dataset.
Again, it is very hard to debug these things without a reproducible Colab notebook.
From the files you shared in the email, I would check the tokenizer first, as the error seems to be related to [UNK] (the unknown token).
Can you try changing line 255 in extract_bert_features/extract_features.py
from
tokenizer = tokenization.FullTokenizer(
    vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case)
to
tokenizer = tokenization.FullTokenizer(
    vocab_file=FLAGS.vocab_file,
    spm_model_file=<path to your sentencepiece tokenizer model>,
    do_lower_case=FLAGS.do_lower_case)
^ try adding the path to your sentencepiece tokenizer when initializing the tokenizer
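Before editing the script, it may help to check whether the vocab file actually contains the tokens the tokenizer asks for. A minimal sketch, assuming the vocab file is plain text with one token per line (optionally followed by a whitespace-separated score, as in SentencePiece .vocab files). The helper name missing_special_tokens and the example path are made up for illustration and are not part of the repo:

```python
import io


def missing_special_tokens(vocab_path, required=("[UNK]", "[CLS]", "[SEP]")):
    """Return the required tokens that are absent from the vocab file.

    Assumption: the token is the first whitespace-separated field on each line.
    """
    vocab = set()
    with io.open(vocab_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                vocab.add(fields[0])
    return [token for token in required if token not in vocab]


# Hypothetical usage; replace the path with your own vocab file:
# print(missing_special_tokens("path/to/vocab.txt"))
```

If "[UNK]" is in the returned list, the vocabulary was probably built for SentencePiece (which typically uses "<unk>" instead), which would be consistent with needing spm_model_file.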
Solved the problem.
(Sent as an email reply to Gowtham.R's comment above, dated Thu, 23 Dec 2021, 7:36 pm.)
Traceback (most recent call last):
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 235, in __call__
ret = func(*args)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 594, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "extract_features.py", line 244, in convert_examples_to_features
window_size)
File "extract_features.py", line 188, in _convert_example_to_features
input_ids = tokenizer.convert_tokens_to_ids(tokens)
File "/home/dr/Desktop/Hindi-coref/extract_bert_features/tokenization.py", line 242, in convert_tokens_to_ids
return convert_by_vocab(self.vocab, tokens)
File "/home/dr/Desktop/Hindi-coref/extract_bert_features/tokenization.py", line 160, in convert_by_vocab
output.append(vocab[item])
KeyError: '[UNK]'
ERROR:tensorflow:Error recorded from prediction_loop: exceptions.KeyError: '[UNK]'
Traceback (most recent call last):
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 235, in __call__
ret = func(*args)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 594, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "extract_features.py", line 244, in convert_examples_to_features
window_size)
File "extract_features.py", line 188, in _convert_example_to_features
input_ids = tokenizer.convert_tokens_to_ids(tokens)
File "/home/dr/Desktop/Hindi-coref/extract_bert_features/tokenization.py", line 242, in convert_tokens_to_ids
return convert_by_vocab(self.vocab, tokens)
File "/home/dr/Desktop/Hindi-coref/extract_bert_features/tokenization.py", line 160, in convert_by_vocab
output.append(vocab[item])
KeyError: '[UNK]'
E1223 17:05:18.867840 140097924953920 error_handling.py:75] Error recorded from prediction_loop: exceptions.KeyError: '[UNK]'
Traceback (most recent call last):
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 235, in __call__
ret = func(*args)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 594, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "extract_features.py", line 244, in convert_examples_to_features
window_size)
File "extract_features.py", line 188, in _convert_example_to_features
input_ids = tokenizer.convert_tokens_to_ids(tokens)
File "/home/dr/Desktop/Hindi-coref/extract_bert_features/tokenization.py", line 242, in convert_tokens_to_ids
return convert_by_vocab(self.vocab, tokens)
File "/home/dr/Desktop/Hindi-coref/extract_bert_features/tokenization.py", line 160, in convert_by_vocab
output.append(vocab[item])
KeyError: '[UNK]'
INFO:tensorflow:prediction_loop marked as finished
I1223 17:05:18.869065 140097924953920 error_handling.py:101] prediction_loop marked as finished
WARNING:tensorflow:Reraising captured error
W1223 17:05:18.869143 140097924953920 error_handling.py:135] Reraising captured error
0%| | 0/2451534 [00:02<?, ?it/s]
Traceback (most recent call last):
File "extract_features.py", line 338, in <module>
tf.compat.v1.app.run()
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "extract_features.py", line 304, in main
for result in estimator.predict(input_fn, yield_single_examples=True):
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3078, in predict
rendezvous.raise_errors()
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
six.reraise(typ, value, traceback)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
yield_single_examples=yield_single_examples):
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 640, in predict
preds_evaluated = mon_sess.run(predictions)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
run_metadata=run_metadata)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
raise six.reraise(*original_exc_info)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
return self._sess.run(*args, **kwargs)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
run_metadata=run_metadata)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
return self._sess.run(*args, **kwargs)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: exceptions.KeyError: '[UNK]'
Traceback (most recent call last):
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 235, in __call__
ret = func(*args)
File "/home/dr/anaconda3/envs/hcoref/lib/python2.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 594, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "extract_features.py", line 244, in convert_examples_to_features
window_size)
File "extract_features.py", line 188, in _convert_example_to_features
input_ids = tokenizer.convert_tokens_to_ids(tokens)
File "/home/dr/Desktop/Hindi-coref/extract_bert_features/tokenization.py", line 242, in convert_tokens_to_ids
return convert_by_vocab(self.vocab, tokens)
File "/home/dr/Desktop/Hindi-coref/extract_bert_features/tokenization.py", line 160, in convert_by_vocab
output.append(vocab[item])
KeyError: '[UNK]'
Why is this occurring?
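As far as the traceback shows, the failure is a plain dictionary lookup: convert_by_vocab does vocab[item], and the token '[UNK]' is not a key in the loaded vocabulary. A small standalone sketch of that mechanism follows. The vocabulary and token list are invented for illustration, and the function is a simplified stand-in for the one in tokenization.py, not a copy of it:

```python
def convert_by_vocab(vocab, items):
    """Map each token to its id with a direct dictionary lookup."""
    output = []
    for item in items:
        output.append(vocab[item])  # raises KeyError if the token is missing
    return output


# Hypothetical SentencePiece-style vocabulary: the unknown marker is "<unk>",
# so "[UNK]" is not a key.
vocab = {"<unk>": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3}

# Assumption: a WordPiece-style tokenizer has replaced an out-of-vocabulary
# word with "[UNK]" before the id lookup.
tokens = ["[CLS]", "[UNK]", "[SEP]"]

try:
    convert_by_vocab(vocab, tokens)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```

Here the lookup fails on '[UNK]' with a KeyError, the same exception type and key as in the log above. Whether this is exactly what happens in your run depends on the vocab file and tokenizer settings you pass in.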