DSXiangLi / ChineseNER

All about Chinese NER
309 stars 58 forks

bert_bilstm_crf_adv:ValueError: Shape must be rank 2 but is rank 1 for 'task1_msra/crf_layer/Slice_2' (op: 'Slice') with input shapes: [?], [2], [2]. #10

Open LinJingOK opened 2 years ago

LinJingOK commented 2 years ago

Error message:

`During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/workspace/PycharmProjects/nlp/EmilyNER/test/ChineseNER-main/main.py", line 212, in <module>
    singletask_train(args)
  File "E:/workspace/PycharmProjects/nlp/EmilyNER/test/ChineseNER-main/main.py", line 83, in singletask_train
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 473, in train_and_evaluate
    return executor.run()
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 613, in run
    return self.run_local()
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1188, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "E:\workspace\PycharmProjects\nlp\EmilyNER\test\ChineseNER-main\tools\train_utils.py", line 150, in model_fn
    loss, pred_ids = build_graph(features=features, labels=labels, params=params, is_training=is_training)
  File "E:\workspace\PycharmProjects\nlp\EmilyNER\test\ChineseNER-main\model\bert_bilstm_crf.py", line 30, in build_graph
    trans, log_likelihood = crf_layer(logits, label_ids, seq_len, params['label_size'], is_training)
  File "E:\workspace\PycharmProjects\nlp\EmilyNER\test\ChineseNER-main\tools\layer.py", line 126, in crf_layer
    sequence_lengths=150 # [batch_size] [32]
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\contrib\crf\python\ops\crf.py", line 257, in crf_log_likelihood
    transition_params)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\contrib\crf\python\ops\crf.py", line 116, in crf_sequence_score
    false_fn=_multi_seq_fn)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\python\layers\utils.py", line 202, in smart_cond
    pred, true_fn=true_fn, false_fn=false_fn, name=name)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\python\framework\smart_cond.py", line 56, in smart_cond
    return false_fn()
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\contrib\crf\python\ops\crf.py", line 106, in _multi_seq_fn
    transition_params)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\contrib\crf\python\ops\crf.py", line 332, in crf_binary_score
    truncated_masks = array_ops.slice(masks, [0, 1], [-1, -1])
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\python\ops\array_ops.py", line 733, in slice
    return gen_array_ops._slice(input_, begin, size, name=name)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 10488, in _slice
    "Slice", input=input, begin=begin, size=size, name=name)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
    op_def=op_def)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\python\framework\ops.py", line 2027, in __init__
    control_input_ops)
  File "D:\environment\Anaconda3\envs\ChineseNER-main\lib\site-packages\tensorflow\python\framework\ops.py", line 1867, in _create_c_op
    raise ValueError(str(e))
ValueError: Shape must be rank 2 but is rank 1 for 'crf_layer/Slice_2' (op: 'Slice') with input shapes: [?], [2], [2].

Process finished with exit code 1 `

The single-task run, python main.py --model bert_bilstm_crf --data msra, fails with the same error, so I suspect the problem is in my data processing. Here is my data pipeline in detail. Following the README, I downloaded Google's BERT files into pretrain_model/ch_google, then ran data/msra/preprocess.py, which generated tfrecord files for BERT and for the other models. For BERT there are three: bert_giga_valid.tfrecord, bert_giga_train.tfrecord, bert_giga_predict.tfrecord. I then ran python main.py --model bert_bilstm_crf --data msra and got this error: Shape must be rank 2 but is rank 1 for 'crf_layer/Slice_2' (op: 'Slice') with input shapes: [?], [2], [2]. Thanks for your help.
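
One detail in the trace may be worth checking: the frame in tools/layer.py shows `sequence_lengths=150` being passed to `crf_log_likelihood`, while the inline comment on that line says the expected shape is `[batch_size]`. As far as the documented behaviour of `tf.sequence_mask` goes, a scalar length yields a rank-1 mask, and `crf_binary_score` then slices that mask with a two-element `begin`, which would be consistent with the "rank 2 but is rank 1" message. The sketch below only mimics that shape logic in plain Python; `sequence_mask` and `rank` are illustrative stand-ins, not TensorFlow calls, and this reading of the trace is an assumption rather than a confirmed diagnosis.

```python
def sequence_mask(lengths, maxlen):
    """Plain-Python stand-in for the *shape* behaviour of tf.sequence_mask.

    A scalar length gives a 1-D mask; a list of lengths gives a 2-D mask
    with one row per sequence.
    """
    if isinstance(lengths, int):
        return [1.0 if i < lengths else 0.0 for i in range(maxlen)]
    return [[1.0 if i < n else 0.0 for i in range(maxlen)] for n in lengths]


def rank(x):
    """Count nested list levels (0 for a scalar)."""
    r = 0
    while isinstance(x, list):
        r += 1
        x = x[0] if x else None
    return r


max_seq_len = 150  # value that appears in the trace

# Scalar length, as in `sequence_lengths=150`: the mask has rank 1.
scalar_mask = sequence_mask(150, max_seq_len)

# One length per example (hypothetical values): the mask has rank 2,
# which is what a [0, 1] slice along (batch, time) needs.
batch_mask = sequence_mask([150, 37, 12], max_seq_len)

print(rank(scalar_mask), rank(batch_mask))
```

If this is what happened, passing the per-example length tensor (the `seq_len` argument visible in the `build_graph` frame) instead of a constant would be the thing to try.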

DSXiangLi commented 2 years ago

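@LinJingOK The problem is in the data generation. giga and bert are two different tokenizers: the former is word-level, the latter is token-level. All BERT models use the bert tokenizer, so their tfrecord files are named like bert_train.tfrecord. The other, non-BERT models use giga_train.tfrecord, and the lexicon-enhanced files are named like giga_softword.tfrecord.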
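
The naming rule above can be turned into a quick pre-flight check before launching training. This is a minimal sketch under stated assumptions: `expected_tfrecords` and `missing_tfrecords` are hypothetical helpers (not part of the repository), a model is treated as a BERT model when its name starts with `bert`, and the three splits are assumed to be train, valid and predict. Lexicon-enhanced variants such as giga_softword are not covered.

```python
import os


def expected_tfrecords(model_name, splits=("train", "valid", "predict")):
    """Return the tfrecord file names a model is expected to read.

    Assumption: names starting with "bert" use the bert tokenizer,
    everything else uses the giga tokenizer.
    """
    prefix = "bert" if model_name.startswith("bert") else "giga"
    return ["{}_{}.tfrecord".format(prefix, split) for split in splits]


def missing_tfrecords(data_dir, model_name):
    """List expected tfrecord files that are absent from data_dir."""
    return [
        name
        for name in expected_tfrecords(model_name)
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


print(expected_tfrecords("bert_bilstm_crf"))
print(expected_tfrecords("bilstm_crf"))
```

A file such as bert_giga_train.tfrecord would show up here as a mismatch, since it fits neither expected prefix pattern.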

LinJingOK commented 2 years ago

Hello, and thank you. That problem is solved: changing the BERT path to an absolute path fixed it. I now get the three files you mentioned: bert_train.tfrecord, bert_valid.tfrecord and bert_predict.tfrecord. I set epoch_size to 1 in config.py and ran python main.py --model bert_bilstm_crf --data msr. The project runs and GPU memory is in use, but a single epoch has been training for two hours without finishing or printing any predictions. Apart from the parameter dump, the terminal log contains only warnings and no other output. Is this normal, and roughly how long does training take? I see your default is 50 epochs; how long did you train for?

==========TRAIN PARAMS==========
{'dtype': tf.float32, 'lr': 5e-06, 'log_steps': 100, 'pretrain_dir': './pretrain_model/ch_google', 'batch_size': 32, 'epoch_size': 1, 'warmup_ratio': 0.1, 'early_stop_ratio': 1, 'cell_type': 'lstm', 'cell_size': 1, 'hidden_units_list': [128], 'keep_prob_list': [0.8], 'rnn_activation': 'relu', 'diff_lr_times': {'crf': 500, 'logit': 500, 'lstm': 100}, 'n_sample': 86918, 'max_seq_len': 150, 'label_size': 7, 'tag2idx': {'[PAD]': 0, 'B': 1, 'I': 2, 'E': 3, 'S': 4, '[CLS]': 5, '[SEP]': 6}, 'idx2tag': {0: '[PAD]', 1: 'B', 2: 'I', 3: 'E', 4: 'S', 5: '[CLS]', 6: '[SEP]'}, 'step_per_epoch': 2716, 'num_train_steps': 2716}
==========RUN PARAMS==========
{'summary_steps': 10, 'log_steps': 100, 'save_steps': 500, 'keep_checkpoint_max': 3, 'allow_growth': True, 'pre_process_gpu_fraction': 0.8, 'log_device_placement': True, 'allow_soft_placement': True, 'inter_op_parallel': 2, 'intra_op_parallel': 2}
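
For a rough sense of scale, the step counts in the dump can be reproduced from n_sample and batch_size, and a wall-clock estimate follows once a per-step time is measured. The sketch below is only arithmetic on the logged values: floor division happens to match the logged step_per_epoch, but how the repository actually computes it is an assumption, and `seconds_per_step` is a placeholder to be measured on your own machine, not a figure from this thread.

```python
# Values copied from the TRAIN PARAMS dump above.
n_sample = 86918
batch_size = 32
epoch_size = 1

# Floor division reproduces the logged 'step_per_epoch'.
step_per_epoch = n_sample // batch_size
num_train_steps = step_per_epoch * epoch_size

# Total steps if the default of 50 epochs mentioned above were used.
default_schedule_steps = step_per_epoch * 50


def estimate_hours(num_steps, seconds_per_step):
    """Convert a step count and a measured per-step time into hours."""
    return num_steps * seconds_per_step / 3600.0


print(step_per_epoch, num_train_steps, default_schedule_steps)
```

Calling `estimate_hours(num_train_steps, seconds_per_step)` with a measured step time then gives the expected duration of one epoch.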

DSXiangLi commented 2 years ago

@LinJingOK The checkpoint directory will contain the corresponding ckpt files. You can run tensorboard --logdir ./checkpoint/your_model_path to check how training is progressing.
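
Without opening TensorBoard, the latest saved step can also be read off the checkpoint file names. The sketch below assumes the files follow the `model.ckpt-<step>.index` pattern that TF 1.x savers typically write; `latest_step` is a hypothetical helper, and the file names in the example are made up for illustration.

```python
import re

# Matches names such as "model.ckpt-500.index" and captures the step number.
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_step(filenames):
    """Return the highest checkpoint step found in filenames, or None."""
    steps = []
    for name in filenames:
        match = _CKPT_INDEX.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


# Illustrative directory listing (not taken from a real run).
example = [
    "checkpoint",
    "model.ckpt-500.index",
    "model.ckpt-500.meta",
    "model.ckpt-1000.index",
]
print(latest_step(example))
```

On a real run, the list would come from `os.listdir("./checkpoint/your_model_path")`; comparing the result with num_train_steps shows how far the run has progressed.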

LinJingOK commented 2 years ago

Sorry to bother you again. Single-task training took a long time, but the program finished normally, ran the evaluation, and produced predictions. I am now running bert_bilstm_crf_adv.py with the command python main.py --model bert_bilstm_crf_adv --data msra,msr, using batch=16 and epoch=1. The program ran normally for about an hour and then crashed. The last file in the generated folder ner_msra_msr_bert_bilstm_crf_adv is model.ckpt-7500, and the loss in TensorBoard is still around 2. To rule out a low-level cause I first checked the environment versions, and the important dependencies all match yours. The error message is:

`Traceback (most recent call last):
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: slice index 0 of dimension 0 out of bounds.
     [[{{node strided_slice_2}}]]
  (1) Invalid argument: slice index 0 of dimension 0 out of bounds.
     [[{{node strided_slice_2}}]]
     [[gradients/task2_msr/bilstm_layer/bidirectional_rnn/bw/bw/transpose_grad/InvertPermutation/_6090]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:/linjing/workspace/ChineseNER-main/main.py", line 149, in <module>
    multitask_train(args)
  File "F:/linjing/workspace/ChineseNER-main/main.py", line 103, in multitask_train
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 473, in train_and_evaluate
    return executor.run()
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 613, in run
    return self.run_local()
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1192, in _train_model_default
    saving_listeners)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1484, in _train_with_estimatorspec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\training\monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1252, in run
    run_metadata=run_metadata)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1353, in run
    raise six.reraise(*original_exc_info)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\six.py", line 719, in reraise
    raise value
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1338, in run
    return self._sess.run(*args, **kwargs)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1411, in run
    run_metadata=run_metadata)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1169, in run
    return self._sess.run(*args, **kwargs)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
    run_metadata)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: slice index 0 of dimension 0 out of bounds.
     [[node strided_slice_2 (defined at F:\linjing\workspace\ChineseNER-main\tools\train_utils.py:199) ]]
  (1) Invalid argument: slice index 0 of dimension 0 out of bounds.
     [[node strided_slice_2 (defined at F:\linjing\workspace\ChineseNER-main\tools\train_utils.py:199) ]]
     [[gradients/task2_msr/bilstm_layer/bidirectional_rnn/bw/bw/transpose_grad/InvertPermutation/_6090]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'strided_slice_2':
  File "F:/linjing/workspace/ChineseNER-main/main.py", line 149, in <module>
    multitask_train(args)
  File "F:/linjing/workspace/ChineseNER-main/main.py", line 103, in multitask_train
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 473, in train_and_evaluate
    return executor.run()
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 613, in run
    return self.run_local()
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1188, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "F:\linjing\workspace\ChineseNER-main\tools\train_utils.py", line 199, in model_fn
    tokens = tf.boolean_mask(features['tokens'], mask1, axis=0)[0,:]
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\ops\array_ops.py", line 680, in _slice_helper
    name=name)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\ops\array_ops.py", line 846, in strided_slice
    shrink_axis_mask=shrink_axis_mask)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 12096, in strided_slice
    shrink_axis_mask=shrink_axis_mask, name=name)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
    op_def=op_def)
  File "D:\anaconda\anaconda3\envs\dong_chineseNER\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Process finished with exit code 1`

If you have time, could you take a look? Also, about the run order: I first generate the tfrecord files for msra and msr, then run the adv command. Is that order correct? And is dataset.py needed?

weiambt commented 6 months ago

@LinJingOK, hello. Was this problem ever solved? I recently seem to have hit the same issue with bert_bilstm_crf_adv.py: it fails with InvalidArgumentError, and my epoch_size is also set to 1.

tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index 0 of dimension 0 out of bounds.
     [[node strided_slice_4 (defined at /share/home/MP2209128/ChineseNER/ChineseNER-local/tools/train_utils.py:204) ]]
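
In both reports the failing op is an index into row 0, and the earlier stack trace points at `tf.boolean_mask(features['tokens'], mask1, axis=0)[0,:]` in tools/train_utils.py. One possible explanation, not confirmed by the maintainer, is that a batch occasionally contains no rows for one of the two tasks, so the masked tensor has zero rows and row 0 does not exist. The plain-Python sketch below reproduces that failure mode with lists; `boolean_mask`, `first_row_or_default` and the `task_ids` layout are hypothetical illustrations, not the repository's code.

```python
def boolean_mask(rows, mask):
    """Keep rows whose mask entry is True (list analogue of masking on axis 0)."""
    return [row for row, keep in zip(rows, mask) if keep]


def first_row_or_default(rows, default):
    """Return the first row, or `default` when the masked batch is empty."""
    return rows[0] if rows else default


tokens = [["我", "爱"], ["北", "京"], ["天", "安"]]
task_ids = [1, 1, 1]  # hypothetical batch in which every sample is from task 1

# Selecting the rows of task 0 leaves nothing behind.
mask_task0 = [t == 0 for t in task_ids]
selected = boolean_mask(tokens, mask_task0)

# Indexing row 0 of the empty result fails, mirroring
# "slice index 0 of dimension 0 out of bounds".
try:
    selected[0]
    failed = False
except IndexError:
    failed = True

# A guarded lookup avoids the crash.
safe = first_row_or_default(selected, default=[])
print(failed, safe)
```

In graph code, the analogous guard would make the row lookup conditional on the masked tensor having at least one row (for example with `tf.cond`). Whether that matches the intent of the code at that line is something only the maintainer can confirm.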