PaddlePaddle / models

Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.
Apache License 2.0

KT-NET model encoding error #4014

Open Soulmate303 opened 4 years ago

Soulmate303 commented 4 years ago

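The SQuAD dataset contains some Unicode escape sequences, for example L\u00facia Moniz. KT-NET's preprocessing does not convert them into normal characters, so it raises an error. I don't know how to fix this.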
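For reference, `\u00fa` is the standard JSON escape for a non-ASCII character, and a JSON parser normally decodes it on load. The sketch below illustrates this with Python's standard `json` module; `load_squad` is a hypothetical helper name, not a function from the KT-NET code, and whether the escapes are the real cause of the failure is an assumption to check against the actual reader.

```python
import json

# Raw text as it would appear inside a JSON file (the escape is kept literal here).
raw = r'{"title": "L\u00facia Moniz"}'

record = json.loads(raw)
title = record["title"]
print(title)  # the parser has already turned the escape into one character


def load_squad(path):
    """Hypothetical helper: read a SQuAD-style JSON file with an explicit encoding."""
    # Passing encoding="utf-8" avoids relying on the platform default codec.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

If the decoded string already looks correct at this stage, the mismatch is more likely introduced later, during tokenization.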

willthefrog commented 4 years ago

@kuke, please follow up on this.

kuke commented 4 years ago

Could you post your environment and the error message?

Soulmate303 commented 4 years ago

> Could you post your environment and the error message?

Environment: Python 3.5, Driver API Version: 10.2, Runtime API Version: 10.0, cuDNN Version: 7.5.

In squad.py I printed query_tokens and tokenization_info['query_subtokens'] as q_t and t_i respectively.

Error output:

    11/29/2019 07:31:29 - INFO - reader.squad - q_t:['Has', 'Japan', 'ever', 'attacked', 'Thailand', '?']
    11/29/2019 07:31:29 - INFO - reader.squad - t_i:['Has', 'Japan', 'ever', 'attacked', 'Thailand', '?']
    11/29/2019 07:31:29 - INFO - reader.squad - q_t:['What', 'did', 'the', 'old', 'letters', '[UNK]', '[UNK]', '[UNK]', 'and', '[UNK]', '[UNK]', '[UNK]', 'become', '?']
    11/29/2019 07:31:29 - INFO - reader.squad - t_i:['What', 'did', 'the', 'old', 'letters', '\u27e8', '\u0456', '\u27e9', 'and', '\u27e8', '[UNK]', '\u27e9', 'become', '?']
    11/29/2019 07:31:29 - WARNING - root - Your decorated reader has raised an exception!
    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
        self.run()
      File "/usr/lib/python3.5/threading.py", line 862, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/layers/io.py", line 474, in __provider_thread__
        six.reraise(*sys.exc_info())
      File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
        raise value
      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/layers/io.py", line 455, in __provider_thread__
        for tensors in func():
      File "/tmp/pycharm_project_946/reading_comprehension/src/reader/squad.py", line 675, in wrapper
        features, batch_size, self._in_tokens):
      File "/tmp/pycharm_project_946/reading_comprehension/src/reader/squad.py", line 626, in batch_reader
        for (index, feature) in enumerate(features):
      File "/tmp/pycharm_project_946/reading_comprehension/src/reader/squad.py", line 265, in __call__
        assert query_tokens == tokenization_info['query_subtokens']
    AssertionError
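To see exactly where the two lists in the log diverge, a position-by-position comparison can help. This is a minimal diagnostic sketch using the values printed above; `token_mismatches` is a hypothetical helper, not part of squad.py.

```python
def token_mismatches(query_tokens, query_subtokens):
    """Return (index, token, subtoken) for every position where the lists differ."""
    if len(query_tokens) != len(query_subtokens):
        raise ValueError("token lists have different lengths")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(query_tokens, query_subtokens))
        if a != b
    ]


# The second pair of lists from the log output.
q_t = ['What', 'did', 'the', 'old', 'letters', '[UNK]', '[UNK]', '[UNK]',
       'and', '[UNK]', '[UNK]', '[UNK]', 'become', '?']
t_i = ['What', 'did', 'the', 'old', 'letters', '\u27e8', '\u0456', '\u27e9',
       'and', '\u27e8', '[UNK]', '\u27e9', 'become', '?']

diffs = token_mismatches(q_t, t_i)
for index, token, subtoken in diffs:
    print(index, repr(token), repr(subtoken))
```

Every differing position has '[UNK]' on the q_t side and a raw character on the t_i side, which suggests the two lists were produced with different vocabularies or unknown-token handling rather than different text.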

    /usr/local/lib/python3.5/dist-packages/paddle/fluid/executor.py:774: UserWarning: The following exception is not an EOF exception.
      "The following exception is not an EOF exception.")
    Traceback (most recent call last):
      File "src/run_squad.py", line 594, in <module>
        train(args)
      File "src/run_squad.py", line 517, in train
        outputs = train_exe.run(fetch_list=fetch_list)
      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/parallel_executor.py", line 311, in run
        return_numpy=return_numpy)
      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/executor.py", line 775, in run
        six.reraise(*sys.exc_info())
      File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
        raise value
      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/executor.py", line 770, in run
        use_program_cache=use_program_cache)
      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/executor.py", line 829, in _run_impl
        return_numpy=return_numpy)
      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/executor.py", line 669, in _run_parallel
        tensors = exe.run(fetch_var_names)._move_to_list()
    paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

    0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
    1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
    2 paddle::operators::LookupTableOp::InferShape(paddle::framework::InferShapeContext*) const
    3 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const
    4 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const
    5 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&)
    6 paddle::framework::details::ComputationOpHandle::RunImpl()
    7 paddle::framework::details::ThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
    8 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
    9 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
    10 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const


Python Call Stacks (More useful to users):

      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/framework.py", line 2423, in append_op
        attrs=kwargs.get("attrs", None))
      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/layer_helper.py", line 43, in append_op
        return self.main_program.current_block().append_op(*args, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/paddle/fluid/layers/nn.py", line 638, in embedding
        'padding_idx': padding_idx
      File "/tmp/pycharm_project_946/reading_comprehension/src/model/bert.py", line 97, in _build_model
        is_sparse=False)
      File "/tmp/pycharm_project_946/reading_comprehension/src/model/bert.py", line 87, in __init__
        self._build_model(src_ids, position_ids, sentence_ids, input_mask)
      File "src/run_squad.py", line 152, in create_model
        use_fp16=args.use_fp16)
      File "src/run_squad.py", line 383, in train
        freeze=args.freeze)
      File "src/run_squad.py", line 594, in <module>
        train(args)


Error Message Summary:

PaddleCheckError: Expected ids_dims[ids_rank - 1] == 1, but received ids_dims[ids_rank - 1]:0 != 1:1. ShapeError: The last dimensions of the 'Ids' tensor must be 1. But received Ids's last dimensions = 0, Ids's shape = [0]. at [/paddle/paddle/fluid/operators/lookup_table_op.cc:51] [operator < lookup_table > error]
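The summary reports an 'Ids' tensor of shape [0], that is, an empty batch. This may simply be a downstream effect of the reader thread stopping on the AssertionError, though that is an inference from the logs rather than something confirmed here. The condition itself can be mirrored in plain Python as a rough sketch; `check_ids_shape` is a hypothetical name and is not a Paddle API.

```python
def check_ids_shape(shape):
    """Mirror the check in the error summary: the last dimension of Ids must be 1."""
    if len(shape) == 0 or shape[-1] != 1:
        raise ValueError(
            "last dimension of 'Ids' must be 1, got shape {}".format(list(shape))
        )
    return True


# A well-formed batch of ids, e.g. (batch_size, seq_len, 1), passes the check.
ok = check_ids_shape((8, 384, 1))

# The shape from the error summary fails it.
try:
    check_ids_shape((0,))
    failed = False
except ValueError as err:
    failed = True
    print(err)
```

Running a check like this on a batch before it is fed to the executor would show whether the reader is yielding empty data.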

Soulmate303 commented 4 years ago

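> Could you post your environment and the error message?

I have one more question. My local compute is limited, and the BASE_Large run stalls halfway through. Setting the problem above aside, I only swapped the Large model files for the official Base model files, and the results were very poor. What should I watch out for when making this swap? @kuke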

Dogy06 commented 4 years ago

Delete the assertion and overwrite the stored document subtokens with all_doc_tokens, since they are not used anymore:

    if all_doc_tokens != tokenization_info['document_subtokens']:
        tokenization_info['document_subtokens'] = all_doc_tokens

worked for me.
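The same idea can be written as a small helper that logs each overwrite, so the mismatches stay visible instead of disappearing silently. This is only a sketch: `reconcile_subtokens` and its parameters are hypothetical names, and like the workaround itself it bypasses the check rather than addressing why the token lists differ.

```python
import logging

logger = logging.getLogger(__name__)


def reconcile_subtokens(tokens, tokenization_info, key):
    """Overwrite tokenization_info[key] with tokens when they disagree.

    Returns True if an overwrite happened, False if the lists already matched.
    """
    if tokens == tokenization_info.get(key):
        return False
    logger.warning("subtoken mismatch for %r; overwriting stored value", key)
    tokenization_info[key] = tokens
    return True


# Example with made-up values standing in for real tokenizer output.
info = {'document_subtokens': ['old', 'letters', '\u27e8']}
changed = reconcile_subtokens(['old', 'letters', '[UNK]'], info, 'document_subtokens')
```

The query-side assertion shown in the traceback could be handled the same way by passing 'query_subtokens' as the key.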