PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0

PreconditionNotMetError: warp-ctc [version 2] Error in get_workspace_size: Summary of this error. #497

Closed: ashutosh486 closed this issue 4 years ago

ashutosh486 commented 4 years ago

I am trying to train an ASR model starting from the pre-trained baidu_en8k model.

Command used:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -u train.py --batch_size=8 --num_epoch=50 --num_conv_layers=2 --num_rnn_layers=3 --rnn_layer_size=1024 \
--num_iter_print=100 --save_epoch=1 --num_samples=60000 --learning_rate=5e-3 --max_duration=45.0 \
--min_duration=0.0 --test_off=False --use_sortagrad=True --use_gru=True --use_gpu=True --is_local=True \
--share_rnn_weights=False --init_from_pretrained_model='models/baidu_en8k' --train_manifest='data/dataset/manifest.train' \
--dev_manifest='data/dataset/manifest.val' --mean_std_path='data/dataset/mean_std.npz' \
--vocab_path='data/dataset/vocab.txt' --output_model_dir='./checkpoints/exp2' --specgram_type='linear' \
--shuffle_method='batch_shuffle_clipped'
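
As a side note, one self-check I also ran was whether any training clip yields fewer CTC time steps than its transcript has labels, since degenerate entries can also upset warp-ctc. A rough sketch of that check (the ~10 ms frame stride, the 3x conv time reduction, and the manifest field names are my assumptions about this setup, not values read from the repo's config):

import json

# Rough sanity check: flag manifest entries whose transcript may be longer
# than the number of CTC time steps the network produces for that clip.
# Assumptions (verify against your own config): manifest lines are JSON
# objects with "duration" (seconds) and "text" fields, features use a
# ~10 ms hop, and the conv front end reduces the time axis by about 3x.
FRAME_STRIDE_S = 0.01   # assumed feature hop
TIME_REDUCTION = 3      # assumed conv subsampling factor

with open('data/dataset/manifest.train') as f:
    for line_no, line in enumerate(f, 1):
        entry = json.loads(line)
        n_steps = int(entry['duration'] / FRAME_STRIDE_S) // TIME_REDUCTION
        if len(entry['text']) > n_steps:
            print('suspect entry %d: %d labels vs ~%d time steps'
                  % (line_no, len(entry['text']), n_steps))

This did not flag anything obvious for me, so the failure below may well be on the GPU/warp-ctc side rather than in the data.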

Error received:

finish initing model from pretrained params from models/baidu_en8k
epoch: 0, batch: 0, train loss: 485.547302

epoch: 0, batch: 100, train loss: 42.384502

/home/ubuntu/anaconda3/envs/paddleDS2/lib/python2.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "train.py", line 142, in <module>
    main()
  File "train.py", line 138, in main
    train()
  File "train.py", line 133, in train
    test_off=args.test_off)
  File "/home/ubuntu/DeepSpeech/model_utils/model.py", line 348, in train
    return_numpy=False)
  File "/home/ubuntu/anaconda3/envs/paddleDS2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/home/ubuntu/anaconda3/envs/paddleDS2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/home/ubuntu/anaconda3/envs/paddleDS2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "/home/ubuntu/anaconda3/envs/paddleDS2/lib/python2.7/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2   paddle::operators::WarpCTCFunctor<paddle::platform::CUDADeviceContext>::operator()(paddle::framework::ExecutionContext const&, float const*, float*, int const*, int const*, int const*, unsigned long, unsigned long, unsigned long, float*)
3   paddle::operators::WarpCTCKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
4   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::WarpCTCKernel<paddle::platform::CUDADeviceContext, float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
5   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
6   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
7   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
8   paddle::framework::details::ComputationOpHandle::RunImpl()
9   paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
10  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*)
11  std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
12  std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
13  ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/home/ubuntu/anaconda3/envs/paddleDS2/lib/python2.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/ubuntu/anaconda3/envs/paddleDS2/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/paddleDS2/lib/python2.7/site-packages/paddle/fluid/layers/loss.py", line 638, in warpctc
    'norm_by_times': norm_by_times,
  File "/home/ubuntu/DeepSpeech/model_utils/network.py", line 446, in deep_speech_v2_network
    input=fc, label=text_data, blank=dict_size, norm_by_times=True)
  File "/home/ubuntu/DeepSpeech/model_utils/model.py", line 145, in create_network
    share_rnn_weights=self._share_rnn_weights)
  File "/home/ubuntu/DeepSpeech/model_utils/model.py", line 281, in train
    train_reader, log_probs, ctc_loss = self.create_network()
  File "train.py", line 133, in train
    test_off=args.test_off)
  File "train.py", line 138, in main
    train()
  File "train.py", line 142, in <module>
    main()

----------------------
Error Message Summary:
----------------------
PreconditionNotMetError: warp-ctc [version 2] Error in get_workspace_size: unknown error
  [Hint: Expected CTC_STATUS_SUCCESS == status, but received CTC_STATUS_SUCCESS:0 != status:4.] at (/paddle/paddle/fluid/operators/warpctc_op.h:101)
  [operator < warpctc > error]
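
For what it's worth, the status value in the hint seems to come from warp-ctc's ctcStatus_t enum (assuming the copy bundled with this Paddle build matches upstream Baidu warp-ctc's ctc.h), so status 4 would be CTC_STATUS_UNKNOWN_ERROR rather than an invalid-value error, which makes me suspect something at the CUDA/driver level. A small lookup for reference:

# warp-ctc status codes as I read them from upstream ctc.h (assumption:
# the copy bundled with this Paddle build defines the same values).
WARPCTC_STATUS = {
    0: 'CTC_STATUS_SUCCESS',
    1: 'CTC_STATUS_MEMOPS_FAILED',
    2: 'CTC_STATUS_INVALID_VALUE',
    3: 'CTC_STATUS_EXECUTION_FAILED',
    4: 'CTC_STATUS_UNKNOWN_ERROR',   # the value get_workspace_size returned here
}

print(WARPCTC_STATUS[4])  # -> CTC_STATUS_UNKNOWN_ERROR

If that reading is right, checking the CUDA driver/toolkit pairing and GPU memory headroom would be my first step.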
chai21b commented 2 years ago

Hi, I ran into the same warpctc error while training a recognition model on my own dataset with en-pp-ocr-v3. As a beginner in coding and ML, I didn't know how to fix it, and I couldn't find any solution on the internet. I deleted the cloned git repo, uninstalled the paddlepaddle and paddleocr packages, and started fresh. This time the error did not appear. I'm still trying to figure out what caused it.