alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Wrong result decoding with test.c on Windows #973

Open opentld opened 2 years ago

opentld commented 2 years ago

Platform: Windows 10, VS2019. When running test.c, the following errors occur:

LOG (VoskAPI:Model::ReadDataFiles():src\model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:Model::ReadDataFiles():src\model.cc:216) Silence phones 1:2:3:4:5:11:12:13:14:15
Wrong parameter 6 in LAPACKE_dsyev_work
Wrong parameter 1 in LAPACKE_dsygv
Wrong parameter 1 in LAPACKE_dsygv
...
Wrong parameter 1 in LAPACKE_dsptri_work
Wrong parameter 1 in LAPACKE_dsptri_work
Wrong parameter 9 in LAPACKE_dsprfs_work
...
LOG (VoskAPI:kaldi::nnet3::Nnet::RemoveOrphanNodes():nnet3\nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:kaldi::nnet3::Nnet::RemoveOrphanComponents():nnet3\nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:kaldi::nnet3::CompileLooped():nnet3\nnet-compile-looped.cc:345) Spent 0.0998472 seconds in looped compilation.
LOG (VoskAPI:Model::ReadDataFiles():src\model.cc:248) Loading i-vector extractor from model/ivector/final.ie
LOG (VoskAPI:kaldi::IvectorExtractor::ComputeDerivedVars():ivector\ivector-extractor.cc:183) Computing derived variables for iVector extractor
Wrong parameter 1 in LAPACKE_dsyequb_work
...
ERROR (VoskAPI:kaldi::TpMatrix::Cholesky():matrix\tp-matrix.cc:110) Cholesky decomposition failed. Maybe matrix is not positive definite.

What do these messages mean? @proger @camillem @dremendes @Sharcoux @hviana

nshmyrev commented 2 years ago

Did you build libvosk yourself with MKL, or are you using our prebuilt binary?

opentld commented 2 years ago

Did you build libvosk yourself with MKL, or are you using our prebuilt binary?

I built libvosk myself with Kaldi and OpenBLAS...

opentld commented 2 years ago

Did you build libvosk yourself with MKL, or are you using our prebuilt binary?

When I use the official libvosk.dll release for Chinese speech, the results seem wrong:

LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.0994599 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from model/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:278) Loading HCLG from model/graph/HCLG.fst
LOG (VoskAPI:ReadDataFiles():model.cc:293) Loading words from model/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:302) Loading winfo model/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:309) Loading subtract G.fst model from model/rescore/G.fst
LOG (VoskAPI:ReadDataFiles():model.cc:311) Loading CARPA model from model/rescore/G.carpa
vosk_recognizer_final_result: { "text" : "缁?鏄?闃虫槬 娣?浜?澶у潡 鏂囩珷 鐨?搴曡壊 鍥涙湀 鐨?婊?鏇存槸 缁?鐨?椴滄椿 绉€濯?璇楁剰 鐩庣劧" }

nshmyrev commented 2 years ago

Hi

The data is probably wrong. Please share the audio file you are trying so we can reproduce the problem.

opentld commented 2 years ago

https://user-images.githubusercontent.com/21096515/170046489-3413d0b6-5f8e-4c6a-9444-8c6af42e6385.mp4

Because the wav format is not supported as an attachment, I renamed the file's extension from wav to mp4. You can change the extension back to wav.

Thank you very much!

@nshmyrev

nshmyrev commented 2 years ago

I get "绿 是 阳春 烟 酒 大块 文章 的 底色 四月 的 凌乱 更是 绿 的 鲜活 秀媚 诗意 盎然" for your file, which is probably close. Maybe it is a charset issue. Try saving the result to a file and opening it with Notepad. The encoding should be UTF-8.
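To check the charset theory, the result can be written to a file byte for byte and then opened in an editor as UTF-8. The following is only a minimal sketch: the helper name `SaveUtf8Result` and the output file name are made up for illustration and are not part of Vosk.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of the Vosk API): write the recognizer's
// JSON result to a file exactly as received. Binary mode keeps the UTF-8
// bytes untouched, with no newline or code-page translation.
bool SaveUtf8Result(const std::string& path, const std::string& json) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;  // could not open the file for writing
    }
    out.write(json.data(), static_cast<std::streamsize>(json.size()));
    return static_cast<bool>(out);  // false if the write failed
}
```

It could be called with the string returned by `vosk_recognizer_final_result`, for example `SaveUtf8Result("result.json", vosk_recognizer_final_result(recognizer))`, where `recognizer` stands for whatever recognizer handle the program already holds. If the file shows the expected Chinese text when opened as UTF-8, the recognition itself is fine and only the console display is off.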

opentld commented 2 years ago

I use this function to convert the UTF-8 result to a string in the local Chinese code page, and it works!

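#include <codecvt>
#include <locale>
#include <string>

// Convert UTF-8 text to the "CHS" (Simplified Chinese) locale encoding for display on Windows.
std::string UTF8ToString(const std::string& utf8Data) {
    // utf-8 => wstring
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    std::wstring wString = conv.from_bytes(utf8Data);

    // wstring => string; constructing codecvt from a locale name relies on MSVC behavior
    std::wstring_convert<std::codecvt<wchar_t, char, std::mbstate_t>>
        convert(new std::codecvt<wchar_t, char, std::mbstate_t>("CHS"));
    std::string str = convert.to_bytes(wString);

    return str;
}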
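A possible alternative, not something suggested in this thread, is to leave the string as UTF-8 and ask the Windows console to interpret output as UTF-8 instead. The sketch below uses the Win32 call `SetConsoleOutputCP` with `CP_UTF8`; the wrapper name `UseUtf8Console` is made up for illustration, and it assumes the console font can display Chinese glyphs.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch the console output code page to UTF-8 so that
// UTF-8 bytes printed by the program are displayed as-is.
// On non-Windows platforms this does nothing and reports success.
bool UseUtf8Console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;  // returns nonzero on success
#else
    return true;
#endif
}
```

Calling `UseUtf8Console()` once at the start of `main`, before printing any results, would be the intended use. Whether this displays correctly depends on the terminal, so the conversion function above remains the approach confirmed to work here.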

One more question: it seems that the official libvosk.dll was not built with CUDA. Could you please provide a GPU release? Thanks a lot!

@nshmyrev

nshmyrev commented 2 years ago

We do not support GPU on Windows; it is meant more for Linux servers that need to process hundreds of streams in parallel. You'd be better off using it with the prebuilt Docker image.

opentld commented 2 years ago

We do not support GPU on Windows; it is meant more for Linux servers that need to process hundreds of streams in parallel. You'd be better off using it with the prebuilt Docker image.

In my experience, compiling Vosk with Kaldi and CUDA is too difficult... :(

nshmyrev commented 2 years ago

We might consider building GPU packages some time in the future but no promises, sorry.

opentld commented 2 years ago

We might consider building GPU packages some time in the future but no promises, sorry.

You have done so much for me, I really appreciate it! Thank you!