alumae / kaldi-offline-transcriber

Offline transcription system for Estonian using Kaldi

No rule to make target 'build/output/intervjuu201306211256.txt'. Stop #8

Closed vince62s closed 8 years ago

vince62s commented 8 years ago

I built the system exactly as described in the README.

make .init seems fine. Then I downloaded the demo audio, but make build/output/intervjuu201306211256.txt gives me the error in the subject line.

Is something wrong?

alumae commented 8 years ago

Where did you place the demo audio file?

vince62s commented 8 years ago

in src-audio

alumae commented 8 years ago

What happens when you make build/audio/base/output/intervjuu201306211256.wav?

alumae commented 8 years ago

Sorry, make build/audio/base/intervjuu201306211256.wav?

vince62s commented 8 years ago

sox fail formats: no handler for file extension mp3

I must be missing something.

alumae commented 8 years ago

Try installing the package 'libsox-fmt-all' (on Debian) or similar.
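
A quick way to check whether the installed sox can read mp3 at all is sketched below. It assumes that `sox -h` prints an "AUDIO FILE FORMATS" line listing the supported formats, which may differ between sox builds; the variable name `mp3_support` is only illustrative.

```shell
#!/bin/sh
# Report whether the local sox build lists mp3 among its supported formats.
# Assumption: `sox -h` prints a line starting with "AUDIO FILE FORMATS".
if ! command -v sox >/dev/null 2>&1; then
    mp3_support="sox-not-installed"
elif sox -h 2>&1 | grep 'AUDIO FILE FORMATS' | grep -qw 'mp3'; then
    mp3_support="yes"
else
    mp3_support="no"
fi
echo "mp3 support: $mp3_support"
# If this prints "no" on Debian, the package mentioned above can be installed with:
#   sudo apt-get install libsox-fmt-all
```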

vince62s commented 8 years ago

OK, now the latter command runs the conversion:

sox src-audio/intervjuu201306211256.mp3 -c 1 build/audio/base/intervjuu201306211256.wav rate -v 16k

That seems OK, but the original command still returns the same error.

vince62s commented 8 years ago

Hi, sorry to bother you, but I'm still stuck. Is there something else I can check to see what is wrong? In the meantime I will rebuild Kaldi and KOT.

Well, no luck, same issue.

alumae commented 8 years ago

Sorry for leaving you alone. Have you modified the Makefile? Do you have anything in Makefile.options?

What happens when you run:

make build/trans/intervjuu201306211256/nnet2_online_ivector_pruned_rescored_main/decode/log

vince62s commented 8 years ago

Well, initially I wanted to use it with other models, so the only things I changed are the lines related to the compounder, which I commented out: the one line LM_Compounder=...., the last section for the compounder in .lang, and lastly the build/fst/data/compounder ... section.

Makefile.options contains only KALDI_ROOT, pointing to the Kaldi folder.

I am running the make build/trans.... command from your comment above. A bunch of log/info output scrolled by, and now it's decoding. I see some Estonian text in the decode.1.log file.

alumae commented 8 years ago

Well, I cannot support you if you commented out some sections. Estonian ASR uses compound-split units (compound words, such as 'handshake', are split into 'hand' and 'shake' in the lexicon). The final output needs the compound-split units to be recompounded, and it uses a special LM to do that. If you comment out the sections that describe how to get recompounded output from compound-split output, then the Makefile doesn't know how to produce the final output.
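
The effect can be reproduced with a toy Makefile. This is only a sketch, assuming GNU make; the file names and rules are hypothetical and are not the project's actual rules. The intermediate target still builds, while the final target fails because the rule for the last step is commented out.

```shell
#!/bin/sh
# Toy reproduction of "No rule to make target". All names are hypothetical.
set -u
workdir=$(mktemp -d)
cd "$workdir" || exit 1
: > demo.mp3    # empty stand-in for the source audio

# Chain of pattern rules: .mp3 -> .wav -> .split.txt
# The last step (.split.txt -> .txt) is deliberately commented out.
printf '%%.wav: %%.mp3\n\tcp $< $@\n' > Makefile
printf '%%.split.txt: %%.wav\n\tcp $< $@\n' >> Makefile
printf '# %%.txt: %%.split.txt\n#\tcp $< $@\n' >> Makefile

# The intermediate target can still be built through the remaining rules.
make demo.split.txt >/dev/null 2>&1; intermediate_status=$?

# The final target has no rule left, so make stops with an error.
make demo.txt >make.log 2>&1; final_status=$?

echo "intermediate target exit status: $intermediate_status"
echo "final target exit status: $final_status"
cat make.log
```

Building each intermediate target in turn, as done earlier in this thread, shows which step of the chain is missing its rule.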

vince62s commented 8 years ago

To avoid any doubt, I will rebuild leaving the sections in. But I don't understand: even in Estonian we could decide to skip the compound/recompound step, which I agree would cost some WER, but it's just one step, and if the rest remains consistent...

EDIT: I owe you a beer. Apologies. So I need to go under the hood to understand how to skip the compound section.

EDIT2: I finally found the rest of the lines that need to be commented out to skip compounding. It works fine now without compounding.

Just to be clear: punctuation is not available yet, is it?