HuangZiliAndy / RPNSD

PyTorch implementation of RPNSD
MIT License

parse_options.sh: No such file or directory #5

Open ooza opened 4 years ago

ooza commented 4 years ago

Thanks for this great work and for making it public. I'm new to this domain and I'm trying to test my own data with your trained model. The steps I followed are:

1. Installing Kaldi and Faster R-CNN.
2. Downloading the modelbest.pth.tar file under RPNSD/model.
3. Running ./inference.sh.
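In shell terms, those steps look roughly like this (a sketch only, run from the repository root; the checkpoint source path is a placeholder):

```bash
# Step 1 (installing Kaldi and Faster R-CNN) follows the repo's own setup
# instructions and is not shown here.

# Step 2: place the pretrained checkpoint under model/
mkdir -p model
cp /path/to/modelbest.pth.tar model/   # placeholder source path

# Step 3: run inference
./inference.sh
```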

The output that I got:

Experiment directory is experiment/pretraincfgres101epoch1bs8opsgdlr0.01min_lr0.0001schedulermultipat10seed7alpha1.0archres101dev12000modelbestfreeze0bnfix0cfgres101epoch10bs8opsgdlr0.00004min_lr0.00004pat10seed7alpha0.1archres101
Decision threshold is 0.5
NMS threshold is 0.3
Fold 1
Modelname is modelbest
scripts/eval_cpu.sh: line 17: parse_options.sh: No such file or directory

Am I missing something? Should I adapt my data first and then run inference?

HuangZiliAndy commented 4 years ago

Hi. Sorry for my mistake, I have already fixed that. You can simply soft link tools/kaldi/egs/wsj/s5/utils to the main directory. parse_options.sh is a file from utils (it comes with Kaldi).
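For example, from the repository root (assuming Kaldi was built under tools/kaldi as in this repo's setup; adjust the path if your layout differs):

```bash
# Create a soft link named "utils" in the RPNSD root pointing to Kaldi's
# wsj/s5/utils directory, which contains parse_options.sh.
ln -s tools/kaldi/egs/wsj/s5/utils utils
```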

And as for your steps, I think you need to adapt first. Since we are using five-fold cross-validation, there will be 5 models after adaptation, and then you can do inference.

Thanks!

ooza commented 4 years ago

> And as for your steps, I think you need to adapt first. Since we are using five-fold cross-validation, there will be 5 models after adaptation, and then you can do inference.
>
> Thanks!

Thanks for your response. I resolved the parse_options.sh problem. I'm new to the speaker diarization domain and I have a small corpus of unannotated German and French wav files. Can you please explain how I should proceed to fine-tune or directly use your pretrained model on my own data?

HuangZiliAndy commented 4 years ago

First, make sure that your data is 8kHz telephone data. If it is not, there might be some mismatch (since I trained on 8kHz telephone data). Of course, our model can be applied to 16kHz data as well, but that requires training on 16kHz data.
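(Not part of this repo, just one common option: if your recordings are 16kHz, you could downsample them to 8kHz mono with sox before running the model; the filenames below are placeholders.)

```bash
# Downsample a 16kHz recording to 8kHz, single channel
sox input_16k.wav -r 8000 -c 1 output_8k.wav
```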

Second, to make adaptation, you need to prepare a dataset just like the CALLHOME dataset in our example. That requires diarization label. (who speak when) If you have no information at all, that would be difficult. Another option is to directly use my pretrained model, but there might be domain mismatch, so I cannot guarantee the performance. In my experiment, adaptation will improve the performance.