AcculusSasao closed this 7 months ago.
Setting fmin in ailia.audio.mel_spectrogram causes a numerical accuracy problem, so ailia SDK 1.2.17 is required. This will be merged after ailia SDK 1.2.17 is released.
ailia_audio 1.3.0 fixes the mel spectrogram fmin/fmax issue, and I confirmed that the error is now almost zero.
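To show why fmin and fmax affect the result at all, here is a minimal stdlib-only sketch of how a mel filterbank places its band edges. It assumes the HTK mel formula for simplicity (librosa's default is the Slaney-style scale, and this thread does not say which one ailia.audio uses). The helper names and parameter values are hypothetical and are not taken from clap.py or the ailia SDK. The point is that fmin and fmax pin the lowest and highest edges, so if fmin is mishandled, every band shifts and the whole spectrogram changes.

```python
import math


def hz_to_mel(f: float) -> float:
    """HTK mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, fmin: float, fmax: float) -> list:
    """Return n_mels + 2 frequencies in Hz, equally spaced on the mel scale.

    Triangular filter i rises from edges[i] to edges[i + 1] and falls to
    edges[i + 2], so fmin and fmax fix the outer edges of the filterbank.
    """
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


# Hypothetical parameters, for illustration only.
edges_ok = mel_band_edges(n_mels=4, fmin=50.0, fmax=14000.0)
edges_bad = mel_band_edges(n_mels=4, fmin=0.0, fmax=14000.0)  # as if fmin were ignored
print([round(e, 1) for e in edges_ok])
print([round(e, 1) for e in edges_bad])
```

Comparing the two printed lists shows that ignoring fmin moves every inner edge, not just the first one, which is why a small fmin bug can perturb all downstream embeddings.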
Running the sample
Original (without --ailia_audio)
$ python clap.py -e 0
INFO arg_utils.py (13) : Start!
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
INFO arg_utils.py (163) : env_id: 0
INFO arg_utils.py (166) : CPU
INFO model_utils.py (86) : ONNX file and Prototxt file are prepared!
INFO model_utils.py (86) : ONNX file and Prototxt file are prepared!
INFO model_utils.py (86) : ONNX file and Prototxt file are prepared!
===== cosine similality between text and audio =====
audio: input.wav
cossim=0.1508, word=applause applaud clap
cossim=0.2941, word=The crowd is clapping.
cossim=0.0392, word=I love the contrastive learning
cossim=0.0754, word=bell
cossim=-0.0924, word=soccer
cossim=0.0310, word=open the door.
cossim=0.0850, word=applause
cossim=0.4185, word=dog
cossim=0.3813, word=dog barking
INFO clap.py (179) : Script finished successfully.
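Each cossim line above is the cosine similarity between the audio embedding of input.wav and the embedding of one text prompt. A minimal sketch of that scoring and ranking step, assuming plain Python lists as embeddings, is below. The toy 3-dimensional vectors and the helper names are hypothetical; real CLAP embeddings are much larger, and this does not reproduce clap.py's internals.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|), in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_texts(audio_emb: list, text_embs: dict) -> list:
    """Score every text prompt against one audio embedding, best match first."""
    scores = [(text, cosine_similarity(audio_emb, emb)) for text, emb in text_embs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings, for illustration only.
audio = [0.9, 0.1, 0.2]
texts = {
    "dog": [1.0, 0.0, 0.1],
    "soccer": [-0.2, 1.0, 0.0],
    "bell": [0.1, 0.2, 1.0],
}
for text, score in rank_texts(audio, texts):
    print(f"cossim={score:.4f}, word={text}")
```

Because cosine similarity ignores vector length, only the direction of each embedding matters, which is also why small changes in the mel spectrogram front end show up directly as small shifts in these scores.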
ailia_audio before the fix
$ python clap.py -e 0 --ailia_audio
INFO arg_utils.py (13) : Start!
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
INFO arg_utils.py (163) : env_id: 0
INFO arg_utils.py (166) : CPU
INFO model_utils.py (86) : ONNX file and Prototxt file are prepared!
INFO model_utils.py (86) : ONNX file and Prototxt file are prepared!
INFO model_utils.py (86) : ONNX file and Prototxt file are prepared!
===== cosine similality between text and audio =====
audio: input.wav
cossim=0.1517, word=applause applaud clap
cossim=0.3049, word=The crowd is clapping.
cossim=0.0366, word=I love the contrastive learning
cossim=0.0805, word=bell
cossim=-0.0872, word=soccer
cossim=0.0468, word=open the door.
cossim=0.0886, word=applause
cossim=0.4182, word=dog
cossim=0.3826, word=dog barking
INFO clap.py (179) : Script finished successfully.
ailia_audio after the fix
$ python clap.py -e 0 --ailia_audio
INFO arg_utils.py (13) : Start!
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
INFO arg_utils.py (163) : env_id: 0
INFO arg_utils.py (166) : CPU
INFO model_utils.py (86) : ONNX file and Prototxt file are prepared!
INFO model_utils.py (86) : ONNX file and Prototxt file are prepared!
INFO model_utils.py (86) : ONNX file and Prototxt file are prepared!
===== cosine similality between text and audio =====
audio: input.wav
cossim=0.1508, word=applause applaud clap
cossim=0.2941, word=The crowd is clapping.
cossim=0.0392, word=I love the contrastive learning
cossim=0.0754, word=bell
cossim=-0.0924, word=soccer
cossim=0.0310, word=open the door.
cossim=0.0850, word=applause
cossim=0.4185, word=dog
cossim=0.3813, word=dog barking
INFO clap.py (179) : Script finished successfully.
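The three runs can be compared mechanically. The sketch below copies the cossim values from the logs above (same prompt order) and reports the largest absolute deviation from the original run. The helper name is hypothetical. Since the logs are rounded to four decimals, a difference of zero here means agreement to four decimal places, which matches the "almost zero error" observation rather than proving bit-exact output.

```python
# cossim values copied from the three runs above, in the same prompt order.
WORDS = [
    "applause applaud clap", "The crowd is clapping.", "I love the contrastive learning",
    "bell", "soccer", "open the door.", "applause", "dog", "dog barking",
]
ORIGINAL = [0.1508, 0.2941, 0.0392, 0.0754, -0.0924, 0.0310, 0.0850, 0.4185, 0.3813]
BEFORE_FIX = [0.1517, 0.3049, 0.0366, 0.0805, -0.0872, 0.0468, 0.0886, 0.4182, 0.3826]
AFTER_FIX = [0.1508, 0.2941, 0.0392, 0.0754, -0.0924, 0.0310, 0.0850, 0.4185, 0.3813]


def max_abs_diff(reference: list, candidate: list) -> tuple:
    """Return (largest |reference - candidate|, prompt where it occurs)."""
    if len(reference) != len(candidate):
        raise ValueError("runs must score the same prompts")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    worst = max(range(len(diffs)), key=diffs.__getitem__)
    return diffs[worst], WORDS[worst]


for label, run in (("before fix", BEFORE_FIX), ("after fix", AFTER_FIX)):
    diff, word = max_abs_diff(ORIGINAL, run)
    print(f"{label}: max |diff| = {diff:.4f} (word={word})")
```

Before the fix the largest deviation is on "open the door." (about 0.016); after the fix every printed value matches the original run.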
issue #1318