Z-yq / TensorflowASR

A project dedicated to bringing CPU/edge-side models close to GPU-model performance; the real-time factor (RTF) on CPU is below 0.1
Apache License 2.0
461 stars 111 forks

Hello, would you consider adding silero-vad to the project? #49

Open TszSimLaw opened 1 year ago

Z-yq commented 1 year ago

I haven't used that project so far, and I haven't worked out how to integrate it yet. I'll plan for it later.

StuartIanNaylor commented 1 year ago

I am not all that sure about silero-vad, as the Number Detector and Language Classifier make it a bit 'fat' for just VAD. Maybe there are simpler and easier ways to chunk incoming realtime speech audio so that it fits the beam search lengths?
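One example of the simpler kind of chunking meant here is a plain energy gate. The sketch below is only an illustration under assumptions and is not taken from silero-vad or TensorflowASR: the names `frame_rms` and `energy_chunks`, the 160-sample frame (10 ms at an assumed 16 kHz sample rate), the 0.02 threshold, and the 3-frame hangover are all hypothetical and would need tuning on real audio.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each full, non-overlapping frame."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_chunks(samples, frame_len=160, threshold=0.02, hangover=3):
    """Split audio into (start, end) sample ranges of speech-like activity.

    Assumed defaults: 160-sample frames, RMS threshold 0.02, 3-frame hangover.
    A chunk opens at the first frame whose RMS reaches `threshold` and closes
    once `hangover` consecutive frames fall below it.
    """
    frames = frame_rms(samples, frame_len)
    chunks, start, quiet = [], None, 0
    for idx, rms in enumerate(frames):
        if rms >= threshold:
            if start is None:
                start = idx * frame_len  # chunk opens on this frame
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover:
                # Close the chunk at the start of the first quiet frame.
                chunks.append((start, (idx - hangover + 1) * frame_len))
                start, quiet = None, 0
    if start is not None:
        # Audio ended mid-chunk: drop any trailing quiet frames.
        chunks.append((start, (len(frames) - quiet) * frame_len))
    return chunks


# Synthetic check: 50 ms silence, 100 ms of a 440 Hz tone, 50 ms silence.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
audio = [0.0] * 800 + tone + [0.0] * 800
print(energy_chunks(audio))  # [(800, 2400)]
```

An energy gate like this is cheap enough for CPU-only use but will also trigger on loud noise, which is the gap a small learned VAD would be expected to close.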

@Z-yq I haven't looked into it much, but a simpler, lower-parameter model than silero could likely be used.

Also, I think far-field capture and BSS/beamforming are likely to end up as wireless distributed arrays with a central ASR, given the diversity of uses that zonal systems could have.

https://github.com/breizhn/DTLN is a pretty good filter, but the training dataset needs to be mixed with noise and then processed by DTLN (or whichever filter is used) so that the filter's artefacts are trained in. https://github.com/Rikorose/DeepFilterNet is truly outstanding but carries more load, and it's a shame the LADSPA plugin uses Tract as its ML framework, as it's single-threaded only.
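The data preparation described above (mix in noise, then run the filter so its artefacts end up in the training data) could be sketched roughly as follows. This is a minimal illustration under assumptions, not code from DTLN, DeepFilterNet, or TensorflowASR: `power`, `mix_at_snr`, and `make_training_clip` are hypothetical names, and `denoise` is a placeholder callable standing in for whichever filter is actually used.

```python
import math


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise ratio is snr_db.

    Assumes both inputs are lists of floats and the noise is not all zeros.
    """
    if len(noise) < len(speech):
        # Tile short noise clips so they cover the whole utterance.
        noise = noise * (-(-len(speech) // len(noise)))
    noise = noise[:len(speech)]
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]


def make_training_clip(speech, noise, snr_db, denoise=lambda x: x):
    """Mix noise into speech, then pass the mixture through the filter.

    `denoise` is a hypothetical stand-in for the real filter; the identity
    default only keeps this sketch runnable.
    """
    return denoise(mix_at_snr(speech, noise, snr_db))
```

The ASR model would then be trained on the output of `make_training_clip`, so that it sees the same filter artefacts at training time as it will at inference time.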