Open TszSimLaw opened 1 year ago
I am not all that sure about silero-vad, as the Number Detector and Language Classifier make it a bit 'fat' for just VAD. Maybe there are simpler and easier ways to chunk incoming realtime spoken audio so that it fits the beam search lengths?
Z-yq, I haven't looked much, but a simpler, lower-parameter model than silero could likely be used.
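To make the "simpler ways to chunk" idea concrete, here is a rough sketch of about the simplest option there is: a non-neural energy-threshold chunker. It is not silero-vad or any library's API. The names `frame_rms` and `chunk_speech`, the 0.02 threshold, the 30 ms frame, and the hangover count are all assumptions for illustration and would need tuning on real microphone audio.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping frame (a trailing partial frame is dropped)."""
    n_frames = len(samples) // frame_len
    return [
        math.sqrt(sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len)
        for i in range(n_frames)
    ]


def chunk_speech(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    threshold: float = 0.02,      # assumed RMS threshold for samples in [-1, 1]
    hangover_frames: int = 10,    # quiet frames tolerated before a chunk is closed
    max_chunk_s: float = 10.0,    # hard cap so a chunk fits the decoder's length limit
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of detected speech chunks."""
    frame_len = sample_rate * frame_ms // 1000
    max_frames = int(max_chunk_s * 1000 / frame_ms)
    energies = frame_rms(samples, frame_len)
    chunks: List[Tuple[int, int]] = []
    start = None  # frame index where the current chunk opened
    quiet = 0     # consecutive quiet frames inside the current chunk
    for i, rms in enumerate(energies):
        if start is None:
            if rms >= threshold:
                start, quiet = i, 0
            continue
        quiet = 0 if rms >= threshold else quiet + 1
        if quiet >= hangover_frames:
            # Close at the last loud frame, excluding the trailing quiet frames.
            chunks.append((start * frame_len, (i - quiet + 1) * frame_len))
            start = None
        elif i - start + 1 >= max_frames:
            # Force a split when the chunk reaches the length cap.
            chunks.append((start * frame_len, (i + 1) * frame_len))
            start = None
    if start is not None:
        # Stream ended mid-chunk: close it at the last loud frame.
        chunks.append((start * frame_len, (len(energies) - quiet) * frame_len))
    return chunks
```

A plain energy gate like this will trigger on any loud noise, so it is only a baseline to compare a small VAD model against. The `max_chunk_s` cap is the part that keeps chunks within whatever length the beam search can handle.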
Also, I think far-field processing and BSS/beamforming will likely end up on wireless distributed arrays, with ASR kept central, given how diverse the uses of zonal systems could be.
https://github.com/breizhn/DTLN is a pretty good filter, but the training dataset needs to be mixed with noise and then processed by DTLN (or whichever filter is used) so that the filter's artifacts are trained in. https://github.com/Rikorose/DeepFilterNet is truly outstanding but carries more load, and it's a shame the LADSPA plugin uses Tract as its ML framework, as it's single-threaded only.
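A minimal sketch of the "mix with noise, then run it through the filter" step, assuming samples are floats in [-1, 1]. `mix_at_snr` and `augment` are hypothetical helper names, and `denoise` is only a placeholder callable where DTLN or another filter would be plugged in. Neither project's real API is used here.

```python
import math
from typing import Callable, List, Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a signal (0.0 for an empty one)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if len(x) else 0.0


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to speech, scaled so the speech-to-noise RMS ratio equals snr_db."""
    if not len(speech) or not len(noise):
        return list(speech)
    # Loop the noise if it is shorter than the speech, then trim to length.
    reps = len(speech) // len(noise) + 1
    noise = (list(noise) * reps)[:len(speech)]
    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0.0 or noise_rms == 0.0:
        return list(speech)
    # Gain that brings the noise to the level implied by the target SNR.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


def augment(
    speech: Sequence[float],
    noise: Sequence[float],
    snr_db: float,
    denoise: Callable[[List[float]], List[float]] = lambda x: x,  # placeholder filter
) -> List[float]:
    """Mix noise in, then pass the result through the filter so its artifacts
    end up in the ASR training data."""
    return denoise(mix_at_snr(speech, noise, snr_db))
```

The ASR model is then trained on the output of `augment` rather than on clean audio, so the features it sees at training time match what the filter produces at inference time.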
I haven't used this project yet and haven't figured out how to integrate it. I'll plan this out later.