Hello, would you consider adding silero-vad to the project?

I am not all that sure about silero-vad, as the Number Detector and Language Classifier make it a bit 'fat' for just VAD. Maybe there are simpler and easier ways to chunk incoming realtime spoken audio so that it fits the beam search lengths?

Z-yq, I haven't looked into it much, but a simpler, lower-parameter model than silero could likely be used.
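To make the "simpler chunking" idea concrete, here is a minimal sketch of an energy-threshold chunker. It is not a model and not how silero-vad works; it just splits audio on low-energy gaps and caps the chunk length. The function name `chunk_by_energy` and all parameter values (30 ms frames, -40 dB threshold, 10 s cap) are assumptions for illustration and would need tuning for real, noisy input.

```python
import numpy as np

def chunk_by_energy(audio, sample_rate=16000, frame_ms=30, threshold_db=-40.0,
                    min_silence_frames=10, max_chunk_s=10.0):
    """Split mono float audio (range -1..1) into (start, end) sample ranges.

    A chunk is closed after `min_silence_frames` quiet frames in a row,
    or when it reaches `max_chunk_s` seconds (hypothetical beam-search limit).
    """
    audio = np.asarray(audio, dtype=np.float64)
    frame_len = int(sample_rate * frame_ms / 1000)
    max_frames = int(max_chunk_s * 1000 / frame_ms)
    n_frames = len(audio) // frame_len

    chunks = []
    start = None   # index of the first frame of the open chunk
    silence = 0    # consecutive quiet frames inside the open chunk

    for i in range(n_frames):
        frame = audio[i * frame_len:(i + 1) * frame_len]
        rms = np.sqrt(np.mean(frame ** 2) + 1e-12)
        is_speech = 20 * np.log10(rms) > threshold_db

        if is_speech:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1

        if start is not None:
            too_long = (i + 1 - start) >= max_frames
            if silence >= min_silence_frames or too_long:
                end = i + 1 - silence  # drop trailing quiet frames
                chunks.append((start * frame_len, end * frame_len))
                start, silence = None, 0

    if start is not None:  # flush a chunk still open at end of input
        chunks.append((start * frame_len, (n_frames - silence) * frame_len))
    return chunks
```

A plain energy gate like this will trigger on any loud noise, which is the usual argument for a small learned VAD instead; the chunk bookkeeping would stay the same with a model supplying `is_speech`.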

Also, I think far-field handling and BSS/beamforming will likely end up as wireless distributed arrays feeding a central ASR, given the variety of uses that zonal systems could have.

https://github.com/breizhn/DTLN is a pretty good filter, but the dataset needs to be mixed with noise and then processed by DTLN (or whichever filter is used) so that the filter's artefacts are trained in. https://github.com/Rikorose/DeepFilterNet is truly outstanding but carries more load, and it is a shame the LADSPA plugin uses Tract as its ML framework, as it is single-threaded only.
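The dataset preparation described above could be sketched roughly as follows: mix clean speech with noise at a chosen SNR, then run the result through the same filter that will sit in front of the ASR at inference time. `mix_at_snr` and `make_training_example` are hypothetical helper names, and `denoise` is only a placeholder callable standing in for DTLN or any other filter; neither project's actual API is shown here.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so the mixture has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero

    # Scale noise so that speech_power / scaled_noise_power == 10^(snr_db/10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def make_training_example(speech, noise, snr_db, denoise=lambda x: x):
    """Noisy mix passed through the filter, so its artefacts are in the data.

    `denoise` is a placeholder (identity by default); swap in the real
    filter's processing function.
    """
    return denoise(mix_at_snr(speech, noise, snr_db))
```

Drawing `snr_db` from a range per utterance, rather than fixing one value, is a common choice so the ASR sees the filter's behaviour across light and heavy noise.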

Z-yq / TensorflowASR

Hello, would you consider adding silero-vad to the project? #49