xiaobobo-bilibili closed 2 years ago
Most of it comes from anchor.fm and Indonesian TV YouTube channels. I use ASR to create all of the transcripts.
And may I ask which ASR engine you used? Was it a third-party online service (Azure, Google Cloud, AWS) or an engine you built? (I'm just curious and trying to gauge how reliable your amazing data is; I'm not interested in the legal side of things.)
Also, how was the segmentation done? Did you use a custom VAD module, or did you segment the audio using the timestamps in the subtitles?
By unsupervised I mean this one. Can you tell me where you found these audio files and the corresponding transcripts?