SpeechColab / GigaSpeech2

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
Apache License 2.0

Release the original transcript? #7

Closed XqFeng-Josie closed 2 months ago

XqFeng-Josie commented 2 months ago

I have noticed that all of the audio transcripts are post-processed, so all the text is uppercased. Could you release the original text with punctuation? That would be more helpful. Thank you for your team's contribution.

yfyeung commented 2 months ago

Thank you for your feedback. Unfortunately, the original text with punctuation and case was lost during the forced alignment process.