SpeechColab / GigaSpeech

Large, modern dataset for speech recognition
Apache License 2.0

Can you provide "text_raw" information? #131

Open lifeiteng opened 1 year ago

lifeiteng commented 1 year ago

Can you provide "text_raw" information? text_raw contains richer text information.

[Screenshot, 2023-07-31 14:10:11]

dophist commented 1 year ago

The dataset generation pipeline contains some steps that are not 100% reversible, so currently I'm afraid the answer is no.

alumae commented 9 months ago

I would also be interested in this. Has anybody tried to reverse the normalization? It should be easily doable with an LLM.
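
For reference, here is a minimal rule-based sketch of what a naive reverse normalization could look like, and where it falls short. It assumes the released transcripts are uppercase with punctuation written as tags such as `<COMMA>` and `<PERIOD>`. The tag table and the helper name `naive_denormalize` are illustrative assumptions, not part of the GigaSpeech tooling.

```python
import re

# Assumed punctuation tags; adjust to whatever the transcripts actually use.
PUNCT_TAGS = {
    "<COMMA>": ",",
    "<PERIOD>": ".",
    "<QUESTIONMARK>": "?",
    "<EXCLAMATIONPOINT>": "!",
}


def naive_denormalize(text: str) -> str:
    """Hypothetical helper: a rough, lossy inverse of the normalized text."""
    text = text.lower()
    for tag, mark in PUNCT_TAGS.items():
        # Replace each (now lowercased) tag with its mark, attached to the previous word.
        text = re.sub(r"\s*" + re.escape(tag.lower()), mark, text)
    # Capitalize the first letter of the text and of each following sentence.
    text = re.sub(
        r"(^|[.?!]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    return text.strip()


print(naive_denormalize("HELLO <COMMA> IS THIS NEW YORK <QUESTIONMARK> YES <PERIOD>"))
```

Punctuation and sentence-initial capitals come back, but "New York" stays lowercase. Proper-noun casing, digits, and the original symbols are not recoverable from rules like these, which is presumably the part that would need an LLM or the original text.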