ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Refactor non ipa language targeted words to use [[[[[word]]]]] #297

Closed: gkielian closed this 1 week ago

gkielian commented 3 weeks ago

Due to limitations of espeak, we will not be able to transcribe every word into the target language, so I think we should go ahead and simply surround those words with brackets.
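A minimal sketch of the bracketing step, assuming the marker is the five-bracket form from the PR title and that the transcriber reports a failed word as `None`. The names `wrap_word` and `render_line` are hypothetical and are not taken from the repository.

```python
from typing import List, Optional, Tuple

# Marker from the PR title: five brackets on each side of the word.
OPEN = "[" * 5
CLOSE = "]" * 5


def wrap_word(word: str) -> str:
    """Surround a word that could not be transcribed with the bracket marker."""
    return f"{OPEN}{word}{CLOSE}"


def render_line(pairs: List[Tuple[str, Optional[str]]]) -> str:
    """Join per-word results for one line.

    Each pair is (original_word, ipa_or_None); None (an assumption here)
    means espeak could not produce a target-language transcription.
    """
    return " ".join(ipa if ipa is not None else wrap_word(word) for word, ipa in pairs)
```

Keeping the original word inside the marker, rather than dropping it, leaves the information needed for later pruning or manual transcription.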

This will let us either prune these lines (there is one line per transcription) or look for a pattern we can use to target these words for IPA transcription. For example, if the bracketed words are Hanzi or English rather than Korean (the Korean -> IPA target), we could try adding more manual Hanzi -> Korean IPA transcription.
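Both options can be sketched as follows, assuming one transcription per line and the five-bracket marker. Script detection here uses Unicode character names from Python's `unicodedata` module as a rough heuristic; the function names and the script categories are hypothetical.

```python
import re
import unicodedata
from collections import Counter
from typing import List

# Matches a word wrapped in five opening and five closing brackets.
MARKED = re.compile(r"\[{5}(.+?)\]{5}")


def prune_marked_lines(lines: List[str]) -> List[str]:
    """Option 1: drop every line that still contains a bracketed word."""
    return [line for line in lines if not MARKED.search(line)]


def script_of(word: str) -> str:
    """Rough script guess from the Unicode names of the word's letters."""
    names = [unicodedata.name(ch, "") for ch in word if ch.isalpha()]
    if names and all(n.startswith("CJK UNIFIED IDEOGRAPH") for n in names):
        return "hanzi"
    if names and all(n.startswith("LATIN") for n in names):
        return "latin"
    if names and all(n.startswith("HANGUL") for n in names):
        return "hangul"
    return "other"


def marked_word_scripts(lines: List[str]) -> Counter:
    """Option 2: count bracketed words by script to look for a pattern."""
    counts: Counter = Counter()
    for line in lines:
        for word in MARKED.findall(line):
            counts[script_of(word)] += 1
    return counts
```

If the counts are dominated by one script (for example Hanzi), that would suggest where extra manual transcription rules are worth adding.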

klei22 commented 1 week ago

Closing PR due to duplicate merged PR.

Please add count_brackets to template/utils.
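The comment does not specify what count_brackets should report. A minimal sketch, assuming it counts how many lines contain at least one bracketed word and how many bracketed words there are in total; the signature is hypothetical and not taken from template/utils.

```python
import re
from typing import Iterable, Tuple

# Matches a word wrapped in five opening and five closing brackets.
MARKED = re.compile(r"\[{5}(.+?)\]{5}")


def count_brackets(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (lines_with_marker, total_marked_words) for the given lines."""
    line_count = 0
    word_count = 0
    for line in lines:
        hits = MARKED.findall(line)
        if hits:
            line_count += 1
            word_count += len(hits)
    return line_count, word_count
```

Because it accepts any iterable of strings, an open text file can be passed directly, e.g. `count_brackets(open(path, encoding="utf-8"))`.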