ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Add scripts compatible with the Korean parallel corpora #161

Closed klei22 closed 3 months ago

klei22 commented 3 months ago

These scripts also convert Korean Hangul to Jamo, the individual phonetic components that make up each Hangul syllable block; see https://github.com/JDongian/python-jamo for the library.
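For readers unfamiliar with the conversion, here is a minimal sketch of Hangul-to-Jamo decomposition using only the standard Unicode syllable arithmetic. It is not the code from this PR and does not use python-jamo; the function name hangul_to_jamo is hypothetical, and the output uses conjoining Jamo code points (U+1100 block), which may differ from what the scripts emit.

```python
# Sketch only: decompose precomposed Hangul syllables into conjoining Jamo
# using the arithmetic defined in the Unicode Standard (Hangul syllable
# decomposition). `hangul_to_jamo` is a hypothetical name, not the PR's API.

S_BASE = 0xAC00   # first precomposed syllable, U+AC00
L_BASE = 0x1100   # first leading consonant (choseong)
V_BASE = 0x1161   # first vowel (jungseong)
T_BASE = 0x11A7   # one before the first trailing consonant (jongseong)
V_COUNT = 21
T_COUNT = 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total


def hangul_to_jamo(text: str) -> str:
    """Replace each Hangul syllable with its Jamo; leave other characters as-is."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))
            t_index = s_index % T_COUNT
            if t_index:  # index 0 means the syllable has no trailing consonant
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, hangul_to_jamo("한") yields the three code points U+1112, U+1161, U+11AB, so a character-level tokenizer sees a small Jamo alphabet instead of thousands of distinct syllable blocks.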

After conversion, run python3 prepare.py -t input.txt --method char to prepare train.bin and val.bin from the text sample.
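As a rough illustration of what character-level preparation involves, the sketch below builds a character vocabulary, encodes the text as integer ids, and splits it into training and validation portions. It is not the repository's prepare.py: the function names, the 90/10 split, and the uint16 output format are assumptions borrowed from the common nanoGPT character-level convention.

```python
# Sketch only: character-level encoding and train/val split.
# `encode_and_split` and `write_bin` are hypothetical helpers; the 90/10 split
# and unsigned 16-bit output are assumptions, not confirmed by this PR.
from array import array


def encode_and_split(text: str, train_frac: float = 0.9):
    """Map each unique character to an integer id, then split the id sequence."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}   # character -> id
    ids = [stoi[ch] for ch in text]
    split = int(len(ids) * train_frac)
    return ids[:split], ids[split:], stoi


def write_bin(path: str, ids) -> None:
    """Write ids as raw unsigned 16-bit integers in native byte order."""
    with open(path, "wb") as f:
        array("H", ids).tofile(f)
```

A typical use would be to read the Jamo-converted input.txt, call encode_and_split on its contents, and pass the two id lists to write_bin with the names train.bin and val.bin. Because Jamo text has a small alphabet, every id fits comfortably in 16 bits.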