eriq-augustine / jocr

Japanese OCR
GNU General Public License v2.0

Compile Subtitle Corpus #16

Open eriq-augustine opened 9 years ago

eriq-augustine commented 9 years ago

The dataset is in /media/nas/data/crawJapSubs and ~eriq/temp/crawJapSubs. It needs further cleanup and should then be turned into a formal corpus.