NirantK / hindi2vec

State-of-the-Art Language Modeling and Text Classification in Hindi Language
http://nirantk.com/hindi2vec
MIT License

Port notebook from fastai 0.7.x to fastai 1.0.x #8

Open NirantK opened 6 years ago

salilmishra23 commented 5 years ago

Hey @NirantK, can I help you with this issue?

NirantK commented 5 years ago

fastai has changed both the language model API and data block API. It would be great to train a new Hindi language model.

Refer to the fastai docs here: https://docs.fast.ai/text.learner.html#language_model_learner

The data links for cleaned Wikipedia corpus and other datasets are already in the README of this project.

A good first step might be to set up fastai v1 on Google Colab and train an English language model.
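
For anyone picking this up, here is a rough sketch of what that first step could look like. It assumes a fastai 1.0.x release recent enough that `language_model_learner` takes the architecture (`AWD_LSTM`) as its second argument; earlier 1.0 releases had a different signature. The `IMDB_SAMPLE` dataset, the hyperparameters, the encoder name `english_lm_enc`, and the function name `train_english_lm` are my own picks for illustration, not something from this repo.

```python
# Sketch only: assumes fastai v1 is installed, e.g. `pip install "fastai<2"` on Colab.
# The import is inside the function so the file loads even without fastai present.

def train_english_lm(epochs=1, lr=1e-2, drop_mult=0.5):
    """Fine-tune the pretrained English AWD-LSTM on a small sample corpus.

    Hypothetical helper for illustration; dataset and hyperparameters are assumptions.
    """
    from fastai.text import (
        AWD_LSTM,
        TextLMDataBunch,
        URLs,
        language_model_learner,
        untar_data,
    )

    # Small IMDB sample shipped with fastai; it contains a single texts.csv file.
    path = untar_data(URLs.IMDB_SAMPLE)

    # Tokenize and numericalize the CSV into a language-model data bunch.
    data_lm = TextLMDataBunch.from_csv(path, "texts.csv")

    # Build the learner; pretrained English weights are downloaded by default.
    learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=drop_mult)

    # One-cycle training for the requested number of epochs.
    learn.fit_one_cycle(epochs, lr)

    # Save the encoder so it can be reused later, e.g. for a classifier.
    learn.save_encoder("english_lm_enc")
    return learn
```

In a Colab cell, `learn = train_english_lm()` followed by `learn.predict("This movie was", n_words=10)` should generate a short continuation. Once that works, the same flow with the Hindi Wikipedia data from the README (and training from scratch rather than from the English weights) would be the actual port.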