LM

This is a language model (LM) project for a HiWi position.

Structure

├── data (data is not uploaded in GitHub)
│   ├── tasaTrain.txt
│   ├── tasaTest.txt
│   └── *.txt
├── lm_lib
│   ├── math.py
│   ├── process_df.py
│   ├── read.py
│   └── text.py (tasa Class)
├── lstm_model
│   ├── get_file_list.py
│   ├── find_context.py
│   ├── eval.py
│   ├── prepare_sequences.py
│   ├── tokenization.py
│   ├── train.py
│   └── production.py
├── nltk_model(obsolete)
│   ├── prepare_sents.py
│   ├── train.py
│   └── production.py
├── transformer_model(under_construction)
├── lstm_output
│   └── something
├── nltk_output
│   └── something
├── trained (stores intermediate trained models and files)
│   └── something
├── README.md
├── requirements.txt
└── .gitignore

LSTM Instructions

Train the LSTM model with the TASA corpus

1. Install all dependencies: pip3 install -r requirements.txt
2. Create the data folder, transfer the data (by default tasaTrain.txt is needed), and split the data by running python3 train_test_split.py in the data directory.
3. Run python3 prepare_sequences.py
4. Run python3 tokenization.py
5. Run python3 train.py
6. Create the output folder: mkdir lstm_output
7. Run python3 production.py (by default tasaTest.txt is needed in the data folder)
8. The outputs are written to the lstm_output folder.
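
Because the data is not uploaded to GitHub, it can help to confirm that the expected files are in place before starting a long training run. The following is a minimal sketch, not part of the repository: the helper names missing_data_files and check_data are hypothetical, and the location of the data folder relative to the current directory is an assumption. Only the file names tasaTrain.txt and tasaTest.txt come from the steps above.

```python
from pathlib import Path

# File names are taken from the steps above; the "data" location relative to
# the current working directory is an assumption.
REQUIRED_DATA_FILES = ("tasaTrain.txt", "tasaTest.txt")


def missing_data_files(data_dir="data", required=REQUIRED_DATA_FILES):
    """Return the required file names that are not present in data_dir."""
    data_path = Path(data_dir)
    return [name for name in required if not (data_path / name).is_file()]


def check_data(data_dir="data"):
    """Raise FileNotFoundError listing every missing data file."""
    missing = missing_data_files(data_dir)
    if missing:
        raise FileNotFoundError(
            f"Missing in {data_dir!r}: {', '.join(missing)}"
        )
```

Calling check_data() before prepare_sequences.py fails early with the full list of missing files instead of partway through the pipeline.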

Evaluate the model

1. Run python3 get_file_list.py
2. Run python3 find_context.py
3. Run python3 eval.py
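
The LSTM scripts listed above have to run in a fixed order, so they can be chained from one small driver. This is only a sketch under stated assumptions: the names build_commands and run_steps are hypothetical, the repository does not ship such a driver, and running the scripts from inside lstm_model is a guess based on the directory tree. The script names and their order are taken from the steps above.

```python
import subprocess
import sys

# Script order copied from the steps above.
TRAIN_STEPS = ["prepare_sequences.py", "tokenization.py", "train.py", "production.py"]
EVAL_STEPS = ["get_file_list.py", "find_context.py", "eval.py"]


def build_commands(steps, python=sys.executable):
    """Turn script names into argument lists suitable for subprocess.run."""
    return [[python, script] for script in steps]


def run_steps(steps, cwd="lstm_model"):
    """Run each script in order and stop at the first failing one.

    cwd="lstm_model" is an assumption about where the scripts expect to run.
    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, which aborts the remaining steps.
    """
    for command in build_commands(steps):
        print("running:", " ".join(command))
        subprocess.run(command, cwd=cwd, check=True)


# Example (not executed here): run_steps(TRAIN_STEPS + EVAL_STEPS)
```

The dependencies, the data folder, and the lstm_output folder still have to be set up beforehand, as described in the steps above.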

NLTK Instructions (obsolete)

1. Install all dependencies: pip3 install -r requirements.txt
2. Create the data folder and transfer the data (by default tasaTrain.txt is needed).
3. Run python3 prepare_sents.py
4. Run python3 train.py
5. Create the output folder: mkdir nltk_output
6. Run python3 production.py (by default tasaTest.txt is needed in the data folder)
7. The outputs are written to the nltk_output folder.
