This is a language model (LM) project for a HiWi job.
├── data (not uploaded to GitHub)
│ ├── tasaTrain.txt
│ ├── tasaTest.txt
│ └── *.txt
├── lm_lib
│ ├── math.py
│ ├── process_df.py
│ ├── read.py
│ └── text.py (tasa Class)
├── lstm_model
│ ├── get_file_list.py
│ ├── find_context.py
│ ├── eval.py
│ ├── prepare_sequences.py
│ ├── tokenization.py
│ ├── train.py
│ └── production.py
├── nltk_model (obsolete)
│ ├── prepare_sents.py
│ ├── train.py
│ └── production.py
├── transformer_model (under construction)
├── lstm_output
│ └── something
├── nltk_output
│ └── something
├── trained (stores intermediate trained models and files)
│ └── something
├── README.md
├── requirements.txt
└── .gitignore
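
Because the data directory is not uploaded to GitHub, a fresh clone will not contain the text files the scripts expect. Below is a minimal sketch of a pre-flight check, assuming the two file names shown in the tree are the ones required. `missing_data_files` and `EXPECTED_DATA_FILES` are hypothetical names and are not part of this repository.

```python
from pathlib import Path

# Assumption: these are the files listed under data/ in the layout above.
# Adjust the tuple if your split uses other names.
EXPECTED_DATA_FILES = ("tasaTrain.txt", "tasaTest.txt")


def missing_data_files(repo_root, expected=EXPECTED_DATA_FILES):
    """Return the expected data files that are absent from <repo_root>/data."""
    data_dir = Path(repo_root) / "data"
    return [name for name in expected if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files(".")
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected data files are present.")
```

Running it from the repository root before training gives a clear message instead of a file-not-found error partway through a script.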
LSTM model:
pip3 install -r requirements.txt
python3 train_test_split.py
(in the data directory)
python3 prepare_sequences.py
python3 tokenization.py
python3 train.py
mkdir lstm_output
python3 production.py
(by default, tasaTest.txt is needed in the data folder)
python3 get_file_list.py
python3 find_context.py
python3 eval.py
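
The LSTM steps above can also be chained from one small driver so that a failed step stops the run. This is only a sketch: it assumes every script is started from the same working directory and takes no arguments, and that lstm_output is created relative to that directory. `build_commands` and `run_pipeline` are hypothetical names, and train_test_split.py is left out because it is run separately.

```python
import subprocess
import sys
from pathlib import Path

# Script order copied from the steps above.
# Assumption: the scripts take no command-line arguments.
LSTM_STEPS = [
    "prepare_sequences.py",
    "tokenization.py",
    "train.py",
    "production.py",
    "get_file_list.py",
    "find_context.py",
    "eval.py",
]


def build_commands(steps=LSTM_STEPS, python=sys.executable):
    """Turn script names into argv lists such as [python, "train.py"]."""
    return [[python, step] for step in steps]


def run_pipeline(workdir, steps=LSTM_STEPS, dry_run=False):
    """Run each step in order from workdir, stopping at the first failure."""
    workdir = Path(workdir)
    if not dry_run:
        # Stands in for `mkdir lstm_output`; exist_ok makes reruns safe.
        (workdir / "lstm_output").mkdir(exist_ok=True)
    for cmd in build_commands(steps):
        print("running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_pipeline("lstm_model", dry_run=True)` only prints the commands, which is a cheap way to confirm the order before starting a long training run.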
NLTK model (obsolete):
pip3 install -r requirements.txt
python3 prepare_sents.py
python3 train.py
mkdir nltk_output
python3 production.py
(by default, tasaTest.txt is needed in the data folder)