CopticScriptorium / coptic-nlp

Coptic NLP pipeline page and utilities

Apache License 2.0

Integrate binding model into stacked_tokenizer #24

Closed lgessler closed 5 years ago

lgessler commented 5 years ago

Move most binding code into lib/
Move some data files into data/
Integrate new ML-based binding into the detok section of the stacked tokenizer
Comment eval_binding out of eval.py (this can be fixed if needed; the signature of run_eval changed because it now needs both gold and orig for a train set or test set)
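
For reviewers unfamiliar with the detok step, here is a rough sketch of the general idea of binding: deciding, for each pair of adjacent units, whether they should be joined into one bound group. It is not the code in this PR. `bind_tokens`, `should_bind`, and `toy_rule` are hypothetical names, and `toy_rule` is only a stand-in for whatever the trained model predicts.

```python
from typing import Callable, List


def bind_tokens(tokens: List[str],
                should_bind: Callable[[str, str], bool]) -> List[str]:
    """Merge each token into the preceding group when should_bind says so.

    should_bind(left, right) receives the group built so far and the next
    token. In a real pipeline this decision would come from a trained
    classifier; here it is any callable (an assumption for illustration).
    """
    groups: List[str] = []
    for tok in tokens:
        if groups and should_bind(groups[-1], tok):
            groups[-1] += tok      # remove the space: attach to previous group
        else:
            groups.append(tok)     # keep the space: start a new group
    return groups


def toy_rule(left: str, right: str) -> bool:
    """Hypothetical rule: a one-character group always binds to what follows."""
    return len(left) == 1


if __name__ == "__main__":
    print(bind_tokens(["a", "bc", "de", "f", "g"], toy_rule))
```

Passing the decision in as a callable keeps the merging loop independent of the model, so a rule-based binder and an ML-based one could be swapped without touching the loop.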

To test the PR:

cd lib
# train the model
python binder.py xgboost --train_list=onno+ephraim+victor+cyrus
python stacked_tokenizer.py ../eval/plain/aug_bind_uddev.txt -d