CopticScriptorium / coptic-nlp

Coptic NLP pipeline page and utilities
Apache License 2.0
14 stars 5 forks

Create sentence splitting dataset #35

Closed by ctschroeder 5 months ago

ctschroeder commented 10 months ago

NB for later: train on the original layer, or create two models?

ctschroeder commented 6 months ago

Adding a note that @LCBM0828 has handed this off to @amir-zeldes.

amir-zeldes commented 6 months ago

Wonderful, will take a closer look after the release. At some point we could also consider maintaining a repo for that data, or maybe some scripts to harvest additional reliable sentences from new datasets we release to grow this data.