direct-phonology / jdsw

Parsing the "Jingdian Shiwen" with spaCy
MIT License

parse annotations using a model #32

Closed: thatbudakguy closed this 1 year ago

thatbudakguy commented 1 year ago

annotations seem to have a reliable internal structure, which might lend itself well to dependency parsing. perhaps we can define custom labels (for fanqie spellings, qualifiers, citations, etc.) and then use a dependency parser to automatically extract the interesting parts of each annotation.

thatbudakguy commented 1 year ago

this might be a better fit for spaCy's new SpanCategorizer. maybe we can use this example project and some smart pre-annotation to bootstrap a model for parsing annotations. we could extract:
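A minimal sketch of the "smart pre-annotation" idea, assuming simple regular-expression rules are good enough to produce draft spans that a human then corrects before training. The label names (FANQIE, ZHIYIN, QUALIFIER) and the three patterns are hypothetical illustrations, not the project's actual scheme: a two-character spelling followed by 反, a direct reading introduced by 音, and the qualifier 下同 ("same below"). Citations and other categories are left out.

```python
import re

# Hypothetical label set and rules; the project's real labels may differ.
PATTERNS = [
    ("FANQIE", re.compile(r"\S{2}反")),  # two-character spelling + 反, e.g. 丁仲反
    ("ZHIYIN", re.compile(r"音\S")),     # direct reading, e.g. 音洛
    ("QUALIFIER", re.compile(r"下同")),  # "same below"
]


def pre_annotate(text):
    """Return draft (start, end, label) character-offset spans for one annotation."""
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    # Sort by offset so the output reads left to right.
    return sorted(spans)
```

For example, `pre_annotate("丁仲反下同")` returns `[(0, 3, 'FANQIE'), (3, 5, 'QUALIFIER')]`. Plain character offsets keep the rules independent of any tokenizer. If spaCy is used, each tuple could be turned into a span with `doc.char_span(start, end, label=label)` and stored under the SpanCategorizer's spans key (`"sc"` by default). Note that `char_span` returns `None` when the offsets do not line up with token boundaries.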

thatbudakguy commented 1 year ago

approach is now outlined in docs/pipeline.md; closing in favor of more specific tickets.