thatbudakguy closed this issue 1 year ago.
This might be a better fit for spaCy's new SpanCategorizer. Maybe we can use this example project and some smart pre-annotation to bootstrap a model for parsing annotations. We could extract:
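- AB反 (fanqie: the reading is spelled with the initial of A and the final of B)
- 音X ("pronounced X")
- ...下同 ("same below")
- ...注同 ("same in the commentary")
- 本又作... ("another edition writes it as...")
- 本亦作... ("some editions also write it as...")
- [work]作... ("[work] writes it as...")
- 如字 ("read with its ordinary reading")
- 云 ("says", introducing a quotation)
- ...或... ("...or...", "some say...")
- ...又... ("...also...", "alternatively...")
- X也 ("is X", a definitional gloss)
- AB之[A|B]
- 凡三篇正二攝一
- 卦 ("hexagram")
- 徐 (Xu, a cited authority)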
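As a rough sketch of the pre-annotation idea, a few of these forms could be matched with regular expressions to produce character-offset spans that a person then corrects before training. The label names and regexes below are hypothetical assumptions, not a settled scheme, and only a handful of the listed forms are covered.

```python
import re
from typing import List, Tuple

# Hypothetical labels and patterns; each regex is a guess at how one of the
# listed forms appears in raw annotation text.
PATTERNS = [
    ("FANQIE", re.compile(r"(.)(.)反")),              # AB反
    ("SOUND_GLOSS", re.compile(r"音(.)")),            # 音X
    ("SAME_BELOW", re.compile(r"下同")),               # ...下同
    ("SAME_IN_COMMENTARY", re.compile(r"注同")),       # ...注同
    ("VARIANT", re.compile(r"本[又亦]作(.)")),         # 本又作... / 本亦作...
    ("ORDINARY_READING", re.compile(r"如字")),         # 如字
]

Span = Tuple[int, int, str]  # (start char offset, end char offset, label)


def pre_annotate(text: str) -> List[Span]:
    """Return every pattern match in `text` as a labelled character span.

    Matches from different patterns may overlap; they are all kept and
    returned sorted by start offset.
    """
    spans: List[Span] = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    spans.sort()
    return spans
```

Overlapping matches are kept on purpose, since SpanCategorizer (unlike NER) allows overlapping spans. Each tuple could then become a spaCy span via `doc.char_span(start, end, label=label)`, which returns `None` when the offsets do not align with token boundaries, and be stored under `doc.spans["sc"]`, the spancat component's default key.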
The approach is now outlined in docs/pipeline.md; closing in favor of more specific tickets.
Annotations seem to have a reliable internal structure, which might lend itself well to dependency parsing. Perhaps we can define custom terms (for fanqie, qualifiers, citations, etc.) and then use a dependency parser to automatically extract the interesting parts of each annotation.
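To make the dependency-parsing idea concrete, here is a hypothetical encoding of one made-up annotation, 徐苦浪反下同 (roughly: Xu reads it 苦浪反, same below). 反 is the root, and the two fanqie characters, the citation 徐 and the qualifier 下同 attach to it. The `ParsedAnnotation` class, the label names and the tokenization (下同 as one token) are illustrative assumptions; the check only confirms that the heads form a single-rooted tree.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical label set: placeholders for the "custom terms" mentioned above.
LABELS = {"ROOT", "fanqie_upper", "fanqie_lower", "qualifier", "citation"}


@dataclass
class ParsedAnnotation:
    """One annotation split into tokens, each with a head index and a label."""
    words: List[str]
    heads: List[int]  # absolute head index per token; the root points to itself
    deps: List[str]

    def validate(self) -> None:
        """Raise ValueError unless heads/deps form a single-rooted tree."""
        n = len(self.words)
        if len(self.heads) != n or len(self.deps) != n:
            raise ValueError("words, heads and deps must have the same length")
        roots = [i for i, h in enumerate(self.heads) if h == i]
        if len(roots) != 1:
            raise ValueError(f"expected exactly one root, found {len(roots)}")
        for i, (head, dep) in enumerate(zip(self.heads, self.deps)):
            if not 0 <= head < n:
                raise ValueError(f"token {i} has out-of-range head {head}")
            if dep not in LABELS:
                raise ValueError(f"token {i} has unknown label {dep!r}")
            if (dep == "ROOT") != (head == i):
                raise ValueError(f"token {i}: only the root may be labelled ROOT")
        # Follow head pointers from every token; revisiting a token means a cycle.
        for i in range(n):
            seen = set()
            j = i
            while self.heads[j] != j:
                if j in seen:
                    raise ValueError(f"cycle reached from token {i}")
                seen.add(j)
                j = self.heads[j]

    def to_dict(self) -> Dict[str, list]:
        """Return a words/heads/deps dict suitable as parser training annotation."""
        return {"words": list(self.words), "heads": list(self.heads), "deps": list(self.deps)}


example = ParsedAnnotation(
    words=["徐", "苦", "浪", "反", "下同"],
    heads=[3, 3, 3, 3, 3],
    deps=["citation", "fanqie_upper", "fanqie_lower", "ROOT", "qualifier"],
)
example.validate()
```

As far as we know, a dict of this shape (absolute head indices, root pointing to itself) matches what spaCy v3's `Example.from_dict` accepts for training the parser on custom labels, so hand-checked trees like this one could seed a training set.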