direct-phonology / jdsw

Parsing the "Jingdian Shiwen" with spaCy
MIT License

find named entities in annotations #35

Closed by thatbudakguy 11 months ago

thatbudakguy commented 1 year ago

this task isn't a strict prerequisite, but it will improve the tokenization for #32.

some text and person names can be preselected/annotated and merged into single tokens, since we know they'll almost always refer to named entities. the giveaway for most of these is that they directly precede a 云 ("says").

texts

people
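a minimal sketch of the "precedes a 云" heuristic described above, in plain Python rather than as a spaCy component. the seed names, the `TEXT`/`PERSON` labels, and the function name `find_entities` are all assumptions made for illustration; they are not the lists or the code used in this repo.

```python
import re

# Hypothetical seed lists, for illustration only.
TEXTS = ["說文", "字林"]
PEOPLE = ["王肅", "鄭"]


def find_entities(annotation, texts=TEXTS, people=PEOPLE):
    """Return (start, end, label) spans for known names directly followed by 云."""
    # Hypothetical labels; a real pipeline may use different ones.
    labels = {name: "TEXT" for name in texts}
    labels.update({name: "PERSON" for name in people})
    # Try longer names first so a full name wins over a shorter one
    # that starts at the same position.
    names = sorted(labels, key=len, reverse=True)
    # The lookahead requires 云 right after the name without consuming it.
    pattern = re.compile("(" + "|".join(map(re.escape, names)) + ")(?=云)")
    return [
        (m.start(1), m.end(1), labels[m.group(1)])
        for m in pattern.finditer(annotation)
    ]


if __name__ == "__main__":
    print(find_entities("王肅云大也鄭云小也"))
```

the returned character offsets could then be used to merge the matching tokens, for example with spaCy's retokenizer, though that step is not shown here.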

GDRom commented 1 year ago

A general comment to the above list here:

Also, I'm not sure whether the following occur verbatim, but Lu Deming (LDM) mentions the following texts/scholars in his preface (序) as his source material (list not complete):

(alternative versions of same ID separated by " | ")

Texts

People (all of them may be indicated by their family name, which I believe is the first character for all of the ones below)
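One possible way to turn " | "-separated entries like the ones described above into a lookup table, sketched in plain Python. The function name `build_aliases`, the `add_family_name` flag, and the sample entries are assumptions for illustration and are not taken from the list in this comment. The family-name alias simply takes the first character of the first variant, which follows the guess stated above and does not resolve two people sharing a surname.

```python
def build_aliases(lines, add_family_name=False):
    """Map every variant of a name to its first-listed (canonical) form."""
    aliases = {}
    for line in lines:
        # Variants of the same ID are separated by " | ".
        variants = [v.strip() for v in line.split("|") if v.strip()]
        if not variants:
            continue
        canonical = variants[0]
        for variant in variants:
            aliases[variant] = canonical
        if add_family_name:
            # Assumption: the family name is the first character.
            # setdefault keeps the first person seen when surnames collide.
            aliases.setdefault(canonical[0], canonical)
    return aliases


if __name__ == "__main__":
    # Hypothetical sample entries.
    print(build_aliases(["鄭玄 | 鄭康成", "王肅"], add_family_name=True))
```

The resulting keys could serve as the seed names for a matcher, with the values giving a single ID per entity.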

thatbudakguy commented 11 months ago

the spancat model now tags entities, so this is done.