thatbudakguy opened 2 years ago
This sounds like a brilliant solution compared to our previous approach. And you are right: this makes things much clearer for a human reader, since the structure you are proposing inherently draws attention to what's unclear.
Also, just to follow up on LDM, as this would logically result in either of two things:
- `MCInitial=[dr/tr]`, which is ambiguous
- `MCRime=jang|MCTone=X`
Right now there are a variety of cases where `Reconstruction` can raise a `MultipleReadingsError`: when fetching an initial, a rime, or an entire reading. For at least some of these cases, I think we could do something a little smarter. An example: a character with the readings `drjangH`, `drjang`, and `trjangX`. The initial is either `dr` or `tr`, and the rime is always `jang`. There's still ambiguity here, but much less ambiguity than simply giving up and not assigning a reading! If we can come up with a systematic way of noting the ambiguity, as B&S do for their OC reconstruction (using things like brackets), we might still salvage some information that would help an algorithm, or a human manually correcting the data. For example:
`[dr|tr]jang[X|H|_]`
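A minimal sketch of how that merging might work, assuming each candidate reading has already been split into initial, rime, and tone. The `Reading` type and the `merge_slot`/`merge_readings` functions are hypothetical, not part of the existing `Reconstruction` API, and `_` stands for the unmarked level tone as in the notation above.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    """One Middle Chinese reading, pre-split into its parts (hypothetical type)."""
    initial: str
    rime: str
    tone: str  # "_" marks the unmarked level tone


def merge_slot(values: list[str]) -> str:
    """Collapse one slot: a single value if all readings agree, else [a|b|...]."""
    unique = list(dict.fromkeys(values))  # drop duplicates, keep first-seen order
    if len(unique) == 1:
        return unique[0]
    return "[" + "|".join(unique) + "]"


def merge_readings(readings: list[Reading]) -> str:
    """Render several candidate readings as one reading with bracketed alternatives."""
    initial = merge_slot([r.initial for r in readings])
    rime = merge_slot([r.rime for r in readings])
    tone = merge_slot([r.tone for r in readings])
    return initial + rime + tone


readings = [
    Reading("dr", "jang", "H"),  # drjangH
    Reading("dr", "jang", "_"),  # drjang
    Reading("tr", "jang", "X"),  # trjangX
]
print(merge_readings(readings))
```

With the three example readings this prints `[dr|tr]jang[H|_|X]`. Alternatives appear in the order the readings are passed in, so the tone slot differs only cosmetically from the hand-written `[X|H|_]`; a fixed tone order could be imposed instead if consistency matters.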
And if we annotate each part in a separate field, this might make it into the CoNLL-U as:
`MCInitial=[dr/tr]|MCRime=jang|MCTone=[X/H/_]`
(using `/` instead of `|`, since that character is reserved to separate annotations in the CoNLL-U `MISC` and `FEATS` fields.) This also helps in the (unfortunately many) cases where LDM did provide an annotation, but one or both of the characters in his fanqie happen to be polyphones.
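The per-field CoNLL-U form could be produced the same way, just joining alternatives with `/` so they don't collide with the `|` attribute separator. Again only a sketch: `slot_value` and `to_conllu_misc` are hypothetical helpers, and readings are assumed to arrive as pre-split `(initial, rime, tone)` tuples.

```python
def slot_value(values: list[str]) -> str:
    """One attribute value: plain if unambiguous, else [a/b/...]."""
    unique = list(dict.fromkeys(values))  # drop duplicates, keep first-seen order
    if len(unique) == 1:
        return unique[0]
    # "|" separates attributes in CoNLL-U FEATS/MISC, so alternatives use "/"
    return "[" + "/".join(unique) + "]"


def to_conllu_misc(readings: list[tuple[str, str, str]]) -> str:
    """Build an MCInitial/MCRime/MCTone attribute string from candidate readings."""
    initials, rimes, tones = zip(*readings)  # transpose into one tuple per slot
    pairs = [
        ("MCInitial", slot_value(list(initials))),
        ("MCRime", slot_value(list(rimes))),
        ("MCTone", slot_value(list(tones))),
    ]
    return "|".join(f"{key}={value}" for key, value in pairs)


readings = [("dr", "jang", "H"), ("dr", "jang", "_"), ("tr", "jang", "X")]
print(to_conllu_misc(readings))
```

For `drjangH`, `drjang`, and `trjangX` this prints `MCInitial=[dr/tr]|MCRime=jang|MCTone=[H/_/X]`. Keeping each slot separate means a consumer can still use the unambiguous `MCRime` even when the initial or tone is uncertain.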