Status: Open. fmatter opened this issue 1 year ago.
Hm, the most transparent practice I've seen in this regard is using an ellipsis … (ideally the Unicode character U+2026, not three dots ...) in both Analyzed_Word and Gloss. Admittedly, this is also often applied very inconsistently, e.g. by leaving out the ellipsis in the Gloss. But from my point of view, recommending this practice would also raise awareness of the fact that the ellipsis is part of the example and must be taken into account for consistency.
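A minimal sketch of the consistency check this practice makes possible. It assumes each example's Analyzed_Word and Gloss have already been read as lists of strings. The helper name check_ellipsis_alignment is hypothetical and not part of pycldf. Three ASCII dots are treated as an ellipsis for alignment purposes but reported separately.

```python
ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS, U+2026
THREE_DOTS = "..."   # discouraged ASCII stand-in


def check_ellipsis_alignment(analyzed_word, gloss):
    """Return a list of problem descriptions for one example (empty list = consistent).

    Hypothetical helper, not part of pycldf.
    """
    problems = []
    if len(analyzed_word) != len(gloss):
        problems.append(
            f"length mismatch: {len(analyzed_word)} words vs {len(gloss)} glosses"
        )
    # zip() stops at the shorter list, so only aligned positions are compared.
    for i, (word, gl) in enumerate(zip(analyzed_word, gloss)):
        for column, item in (("Analyzed_Word", word), ("Gloss", gl)):
            if item == THREE_DOTS:
                problems.append(f"{column}[{i}]: use U+2026 instead of three dots")
        # An elided word must be elided in both columns.
        word_elided = word in (ELLIPSIS, THREE_DOTS)
        gloss_elided = gl in (ELLIPSIS, THREE_DOTS)
        if word_elided != gloss_elided:
            problems.append(
                f"position {i}: ellipsis in only one of Analyzed_Word and Gloss"
            )
    return problems


# Consistent: the unglossed word is elided in both columns.
print(check_ellipsis_alignment(["x", "y", ELLIPSIS, "z"], ["xg", "yg", ELLIPSIS, "zg"]))
# → []

# Inconsistent: the ellipsis was left out of the Gloss.
print(check_ellipsis_alignment(["x", "y", ELLIPSIS, "z"], ["xg", "yg", "zg"]))
# → ['length mismatch: 4 words vs 3 glosses', 'position 2: ellipsis in only one of Analyzed_Word and Gloss']
```

The second case is exactly the "ellipsis left out in the Gloss" inconsistency. Because the ellipsis occupies a list slot, the mismatch surfaces as both a length difference and a misaligned position.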
That's a very reasonable solution, works for me.
Should None in tab-delimited columns raise a validation error?
Yes, I would say so. After all, one of the main reasons for using ellipsis for unglossed words is that we get lists of str for both aligned properties.
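If None items were to become a validation error, the check could look roughly like the sketch below. It assumes a row is a plain dict mapping column names to lists. That is an assumption for illustration, and validate_aligned and AlignmentError are hypothetical names, not pycldf API.

```python
class AlignmentError(ValueError):
    """Hypothetical error type for invalid aligned columns (not part of pycldf)."""


def validate_aligned(row, columns=("Analyzed_Word", "Gloss")):
    """Raise AlignmentError unless the aligned columns are equally long lists of str.

    Hypothetical validator: a row is assumed to be a dict of column name -> value.
    """
    lengths = {}
    for column in columns:
        items = row.get(column)
        if items is None:
            continue  # column missing or cell empty: nothing to check
        for i, item in enumerate(items):
            if not isinstance(item, str):
                raise AlignmentError(
                    f"{column}[{i}] is {item!r}; use '\u2026' for unglossed words"
                )
        lengths[column] = len(items)
    if len(set(lengths.values())) > 1:
        raise AlignmentError(f"aligned columns differ in length: {lengths}")


# Unglossed word marked with U+2026 in both columns: passes silently.
validate_aligned({"Analyzed_Word": ["x", "y", "\u2026", "z"],
                  "Gloss": ["xg", "yg", "\u2026", "zg"]})

# Unglossed word stored as None: rejected.
try:
    validate_aligned({"Analyzed_Word": ["x", "y", None, "z"],
                      "Gloss": ["xg", "yg", None, "zg"]})
except AlignmentError as err:
    print(err)  # Analyzed_Word[2] is None; use '…' for unglossed words
```

Subclassing ValueError is only a design choice for the sketch. Callers that already catch ValueError for malformed data would also catch this error.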
Maybe we could keep some sort of backwards compatibility (with somewhat undefined behaviour) by converting None to ellipsis upon reading.
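A sketch of that backwards-compatible reading step. It assumes rows arrive as dicts whose aligned columns are lists that may contain None, which is what pycldf yields for empty items according to the issue description. upgrade_row is a hypothetical name.

```python
ELLIPSIS = "\u2026"


def upgrade_row(row, columns=("Analyzed_Word", "Gloss")):
    """Return a copy of row in which None items of aligned columns become U+2026.

    Hypothetical helper; the input row and its lists are left unmodified.
    """
    upgraded = dict(row)  # shallow copy; the aligned lists are rebuilt below
    for column in columns:
        items = upgraded.get(column)
        if items is not None:
            upgraded[column] = [ELLIPSIS if item is None else item for item in items]
    return upgraded


legacy = {"Primary_Text": "x y Person z",
          "Analyzed_Word": ["x", "y", None, "z"],
          "Gloss": ["xg", "yg", None, "zg"]}
print(upgrade_row(legacy)["Gloss"])  # ['xg', 'yg', '…', 'zg']
```

The file on disk is not changed, only the in-memory value. The "somewhat undefined" part is that a None caused by some other data error would also be silently turned into an ellipsis.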
Quite often, people will not gloss words like person or place names or unparsable words, so some words may only be present in Primary_Text but not in Analyzed_Word or Gloss.
The most transparent way to store an example like that in CLDF is to have an empty list item in these two columns:
Primary_Text: "x y Person z"
Analyzed_Word: "x\ty\t\tz" (["x","y",None,"z"] once read by pycldf)
Gloss: "xg\tyg\t\tzg" (["xg","yg",None,"zg"])
This passes validation, but, for example, cldf createdb does not work (TypeError: sequence item 1: expected str instance, NoneType found), and I've been doing things like ex["Analyzed_Word"] = ["" if x is None else x for x in ex["Analyzed_Word"]] in initializedb.py scripts.
Should empty items in a gloss column raise an error upon validation? If yes, is the way to handle unglossed words to simply leave them out (i.e. "x\ty\tz", read as ["x","y","z"])? Or, if empty items are allowed, would it be OK for pycldf to yield "" instead of None (i.e. "x\ty\t\tz", read as ["x","y","","z"])?
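The behaviour described above can be reproduced in plain Python. The parse_tab_list helper below only mimics how empty items in a tab-delimited cell end up as None. It is a hypothetical stand-in, not pycldf's actual reader. It shows why joining the items fails and what each of the two proposed alternatives yields.

```python
def parse_tab_list(cell, empty_as_none=True):
    """Split a tab-delimited cell into items (hypothetical stand-in for a CLDF reader).

    With empty_as_none=True, empty items become None, as described in the issue;
    with empty_as_none=False, they stay "".
    """
    return [None if (item == "" and empty_as_none) else item for item in cell.split("\t")]


words = parse_tab_list("x\ty\t\tz")
print(words)  # ['x', 'y', None, 'z']
try:
    " ".join(words)  # the same kind of TypeError as reported for cldf createdb
except TypeError as err:
    print("TypeError:", err)

# Alternative 1: leave unglossed words out (their position is no longer recorded).
print(parse_tab_list("x\ty\tz"))  # ['x', 'y', 'z']

# Alternative 2: keep the empty item, but as "" instead of None.
kept = parse_tab_list("x\ty\t\tz", empty_as_none=False)
print(kept)            # ['x', 'y', '', 'z']
print(" ".join(kept))  # x y  z
```

Alternative 2 gives the same result as the "" if x is None else x workaround from initializedb.py. The only difference is that the conversion happens once at read time instead of in every script.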