dracor-org / dracor-schema

ODD and schemas for dracor.org files
https://dracor.org/doc/odd

Gender values #46

Open · ingoboerner opened 1 year ago

ingoboerner commented 1 year ago

Currently, the schema allows MALE, FEMALE and UNKNOWN. The data also contains some misspellings: *UNKOWN and *UNKWON for UNKNOWN, and *MAE for MALE. In some corpora (Cal, but also Swe) there are other values, e.g. DIVERSE and MIXED, which, unlike the spelling errors, might actually be useful. Shall we extend the allowed values?
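Before deciding whether to extend the list, it can help to audit which values actually occur. The following is a minimal Python sketch, assuming the sex values have already been extracted as plain strings. It sorts each value into one the current schema allows, one of the misspellings listed above, or a candidate new value such as DIVERSE or MIXED. The names `ALLOWED`, `MISSPELLINGS`, `classify_sex` and `audit` are hypothetical and not part of the DraCor schema or API.

```python
from collections import Counter

# Values currently allowed by the schema, as listed in this issue.
ALLOWED = {"MALE", "FEMALE", "UNKNOWN"}

# Misspellings reported in this issue, mapped to the intended value.
MISSPELLINGS = {"UNKOWN": "UNKNOWN", "UNKWON": "UNKNOWN", "MAE": "MALE"}


def classify_sex(value: str) -> str:
    """Return 'allowed', 'misspelling' or 'other' for a single sex value."""
    if value in ALLOWED:
        return "allowed"
    if value in MISSPELLINGS:
        return "misspelling"
    # Anything else is a candidate for a new value (e.g. DIVERSE, MIXED).
    return "other"


def audit(values):
    """Count how many of the given values fall into each category."""
    return Counter(classify_sex(v) for v in values)


if __name__ == "__main__":
    # Illustrative values taken from this discussion, not real corpus counts.
    sample = ["MALE", "FEMALE", "UNKOWN", "MAE", "DIVERSE", "MIXED"]
    print(audit(sample))
```

Values in the "other" bucket are the ones this issue is really about; the misspellings can simply be corrected.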

ingoboerner commented 1 year ago

Here is an example of MIXED: https://dracor.org/api/corpora/swe/play/strindberg-till-damaskus
For DIVERSE, see CalDraCor, e.g.:
{ "id": "musicos", "name": "MÚSICOS", "isGroup": true, "sex": "DIVERSE" }

ingoboerner commented 1 year ago

See also NONBINARY here: https://github.com/GOLEM-lab/golem-frontend-api/blob/57c55c6984e681ea2162b6099b02a207c78dda02/schemas.py#L45

lehkost commented 1 year ago

Thanks for collecting these variants! I think the examples above are well covered by UNKNOWN as we currently use the term in combination with the elements person and personGrp, assuming that UNKNOWN can mean anything from MIXED or DIVERSE to actually unknown. This is far from perfect, but it acknowledges that we cannot really annotate MIXED or DIVERSE without a clear understanding of what these values would mean for all of our thousands of plays written since antiquity. It is a bit like our imperfect annotation of character relations, where e.g. "associated_with" covers such a range of things that it is nearly unusable for interesting queries. But it's a start, and we can always qualify our data further later on.

Also, version 4.5.0 of the TEI Guidelines introduces a distinction between the sex and gender attributes. In light of this, we need a clear annotation strategy for this kind of data before we make any adjustments across all our corpora. Until then, I would propose falling back to UNKNOWN in cases like the ones you described.
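The proposed fallback could be applied to character records shaped like the CalDraCor example above. This is a minimal Python sketch, assuming each character is a dict with an optional `sex` key; the function name `normalize_sex` is hypothetical. It corrects the known misspellings first, so that e.g. *MAE becomes MALE rather than UNKNOWN, and only then falls back to UNKNOWN for any remaining value outside the schema.

```python
# Values allowed by the current schema.
ALLOWED = {"MALE", "FEMALE", "UNKNOWN"}

# Known misspellings from this issue; corrected rather than discarded.
MISSPELLINGS = {"UNKOWN": "UNKNOWN", "UNKWON": "UNKNOWN", "MAE": "MALE"}


def normalize_sex(character: dict) -> dict:
    """Return a copy of a character record with a schema-conformant sex value.

    Misspellings are corrected; any other value outside the schema
    (e.g. MIXED, DIVERSE, NONBINARY) falls back to UNKNOWN.
    """
    # Assumption: a missing sex value is treated as UNKNOWN.
    value = character.get("sex", "UNKNOWN")
    value = MISSPELLINGS.get(value, value)
    if value not in ALLOWED:
        value = "UNKNOWN"
    # Build a new dict so the original record is left untouched.
    return {**character, "sex": value}


if __name__ == "__main__":
    musicos = {"id": "musicos", "name": "MÚSICOS", "isGroup": True, "sex": "DIVERSE"}
    print(normalize_sex(musicos))
```

Returning a copy instead of mutating the record keeps the original annotation available, which would make it easier to qualify the data further once an annotation strategy for sex and gender is agreed on.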