Open joerivandervelde opened 2 years ago
@joerivandervelde I mostly agree about NCIT in that respect. Years ago, on the initiative of GA4GH Metadata, we introduced "genotypic sex" in PATO (e.g. PATO:0020002: female genotypic sex). I still think this would be a sensible option.
In the current implementation we have tried to accommodate Phenopackets, with its "clinical geneticists do it like this" representation of `karyotypicSex` (so this is a required parameter...), and the vague `Sex`, which is not explicitly made equal to the (phenotypic) Phenopackets `Sex`.
IMO we need a clear `GenotypicSex` parameter which does not refer to a specific karyotype measurement (see the PATO definition, which I prefer for general data analysis). This is also what you get in most cases. Or change the suggested terms for sex back to the PATO ones.
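To make the distinction concrete, here is a minimal sketch of an individual record where genotypic sex is a general PATO-coded parameter and the karyotype is only filled in when it was actually measured. The class and field names (`Individual`, `genotypic_sex`, `karyotypic_sex`) and the helper are hypothetical illustrations, not the Beacon v2 or Phenopackets schema, and the karyotype is treated as a plain string for simplicity.

```python
from dataclasses import dataclass
from typing import Optional

# PATO genotypic sex term IDs (PATO:0020002 is the female term cited above).
FEMALE_GENOTYPIC_SEX = "PATO:0020002"
MALE_GENOTYPIC_SEX = "PATO:0020001"


@dataclass
class Individual:
    """Hypothetical record; the field names are assumptions for illustration."""
    id: str
    # General genotypic sex, not tied to any specific measurement.
    genotypic_sex: Optional[str] = None
    # Only set when a formal karyotype was performed, e.g. "XXY".
    karyotypic_sex: Optional[str] = None


def has_measured_karyotype(individual: Individual) -> bool:
    """True only if a karyotype value was explicitly recorded."""
    return individual.karyotypic_sex is not None
```

With this shape, a record can carry a genotypic sex without implying that a karyotype was done, and the karyotype stays optional.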
I find the terminology around sex/gender in the model a bit confusing too. I agree that a genotypic/biological or assigned-sex-at-birth parameter would be clearer. I am hesitant to use the 'karyotypicSex' field as it implies that a formal karyotype was done. Even if a person is assigned a sex at birth, that assignment was not necessarily based on a karyotype: e.g. an XXY individual may be assigned 'male' at birth and not find out until later that the karyotype differs from a strict XY male.
I am also confused as to why, at the cohort level, the language is around 'genders', e.g. `cohort.collectionEvents.eventGenders` and `cohort.inclusion|exclusionCriteria.genders`. Is this equivalent to the sex fields at the individual level, or is it referring to gender identity, which can differ from genotypic/karyotypic/assigned sex?
> I am hesitant to use the 'karyotypicSex' field as it implies that a formal karyotype was done
Yes. But having this as an optional field for this purpose & to be in line w/ Phenopackets is +1. But it is mostly to accommodate geneticists' praxis (I guess sometimes assumed even if not done?).
> I am also confused as to why at the cohort level, the language is around 'genders' ...
Yes.
Suggestion for potential improvement. Within the FAIR Genomes project (https://www.nature.com/articles/s41597-022-01265-x) there have been many discussions on a Dutch national level on how to best represent this type of information. The NCIT terms are, quite frankly, vague and thus not very useful (i.e. female = "[..] indicate biological sex distinctions, or cultural gender role distinctions, or both"). In the end, we chose to represent what Beacon v2 calls ‘sex’ as 'GenderAtBirth' using GSSO terms (https://github.com/fairgenomes/fairgenomes-semantic-model/blob/main/lookups/GenderAtBirth.txt) with separate terms for 'GenderIdentity' (https://github.com/fairgenomes/fairgenomes-semantic-model/blob/main/lookups/GenderIdentity.txt) and 'GenotypicSex' (in Beacon v2 as ‘KaryotypicSex’, https://github.com/fairgenomes/fairgenomes-semantic-model/blob/main/lookups/GenotypicSex.txt) to complete the full picture.
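As a rough sketch of the three-way split described above, the record below keeps gender at birth, gender identity and genotypic sex as independent optional fields, so none of them has to be inferred from another. The class name, field names and the `recorded` helper are hypothetical, and the term values are left as opaque strings to be taken from the linked lookup files rather than invented here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SexAndGender:
    """Hypothetical container; names are assumptions, not the FAIR Genomes model."""
    gender_at_birth: Optional[str] = None   # term ID from the GenderAtBirth lookup
    gender_identity: Optional[str] = None   # term ID from the GenderIdentity lookup
    genotypic_sex: Optional[str] = None     # term ID from the GenotypicSex lookup

    def recorded(self) -> List[str]:
        """Names of the fields that were actually provided, in declaration order."""
        return [name for name, value in vars(self).items() if value is not None]
```

Keeping the fields separate means a record can state only what is known, for example a gender at birth without any genotypic information.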