ga4gh-beacon / beacon-v2

Unified repository for the GA4GH Beacon v2 API standard
Creative Commons Zero v1.0 Universal

Suggestion: redefine individuals:sex #38

Open joerivandervelde opened 2 years ago

joerivandervelde commented 2 years ago

Suggestion for potential improvement. Within the FAIR Genomes project (https://www.nature.com/articles/s41597-022-01265-x), there have been many discussions at the Dutch national level on how best to represent this type of information. The NCIT terms are, quite frankly, vague and thus not very useful (e.g. female = "[..] indicate biological sex distinctions, or cultural gender role distinctions, or both"). In the end, we chose to represent what Beacon v2 calls ‘sex’ as 'GenderAtBirth' using GSSO terms (https://github.com/fairgenomes/fairgenomes-semantic-model/blob/main/lookups/GenderAtBirth.txt). To complete the full picture, we added separate terms for 'GenderIdentity' (https://github.com/fairgenomes/fairgenomes-semantic-model/blob/main/lookups/GenderIdentity.txt) and 'GenotypicSex' (in Beacon v2 as ‘KaryotypicSex’, https://github.com/fairgenomes/fairgenomes-semantic-model/blob/main/lookups/GenotypicSex.txt).

mbaudis commented 2 years ago

@joerivandervelde I mostly agree about NCIT in that respect; years ago, on the initiative of GA4GH Metadata, we introduced "genotypic sex" in PATO (e.g. PATO:0020002: female genetic sex). Still, I think this would be a sensible option.

In the current implementation, we have tried to accommodate Phenopackets, with its "clinical geneticists do it like this" representation of karyotypicSex (so this is a required parameter...), and the vague Sex, which is not explicitly made equal to the (phenotypic) Phenopackets Sex.

IMO, we need a clear GenotypicSex parameter that does not refer to a specific karyotype measurement (see the PATO definition, which I prefer for general data analysis). This is also what you get in most cases. Alternatively, change the suggested terms for sex back to the PATO ones.

mshadbolt commented 2 years ago

I find the terminology around sex/gender in the model a bit confusing too. I agree that a clear genotypic/biological or assigned-sex-at-birth parameter would be clearer. I am hesitant to use the 'karyotypicSex' field because it implies that a formal karyotype was done. Even if a person is assigned a sex at birth, that assignment is not necessarily based on a karyotype; e.g. an XXY individual may be assigned 'male' at birth and not find out until later that their karyotype differs from a strict XY male.

I am also confused as to why at the cohort level, the language is around 'genders' e.g. cohort.collectionEvents.eventGenders and cohort.inclusion|exclusionCriteria.genders. Is this equivalent to the sex fields at the individual level or is it referring to gender identity, which can differ from genotypic/karyotypic/assigned sex?

mbaudis commented 2 years ago

I am hesitant to use the 'karyotypicSex' field as it implies that a formal karyotype was done

Yes. But having this as an optional field for this purpose, and to be in line with Phenopackets, is +1. It is mostly there to accommodate geneticists' practice (I guess sometimes assumed even if not done?).

I am also confused as to why at the cohort level, the language is around 'genders' ...

Yes.