gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Interpretation of sex, lifeStage from dynamicProperties (VertNet) #478

Open MattBlissett opened 3 years ago

MattBlissett commented 3 years ago

Migrated from #272.

This issue is not about how sex and lifeStage are queried, but about how they are extracted from the dynamicProperties term value.

Nik:

hasSex: ES term sex exists (If DwcTerm.sex is empty -> parse DwcTerm.dynamicProperties -> apply regular DwcTerm.sex parser)

hasLifeStage: ES term lifeStage exists (If DwcTerm.lifeStage is empty -> parse DwcTerm.dynamicProperties -> apply regular DwcTerm.lifeStage parser)

Explanation:

The implementation under development in #477 uses the interpreted sex and lifeStage fields that already exist in GBIF's occurrence index. If the Darwin Core term value for sex is empty, then a parser (migrated from VertNet's code) extracts the value from dynamicProperties and feeds it through the normal GBIF SexParser (i.e. this dictionary).

If the Darwin Core lifeStage value is empty, then a parser copied from VertNet looks in dynamicProperties for a value and feeds it through the normal GBIF LifeStage parser (i.e. this vocabulary).
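The fallback described above could be sketched roughly as follows. This is an illustrative Python sketch, not the code from #477 or from VertNet: the function names, the key lists, the regular expression, and the tiny dictionaries are all assumptions standing in for the real VertNet extraction rules and the GBIF sex dictionary and lifeStage vocabulary.

```python
import re
from typing import Mapping, Optional, Sequence

# Hypothetical, heavily reduced stand-ins for the GBIF sex dictionary
# and lifeStage vocabulary. The real ones contain many more entries.
SEX_DICTIONARY = {"male": "MALE", "m": "MALE", "female": "FEMALE", "f": "FEMALE"}
LIFE_STAGE_VOCABULARY = {"adult": "ADULT", "juvenile": "JUVENILE", "larva": "LARVA"}


def extract_from_dynamic_properties(key: str, dynamic_properties: Optional[str]) -> Optional[str]:
    """Find `key=value` or `key: value` (optionally quoted) in dynamicProperties.

    Assumed pattern for illustration only; the VertNet rules are more elaborate.
    """
    pattern = re.compile(
        r"\b" + re.escape(key) + r"\"?\s*[=:]\s*\"?([^;,\"}]+)", re.IGNORECASE
    )
    match = pattern.search(dynamic_properties or "")
    return match.group(1).strip() if match else None


def interpret_with_fallback(
    term_value: Optional[str],
    keys: Sequence[str],
    dynamic_properties: Optional[str],
    dictionary: Mapping[str, str],
) -> Optional[str]:
    """Use the Darwin Core term value if present; otherwise fall back to
    dynamicProperties. Either way, the raw value goes through the same lookup."""
    raw = term_value.strip() if term_value and term_value.strip() else None
    if raw is None:
        for key in keys:
            raw = extract_from_dynamic_properties(key, dynamic_properties)
            if raw is not None:
                break
    return dictionary.get(raw.lower()) if raw is not None else None


# Hypothetical key lists; the keys VertNet actually recognises may differ.
SEX_KEYS = ["sex"]
LIFE_STAGE_KEYS = ["lifeStage", "life stage", "age class"]
```

For example, `interpret_with_fallback("", SEX_KEYS, "sex=female; weight=12g", SEX_DICTIONARY)` takes the fallback path, while a non-empty sex value is parsed directly and dynamicProperties is never consulted.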

tucotuco commented 3 years ago

Just a note for future possibilities. It is also possible to get lifeStage information by parsing reproductiveCondition (e.g., "lactating" in reproductiveCondition signifies "adult" in lifeStage), but we didn't get to implementing that in VertNet.
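A minimal sketch of what that could look like, assuming a simple keyword lookup. Only the "lactating" to "adult" rule comes from the comment above; the function name, the negation handling, and the overall approach are assumptions for illustration, and nothing like this exists in VertNet or the pipelines code as far as this thread states.

```python
import re
from typing import Optional

# Keyword -> implied lifeStage. Only "lactating" is taken from the comment;
# further rules would need to be agreed on before being added.
REPRODUCTIVE_CONDITION_HINTS = {
    "lactating": "adult",
}


def life_stage_from_reproductive_condition(reproductive_condition: Optional[str]) -> Optional[str]:
    """Return the lifeStage implied by reproductiveCondition, or None.

    Skips keywords directly preceded by "non-" or "not " so that
    "non-lactating" does not imply anything.
    """
    text = (reproductive_condition or "").lower()
    for keyword, life_stage in REPRODUCTIVE_CONDITION_HINTS.items():
        pattern = r"(?<!non-)(?<!not )\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text):
            return life_stage
    return None
```

The returned value would then be fed through the regular lifeStage parser, in the same way as a value extracted from dynamicProperties.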