gbif / model-tests

Exploration of sample models

Vocabularies #25

Closed MortenHofft closed 2 years ago

MortenHofft commented 2 years ago

@tucotuco I suspect this is down the line, but I miss having vocabularies.

The model is very flexible, which is nice of course. But it also makes machine consumption difficult without vocabularies for types (eventType, entityType, entityIdentifierType, assertion types). The alternative is more explicit fields (e.g. lifeStage). Currently, a lot of the data we see in structured Darwin Core fields (e.g. fieldNumber, sex) would have to go into assertions, as would other fields like weight, tail length etc.

I see that there is a protocol on assertions, but as I understand the word, that is something other than a vocabulary?

A concrete example is that I would like to show the location in the wild where the specimen was gathered. I can guess that it is probably the eventType "collection" that I will have to look at, but that obviously requires consistent naming across all publishers. Another is knowing which entities to add to my index of specimens (I probably want to exclude the image of the jacket). Another is to allow searches by weight (+ possibly protocol). It would help a lot to know the vocabulary that the data should align to during ingestion (instead of trying to match everything to all vocabularies and hoping to find a good match).
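
To make the ingestion point concrete, here is a minimal sketch of aligning publisher-supplied eventType values to a single known vocabulary. The vocabulary contents, the accepted spellings, and the names `EVENT_TYPE_VOCAB`, `align_event_type` and `collection_locations` are all assumptions for illustration; none of them come from the model.

```python
from typing import Optional

# Hypothetical controlled vocabulary: canonical term -> accepted spellings.
# The terms and variants are invented for illustration only.
EVENT_TYPE_VOCAB = {
    "collection": {"collection", "collecting", "collecting event"},
    "observation": {"observation", "sighting"},
}


def align_event_type(raw: str) -> Optional[str]:
    """Return the canonical eventType for a raw value, or None if unknown."""
    key = raw.strip().lower()
    for canonical, variants in EVENT_TYPE_VOCAB.items():
        if key in variants:
            return canonical
    return None


def collection_locations(events):
    """Pick the locations of events whose type aligns to 'collection'."""
    return [
        e["location"]
        for e in events
        if align_event_type(e.get("eventType", "")) == "collection"
    ]


events = [
    {"eventType": "Collecting Event", "location": "Site A"},
    {"eventType": "loan", "location": "Museum B"},
]
print(collection_locations(events))
```

With one agreed vocabulary, the unmatched value ("loan" here) is simply reported as unknown rather than being guessed against every vocabulary in turn.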

I realise that my use of "vocabulary" isn't very clear here, but I suspect you get the meaning.

tucotuco commented 2 years ago

> @tucotuco I suspect this is down the line, but I miss having vocabularies.
>
> The model is very flexible. Which is nice of course. But it also makes machine consumption difficult without vocabularies for types (eventType, entityType, entityIdentifierType, assertion types). The alternative is more explicit fields (e.g. lifeStage). Currently a lot of the data we see in structured DarwinCore fields would have to go into assertions (e.g. fieldNumber, sex) as well as other fields like weight, tail length etc.

You are absolutely right that the power will come not only from the conceptual model, but also from the vocabularies that allow people to find things and to understand what they find. Both @Jegelewicz and dustymc have emphasized this a couple of times. I alluded to this in the presentation, but I don't think I made the message strong enough. I only called it an opportunity.

Yes, many of the Darwin Core Occurrence terms end up as Assertions on whichever class in the model they are properties of. Those that are Darwin Core terms are already in a vocabulary (Darwin Core), so they worry me less. The good (and scary) thing is everything else people will want to add that hasn't had the kind of community development that Darwin Core has had. For this, maybe a good way forward is to keep developing Darwin Core as the bag of terms it is, with a view to those terms being the stable ones, the ones you can definitely rely on for searching, just as we do now.
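
As a rough sketch of that idea, a flat record can be turned into assertions, with terms that exist in Darwin Core identified by their term IRI and flagged as the stable, searchable ones. The assertion keys (`assertionType`, `assertionValue`, `stable`), the function name and the tiny term subset are assumptions for illustration, not the model's actual schema.

```python
# Darwin Core term namespace.
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# A small subset of Darwin Core terms, enough for the example.
DWC_TERMS = {"sex", "lifeStage", "fieldNumber"}


def to_assertions(record):
    """Turn a flat field -> value record into a list of assertion dicts.

    Fields that are Darwin Core terms get the full term IRI as their type
    and are marked stable; everything else keeps the publisher's own name.
    """
    assertions = []
    for name, value in record.items():
        in_dwc = name in DWC_TERMS
        assertions.append(
            {
                "assertionType": DWC_NS + name if in_dwc else name,
                "assertionValue": value,
                "stable": in_dwc,
            }
        )
    return assertions


result = to_assertions({"sex": "female", "tailLength": "112 mm"})
for assertion in result:
    print(assertion)
```

Here "sex" resolves to a Darwin Core term, while "tailLength" stays an unvetted publisher term until a community agrees on it.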

> I see that there is a protocol on assertions, but as I understand the word that is something else than a vocabulary?

Yes. I think a vocabulary would be too constraining for the protocol. I see it filling a role much like samplingProtocol, georeferenceProtocol or measurementMethod do in Darwin Core.
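
A minimal sketch of how that split could look in practice: the assertion type is matched exactly (as a vocabulary term would be), while the protocol stays free text and is only matched by substring. The `Assertion` class, its field names and `find_by_weight` are hypothetical and not part of the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assertion:
    assertion_type: str             # ideally drawn from a vocabulary
    value: float
    unit: str
    protocol: Optional[str] = None  # free text, like measurementMethod


def find_by_weight(assertions, min_g, max_g, protocol_contains=None):
    """Return weight assertions (in grams) within [min_g, max_g].

    If protocol_contains is given, keep only assertions whose free-text
    protocol contains that string (case-insensitive).
    """
    hits = []
    for a in assertions:
        if a.assertion_type != "weight" or a.unit != "g":
            continue
        if not (min_g <= a.value <= max_g):
            continue
        if protocol_contains and protocol_contains.lower() not in (a.protocol or "").lower():
            continue
        hits.append(a)
    return hits


data = [
    Assertion("weight", 23.5, "g", "Pesola spring scale"),
    Assertion("weight", 40.0, "g"),
    Assertion("tail length", 30.0, "mm"),
]
print(find_by_weight(data, 20, 50, protocol_contains="spring scale"))
```

The exact match on type only works if publishers agree on the term, whereas the protocol filter tolerates whatever wording they use.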

> A concrete example is that I would like to show the location of the specimen in the wild where it was gathered. I can guess that it is probably the eventType collection that I will have to look at, but that obviously needs to be the consistent naming for all publishers.

Yes!

> Another is knowing which entities to add to my index of specimens (I probably want to exclude the image of the jacket) Another is to allow searches by weight (+ possibly protocol). But it would help a lot to know the vocabulary that the data should align to during ingestion (instead of trying to match everything to all vocabularies and hope that you find a good match).

Yes!

> I realise that my use of vocabulary isn't very clear here, but I suspect you get the meaning

I believe I do. If my responses suggest otherwise, let me know.

MortenHofft commented 2 years ago

Assertions and vocabularies also bring up translations and term ordering. Known terms can be teased out, translated and structured in meaningful groups. But the unknown ones will more or less have to live in their own table as a bag of attributes, even if they might be related to something like the DwC organism group. But if communities come together and agree on a set of new terms (or they are added to DwC as you mention), then that opens up the possibility that we can start translating them and grouping related terms.
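
A small sketch of what that could mean for display: known terms get a translated label and a group, and everything else falls into an untranslated bag. The registry `KNOWN_TERMS`, its group name, the Spanish labels and the `present` function are illustrative assumptions only.

```python
# Hypothetical registry of agreed terms with a display group and labels.
# Group names and translations are illustrative, not an official list.
KNOWN_TERMS = {
    "sex": {"group": "organism", "labels": {"en": "Sex", "es": "Sexo"}},
    "lifeStage": {
        "group": "organism",
        "labels": {"en": "Life stage", "es": "Etapa de vida"},
    },
}


def present(attributes, lang="en"):
    """Split attributes into (grouped known terms, bag of unknown terms).

    Known terms are keyed by their translated label inside their group;
    unknown terms are returned untouched under their original names.
    """
    grouped, other = {}, {}
    for term, value in attributes.items():
        known = KNOWN_TERMS.get(term)
        if known is None:
            other[term] = value
            continue
        # Fall back to the raw term name if no label exists for lang.
        label = known["labels"].get(lang, term)
        grouped.setdefault(known["group"], {})[label] = value
    return grouped, other


grouped, other = present({"sex": "female", "jacketColour": "blue"}, lang="es")
print(grouped)
print(other)
```

Once a community agrees on a term, adding it to the registry is enough to move it out of the bag and into a translated group.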