gbif / vocabulary

A simple registry of controlled vocabularies used for terms found in GBIF mediated data.
Apache License 2.0

'In use', 'deprecated', 'testing' labels or indications on vocabularies #155

Open CecSve opened 1 month ago

CecSve commented 1 month ago

We occasionally receive feedback from people on vocabularies that are not yet used in pipelines. Could we add a feature (maybe a traffic-light indicator: green, yellow, red) for the different vocabularies to show whether each one is in use, deprecated or not active, or in testing, so that it is clearer to users whether a vocabulary is in use?

marcos-lg commented 1 month ago

We can add tags to vocabularies (right now they are for concepts only). Tags also allow choosing a color, if that helps.

CecSve commented 1 month ago

Thanks, that might be a good option. But should we then have a vocabulary for vocabulary status?

marcos-lg commented 1 month ago

If we want a more controlled way to handle the status, maybe it's not a good idea to use tags.

For the deprecated status, the API already supports deprecating a vocabulary. Deprecated vocabularies can be listed like this:

https://api.gbif.org/v1/vocabularies?deprecated=true

And we have these fields to give info about the deprecation:

replacedByKey;
deprecated;
deprecatedBy;

We have an example in dev, although it wasn't replaced by any other vocab (therefore the replacedByKey is not shown):

https://api.gbif-dev.org/v1/vocabularies?deprecated=true
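As a rough illustration of how a client might read those fields, here is a minimal Python sketch. It assumes a vocabulary record arrives as a JSON object with `deprecated`, `deprecatedBy` and `replacedByKey` keys, and that `replacedByKey` is simply absent when there is no replacement. The helper name `deprecation_info` and the sample record are hypothetical, not taken from the API.

```python
from typing import Any, Optional


def deprecation_info(vocabulary: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the deprecation details of a vocabulary record, or None.

    Assumption: `deprecated` holds a timestamp when the vocabulary is
    deprecated and is missing or null otherwise.
    """
    if not vocabulary.get("deprecated"):
        return None
    return {
        "deprecated": vocabulary["deprecated"],
        "deprecatedBy": vocabulary.get("deprecatedBy"),
        # Stays None when the vocabulary was not replaced by another one.
        "replacedByKey": vocabulary.get("replacedByKey"),
    }


# Hypothetical record: deprecated, but not replaced by any other vocabulary.
sample = {
    "name": "ExampleVocabulary",
    "deprecated": "2020-03-31T12:41:10.914+00:00",
    "deprecatedBy": "some-user",
}
print(deprecation_info(sample))
```

Returning `None` for non-deprecated records keeps the "not deprecated" case distinct from "deprecated without a replacement".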

Also, the in use status might mean different things for different vocabularies. For example, some vocabularies are used in the pipelines data interpretation, others are used only in the registry (like the grscicoll ones), and others might not be used in any system at all.

Another important thing is that in the pipelines data interpretation we use the latest released version of a vocabulary. This means that if a vocabulary has changes that haven't been released, those changes aren't being used (even though the vocabulary will be in the in use status). The API allows listing the vocabularies that have unreleased changes, although we can't see what the changes are:

https://api.gbif.org/v1/vocabularies?hasUnreleasedChanges=true

It also allows querying the latest release of a vocabulary:

https://api.gbif.org/v1/vocabularies/LifeStage/concepts/latestRelease
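For anyone scripting against these endpoints, the URLs above could be assembled along these lines. This is only a sketch: the function names are made up for illustration, and it covers just the query parameters and path shown in this thread, without making any request.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.gbif.org/v1"  # base URL taken from the links above


def list_vocabularies_url(**filters: bool) -> str:
    """Build a vocabulary listing URL from boolean filters.

    Only `deprecated` and `hasUnreleasedChanges` are known from this
    thread; other filter names would be assumptions.
    """
    # The links above use lowercase true/false, so convert explicitly.
    query = urlencode({name: str(value).lower() for name, value in filters.items()})
    url = f"{API_BASE}/vocabularies"
    return f"{url}?{query}" if query else url


def latest_release_url(vocabulary_name: str) -> str:
    """Build the URL for the latest released concepts of one vocabulary."""
    name = quote(vocabulary_name, safe="")
    return f"{API_BASE}/vocabularies/{name}/concepts/latestRelease"


print(list_vocabularies_url(hasUnreleasedChanges=True))
print(list_vocabularies_url(deprecated=True))
print(latest_release_url("LifeStage"))
```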

So instead of tags we can add these fields to the vocabulary to be more explicit about the status:

"usage": "pipelines data interpretation", // this can be an enumeration
"status": "in use" // we'll define some status so it's not a free-text field

Usage should be set when the vocabulary is created, and the status should be updated manually, although some statuses can be set automatically. For example, a vocabulary is not being used if it hasn't been released at least once.
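To make the idea concrete, the two fields could be modelled roughly as below. This is a sketch under assumptions: apart from "pipelines data interpretation" and "in use", the enumeration values are hypothetical placeholders, and `effective_status` is an invented helper that applies the one automatic rule mentioned here (never released means not in use).

```python
from enum import Enum
from typing import Optional


class Usage(Enum):
    # Only the first value appears in the proposal; the rest are hypothetical.
    PIPELINES_DATA_INTERPRETATION = "pipelines data interpretation"
    REGISTRY = "registry"
    NONE = "none"


class Status(Enum):
    # Hypothetical values; the actual list of statuses is still to be defined.
    IN_USE = "in use"
    NOT_IN_USE = "not in use"


def effective_status(released: bool, manual_status: Optional[Status]) -> Status:
    """Combine the automatic rule with a manually maintained status."""
    if not released:
        # Never released, so no system can be using it yet.
        return Status.NOT_IN_USE
    if manual_status is None:
        # Released, but nobody has set a status: default to not in use.
        return Status.NOT_IN_USE
    return manual_status


print(effective_status(released=False, manual_status=Status.IN_USE).value)
print(effective_status(released=True, manual_status=Status.IN_USE).value)
```

Using enumerations rather than free text means an unknown value fails loudly, for example `Status("testing")` raises a `ValueError` in this sketch.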

We can also add these read-only fields if it helps:

"released": "true",
"hasUnreleasedChanges": "false"

An improvement to that would be to also show the changes that are unreleased for the vocabularies that have unreleased changes. We talked briefly about this here: https://github.com/gbif/vocabulary/issues/132

We'll also have to show this properly in the UI.

What do you think @CecSve ?

EDIT: status could also be a boolean called active since we already have a status for deprecated

CecSve commented 1 month ago

Thanks, that makes sense. So would the values be controlled and documented for all the fields (usage, status, released, hasUnreleasedChanges)? Why would status have boolean values, though? I would assume we would have at least three different values: active, not active and deprecated.

marcos-lg commented 1 month ago

> Why would status have boolean values, though? I would assume we would have at least three different values: active, not active and deprecated.

Because we already have a field for deprecated, so there are only 2 possible statuses.

CecSve commented 1 month ago

> > Why would status have boolean values, though? I would assume we would have at least three different values: active, not active and deprecated.
>
> Because we already have a field for deprecated, therefore it's only 2 possible status.

OK, so status would be in use, released = true, and hasUnreleasedChanges = true for newly released vocabularies that are not yet used in production?

marcos-lg commented 1 month ago

> Ok, so status would be in use, released = true, and hasUnreleasedChanges = true for newly released vocabularies that are not yet used in production?

Nope, I imagined it like this (renaming status to active so it's a boolean):

When it's first released but not used in production yet:

active: false
released: true
hasUnreleasedChanges: false
deprecated: null

When it's used in production:

active: true
released: true
hasUnreleasedChanges: false
deprecated: null

If at some point after the first release someone makes changes to the vocab but doesn't release them, the hasUnreleasedChanges flag changes:

active: true
released: true
hasUnreleasedChanges: true
deprecated: null

If we deprecate the vocab:

active: false
released: true
hasUnreleasedChanges: false
deprecated: 2020-03-31T12:41:10.914+00:00  // we use the date as we do in the deleted field of other entities such as dataset
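The four cases above can be turned into a single label, for example for display in the UI. The sketch below is only one possible reading: the `VocabularyState` class, the `describe` function and the label strings are all hypothetical, and the "never released" branch is an extra case not listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VocabularyState:
    active: bool
    released: bool
    has_unreleased_changes: bool
    deprecated: Optional[str] = None  # deprecation timestamp, or None


def describe(state: VocabularyState) -> str:
    """Map the four flags to a human-readable label (labels are hypothetical)."""
    # Deprecation is checked first: a deprecated vocabulary otherwise looks
    # the same as one that is released but not yet used in production.
    if state.deprecated is not None:
        return "deprecated"
    if not state.released:
        return "never released"
    if not state.active:
        return "released, not used in production yet"
    if state.has_unreleased_changes:
        return "in use, with unreleased changes"
    return "in use"


# The four cases from the comment above, in the same order.
cases = [
    VocabularyState(active=False, released=True, has_unreleased_changes=False),
    VocabularyState(active=True, released=True, has_unreleased_changes=False),
    VocabularyState(active=True, released=True, has_unreleased_changes=True),
    VocabularyState(active=False, released=True, has_unreleased_changes=False,
                    deprecated="2020-03-31T12:41:10.914+00:00"),
]
for case in cases:
    print(describe(case))
```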

CecSve commented 1 month ago

Thank you. That makes sense!

marcos-lg commented 1 week ago

I'm going to put this issue on pause: since we'd have to update some of the fields manually (the active one, for example), I think it's better to handle this in the documentation on our docs site.