Closed by dnoesgaard 2 years ago
It makes sense to me. It would also make it more clear what to do with those citations of species pages.
I would like to add gbifOccurrenceKey: [] as well, for those cases where someone cites a few individual occurrences (for example, in a taxonomic treatment). If I understand correctly, we currently only count those at the dataset level. Counting at the dataset level makes perfect sense for large downloads, but when a paper cites a few individual occurrences, it would be nice to capture that, as each occurrence probably plays a larger role.
// in mendeley
[
"gbifOccurrence:2247859888"
]
would add
// in literature index
"gbifOccurrenceKey": [
"2247859888"
]
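To make the mapping concrete, here is a minimal sketch of how the crawler might turn such tags into the index field. The function name `keys_from_tags` is hypothetical, and the sketch assumes every relevant tag has the `<prefix>:<value>` shape shown above; the `gbifTaxon:456` tag only illustrates that other tags are ignored.

```python
def keys_from_tags(tags, prefix="gbifOccurrence"):
    """Collect the values of all tags shaped like '<prefix>:<value>'."""
    marker = prefix + ":"
    keys = []
    for tag in tags:
        if tag.startswith(marker):
            value = tag[len(marker):].strip()
            # Skip empty values and repeated tags.
            if value and value not in keys:
                keys.append(value)
    return keys


# Tags as they might come back from Mendeley for one paper (illustrative).
tags = ["gbifOccurrence:2247859888", "gbifTaxon:456", "some other tag"]

# Only the gbifOccurrence tag ends up in the new field.
index_fields = {"gbifOccurrenceKey": keys_from_tags(tags)}
print(index_fields)
```

The same helper could probably serve the other prefixed tags by passing a different `prefix`.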
I've added gbifOccurrence tags for a few papers now:
91710ee8-d590-3953-a6e9-4cfdc608e5da
51974777-846f-335a-8d6a-687d85a5714e (Edit by morten: nice example)
24412bdf-599e-3c60-ae9d-1d72d557772c
f9ef5a36-cbd8-3a76-a0f9-3d3262070969
gbifTaxon has already been applied to ~1,500 papers.
(For my own sake, here's how to easily pull these from ES using wildcards.)
curl --location --request GET 'cms-search.gbif.org:9200/_search' \
  --header 'Content-type: application/json' \
  --data-raw '{
    "query": {
      "bool": {
        "must": [
          {
            "query_string": {
              "query": "gbifOccurrence*",
              "fields": [
                "tags"
              ]
            }
          }
        ]
      }
    }
  }'
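The same request body works for the other tag prefixes, so it may be handy to generate it. A small sketch, assuming only the query shape used in the curl call above; `tag_prefix_query` is a hypothetical helper, and the snippet only builds and prints the JSON rather than sending it.

```python
import json


def tag_prefix_query(prefix, field="tags"):
    """Build the bool/query_string body from the curl call for any tag prefix."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            # Trailing wildcard matches e.g. 'gbifTaxon:456'.
                            "query": prefix + "*",
                            "fields": [field],
                        }
                    }
                ]
            }
        }
    }


body = tag_prefix_query("gbifTaxon")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the `--data-raw` argument as is.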
When indexing the taxonKeys I suggest we also resolve the higher ranks and add those. That would make it possible to search for all papers about e.g. a family (and not just papers about specific species).
taxonKey: [456,789],
allTaxonKeys: [456,789,1,2,3,4,5,6,10,11,12,13,14,15,16] // including the leaves for convenience, I suppose (similar to the occurrence index). I'm not sure what a good name for that field is.
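Roughly, resolving the higher ranks at indexing time could look like the sketch below. It assumes some lookup from a taxon key to the keys of its higher ranks; here that lookup is a hand-written dictionary (`higher_taxa`) with made-up keys, standing in for a real backbone lookup. The names `all_taxon_keys` and `higher_taxa` are hypothetical.

```python
def all_taxon_keys(taxon_keys, higher_taxa):
    """Return the tagged keys followed by every higher-rank key, without duplicates.

    higher_taxa maps a taxon key to the keys of its higher ranks.
    """
    result = list(dict.fromkeys(taxon_keys))  # leaves first, order preserved
    seen = set(result)
    for key in taxon_keys:
        for parent in higher_taxa.get(key, []):
            if parent not in seen:
                seen.add(parent)
                result.append(parent)
    return result


# Made-up keys: both taxa share the three highest ranks (1, 2, 3).
higher_taxa = {
    456: [1, 2, 3, 4, 5, 6],
    789: [1, 2, 3, 14, 15, 16],
}

doc = {
    "taxonKey": [456, 789],
    "allTaxonKeys": all_taxon_keys([456, 789], higher_taxa),
}
print(doc)
```

Keeping the leaf keys in `allTaxonKeys` means a single field can answer both "papers about this species" and "papers about this family".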
To summarize, we are suggesting the addition of (at least) three new items to the index, based on tags in Mendeley:
gbifTaxon -> gbifTaxonKey: [] (+ field with higher taxa resolved)
gbifOccurrence -> gbifOccurrenceKey: []
gbifFeature -> gbifFeatureId: []
citation_type -> citationType
(perhaps some consideration of the nomenclature for these fields is necessary)
Oh, and while we're at it, can we add this one too?
citation_type -> citationType
@dnoesgaard, does gbifFeatureId hold IDs of Contentful content? Is citation_type a controlled enum/vocabulary that you used?
For clarity, the "gbifFeature" tag contains the Contentful identifier of a related data use case, allowing literature to be linked to the GBIF feature about that paper. The "citation_type" tag indicates how a literature item cites GBIF data (e.g. DOI, generic). It's not entirely controlled, though it probably could be.
(perhaps some consideration of the nomenclature for these fields is necessary)
citation_type -> citationType
- why is this snake_case and the others camelCase?
gbifFeatureId: is it called feature because it applies to more than just dataUse stories? Could it be any Contentful item?
I appear to have used mostly snake_case in Mendeley tags, but obviously the ES index should use whatever we prefer.
"gbifFeatureId" is just a name but the intention is to link to dataUse items only.
For citation_type, I believe it could be controlled using
(the latter being when a paper doesn't cite a DOI but provides one when contacted)
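On the snake_case versus camelCase question above: if the Mendeley tags stay as they are, the renaming could happen when the crawler writes to the index. A minimal sketch; `snake_to_camel` is a hypothetical helper, not something the crawler currently has.

```python
def snake_to_camel(name):
    """Convert a snake_case tag name to a camelCase field name."""
    head, *rest = name.split("_")
    # Capitalize only the first letter of each later part; leave the rest alone.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


for tag_name in ["citation_type", "gbifTaxon", "gbifOccurrence"]:
    print(tag_name, "->", snake_to_camel(tag_name))
```

Names that are already camelCase pass through unchanged, so the same function could be applied to every tag name.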
In an attempt to categorize literature by taxon, I've started tagging papers using gbifTaxon:<taxonKey>, e.g., b688f91b-8f9f-39e4-a378-6d9375247da8.
If we could make the crawler add this as a field to the ES index, we could start featuring literature on species pages, etc.
(@MortenHofft, you might also have thoughts on this)