gbif / portal16

GBIF.org website
https://www.gbif.org
Apache License 2.0

Enhancements to the literature search #401

Closed timrobertson100 closed 7 years ago

timrobertson100 commented 7 years ago

The literature search should be enhanced in 3 ways:

1. Bug in country of coverage

The country of coverage is being populated incorrectly. The Mendeley response contains many country-related tags (e.g. US, United_States). The country of coverage should be populated only from tags of the form <iso_code>_biodiversity, matched case-insensitively, e.g. us_biodiversity, DK_biodiversity. The "_biodiversity" suffix is the convention that distinguishes country of coverage from the other country tags, which are used for things like country of authorship.

2. New Elasticsearch field of topic and facet on site

A new multivalued ES index field, topic, should be created. It should be populated only with keywords from the Mendeley API that match the controlled vocabulary that @dnoesgaard will provide in a comment below (e.g. Agriculture, Human health etc.). Once the field is in ES, a "Topic" facet should appear on the left of the site.

3. New Elasticsearch field of relevance and facet on site

A new multivalued ES index field, relevance, should be created. It should be populated only with keywords from the Mendeley API that match the controlled vocabulary that @dnoesgaard will provide in a comment below (e.g. GBIF_used, GBIF_mentioned etc.). Once the field is in ES, a "Relevance" facet should appear on the left of the site.
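
The rule in item 1 can be sketched roughly as follows. This is only an illustration, not the portal16 implementation: the function name `countries_of_coverage` is hypothetical, the input is assumed to be a plain list of tag strings from the Mendeley response, and the ISO code is assumed to be two letters, as in the examples us_biodiversity and DK_biodiversity.

```python
import re

# Assumption: ISO codes are two letters (ISO 3166-1 alpha-2), as in the
# examples us_biodiversity and DK_biodiversity from the issue text.
_COVERAGE_TAG = re.compile(r"^([a-z]{2})_biodiversity$", re.IGNORECASE)


def countries_of_coverage(tags):
    """Return sorted, upper-cased ISO codes taken only from
    <iso_code>_biodiversity tags; all other country tags are ignored."""
    codes = set()
    for tag in tags:
        match = _COVERAGE_TAG.match(tag.strip())
        if match:
            codes.add(match.group(1).upper())
    return sorted(codes)


print(countries_of_coverage(
    ["US", "United_States", "us_biodiversity", "DK_biodiversity"]
))
```

Tags such as US or United_States are skipped because they lack the "_biodiversity" suffix, so they cannot leak into the country of coverage.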

dnoesgaard commented 7 years ago

Thank you, @timrobertson100. I've been meaning to create this issue, but never got around to it.

Re. 2 these are the topic values:

Agriculture Biodiversity_science Biogeography Citizen_science Climate_change Conservation Data_management Data_paper Ecology Ecosystem_services Evolution Freshwater Human_health Invasives Marine Phylogenetics Species_distributions Taxonomy

Re. 3, these are the relevance values:

GBIF_used GBIF_cited GBIF_discussed GBIF_primary GBIF_acknowledged GBIF_published GBIF_author GBIF_mentioned GBIF_funded
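
A minimal sketch of how the two vocabularies above could be used to populate the topic and relevance fields. The function name `classify_keywords` is hypothetical, the keywords are assumed to arrive as a list of strings, and matching is assumed to be exact and case-sensitive; the issue does not say how case or spaces versus underscores should be handled.

```python
# Controlled vocabularies copied from the comment above.
TOPICS = {
    "Agriculture", "Biodiversity_science", "Biogeography", "Citizen_science",
    "Climate_change", "Conservation", "Data_management", "Data_paper",
    "Ecology", "Ecosystem_services", "Evolution", "Freshwater",
    "Human_health", "Invasives", "Marine", "Phylogenetics",
    "Species_distributions", "Taxonomy",
}
RELEVANCE = {
    "GBIF_used", "GBIF_cited", "GBIF_discussed", "GBIF_primary",
    "GBIF_acknowledged", "GBIF_published", "GBIF_author", "GBIF_mentioned",
    "GBIF_funded",
}


def classify_keywords(keywords):
    """Split keywords into the two multivalued fields, dropping anything
    that is not in a vocabulary. Assumption: exact, case-sensitive match."""
    return {
        "topic": [k for k in keywords if k in TOPICS],
        "relevance": [k for k in keywords if k in RELEVANCE],
    }


print(classify_keywords(["Agriculture", "GBIF_used", "us_biodiversity"]))
```

Keywords outside both vocabularies, such as us_biodiversity, end up in neither field, which keeps the facets limited to the agreed values.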

Now, I understand that if we wait for the perfect solution, we'll never get anything done, BUT, if it makes any sense for you guys to have the two new ES fields populated in the same fashion as we do for gbifDOI, please let me know. I can easily bulk retag the Mendeley papers accordingly, e.g.

gbifTopic:Biogeography gbifRelevance:GBIF_used

I do have ideas for additional fields, but those can wait.
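
If the prefixed tags proposed above were adopted, parsing them might look like the sketch below. It is an assumption-laden illustration: the function name `parse_prefixed_tags` is hypothetical, and the "prefix:value" format is inferred only from the example gbifTopic:Biogeography gbifRelevance:GBIF_used.

```python
def parse_prefixed_tags(tags, prefixes=("gbifTopic", "gbifRelevance")):
    """Group tag values by prefix, assuming a 'prefix:value' format.
    Tags without a known prefix, or with an empty value, are ignored."""
    parsed = {prefix: [] for prefix in prefixes}
    for tag in tags:
        prefix, sep, value = tag.partition(":")
        if sep and prefix in parsed and value:
            parsed[prefix].append(value)
    return parsed


print(parse_prefixed_tags(
    ["gbifTopic:Biogeography", "gbifRelevance:GBIF_used", "Biogeography"]
))
```

With explicit prefixes the indexer would not need to carry the controlled vocabularies itself, although validating the values against them would still catch typos in the tags.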

MortenHofft commented 7 years ago

closed by https://github.com/gbif/portal16/commit/d66616808d7d71b19ffb8c73498aab8c2298e070 and https://github.com/gbif/portal16/commit/2c94a3467a90bc757c927ad1b9e6d7b845f1a6c0

MortenHofft commented 7 years ago

The topics enum needs to be normalized across indices and in Contentful. The migration script that populates Contentful could perhaps be updated to add the enums suggested above, as used in the literature index? The Contentful ES index could then use those values instead.

timrobertson100 commented 7 years ago

If it is added to Contentful, I'd suggest adding a new field to the vocabulary objects rather than using the term/title. Otherwise users would be subjected to things like INVASIVE_SPECIES in Contentful drop-downs etc. You would do it here