dzhw / metadatamanagement

Metadatamanagement (MDM) - Data Search for Higher Education Research and Science Studies
https://metadata.fdz.dzhw.eu
GNU Affero General Public License v3.0

As a data provider I would like to have a controlled vocabulary as "search as you type suggestions" while entering tags #2613

Open AndyDaniel1 opened 4 years ago

AndyDaniel1 commented 4 years ago

We need to evaluate ways to include a controlled vocabulary (a thesaurus) in the MDM to standardize tags.

Idea: Propose tags from a thesaurus as "search as you type suggestions"

UteH commented 4 years ago

e. g. https://elsst.ukdataservice.ac.uk/ https://elsst.ukdataservice.ac.uk/thesaurus-search/view-concept/?id=f4395aa9-0ea8-4f6a-bd93-c83f79864f91&lang=EN#/tab-current-version

investigation:

rreitmann commented 4 years ago

We need to analyze what we can import from ukdataservice. The source of tags should be the same for data packages and concepts.

rreitmann commented 3 years ago

We need to get in contact with ukdataservice in order to use elsst within our application: https://elsst.ukdataservice.ac.uk/elsst-guide/obtaining-elsst.aspx

rreitmann commented 3 years ago

I got the following reply when asking for a license and technical details:

Thank you for your enquiry regarding the ELSST thesaurus. We are very glad to hear of your interest in the product. As we are in the process of moving its ownership from the University of Essex to the Consortium of European Social Science Data Archives (CESSDA), we are no longer able to issue a licence. We shall let you know, however, as soon as the new licence arrangements are in place. The CESSDA licence will be a Creative Commons licence.

AndyDaniel1 commented 3 years ago

@anneweber mentioned the NEPS "Konzeptbaum" (concept tree) as a good example. You can see it here: https://www.neps-data.de/Datenzentrum/%C3%9Cbersichten-und-Hilfen/NEPSplorer

image

rreitmann commented 3 years ago

We will continue discussing this issue as soon as CESSDA gets in touch with us...

AndyDaniel1 commented 3 years ago

@AndyDaniel1 will contact GESIS (the German CESSDA partner)

rreitmann commented 3 years ago

We are pleased to announce that the European Language Social Science Thesaurus (ELSST) is now freely accessible and available on the CESSDA Platform at https://elsst.cessda.eu/ as one of a suite of web-based tools. It is now covered by a CC-BY-SA 4.0 licence. We hope you will find it useful for your purposes.

UteH commented 3 years ago

whooooop whoooooooooooop 🎉

UteH commented 3 years ago

What's the situation here again 🙈?


https://thesauri.cessda.eu/swagger-ui/index.html 👀

rreitmann commented 3 years ago

We need to check the APIs for tags...

rreitmann commented 3 years ago

The following endpoint allows searching for concepts (with paging and wildcards): https://thesauri.cessda.eu/swagger-ui/index.html#/Vocabulary-specific%20methods/get__vocid__search

Parameters: vocid=elsst, lang=de or lang=en, query=aber*

The API is completely public, so we could easily integrate it.
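A minimal sketch of how a type-ahead prefix could be turned into a request against that search endpoint. The `/rest/v1` base path is an assumption (it mirrors the usual Skosmos-style REST layout and is not confirmed in this thread), and `buildSearchUrl` is a hypothetical helper name; only `vocid`, `lang` and `query` come from the parameters listed above.

```typescript
// ASSUMPTION: REST root of the thesauri service; verify against the Swagger UI.
const THESAURI_REST_BASE = "https://thesauri.cessda.eu/rest/v1";

type TagLang = "de" | "en";

/**
 * Builds the concept search URL for what the user has typed so far.
 * A trailing "*" turns the input into a prefix (wildcard) query,
 * e.g. "aber" becomes query=aber*.
 */
function buildSearchUrl(prefix: string, lang: TagLang, vocid: string = "elsst"): string {
  // encodeURIComponent leaves "*" untouched, so the wildcard survives encoding.
  const query = encodeURIComponent(prefix.trim() + "*");
  return `${THESAURI_REST_BASE}/${encodeURIComponent(vocid)}/search?query=${query}&lang=${lang}`;
}
```

The resulting URL can be handed to whatever HTTP client the frontend already uses; debouncing the input before firing requests would probably be sensible.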

@AndyDaniel1 Please check if the thesaurus (https://thesauri.cessda.eu/elsst/en/) is a good (exclusive) source for our tags in German and English from your point of view.

rreitmann commented 3 years ago

There are some other open topics we need to discuss:

  1. Is CC BY-SA 4.0 compatible with our license (AGPL 3)?
  2. Which labels do we want to use as our tags (only prefLabel or altLabel as well)?
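
To make the second question concrete, here is a rough sketch of both options behind one flag. The `ConceptHit` shape is an assumption (field names follow SKOS, but the actual response format has not been checked here), and `toTagCandidates` is a hypothetical helper.

```typescript
// ASSUMPTION: simplified shape of one search hit; only SKOS label fields are used.
interface ConceptHit {
  uri: string;
  prefLabel: string;
  altLabel?: string;
}

/**
 * Turns search hits into de-duplicated tag candidates.
 * Preferred labels always come first; alternative labels are appended
 * only when includeAltLabels is true.
 */
function toTagCandidates(hits: ConceptHit[], includeAltLabels: boolean): string[] {
  const tags: string[] = [];
  const addOnce = (label: string | undefined): void => {
    if (label && tags.indexOf(label) === -1) {
      tags.push(label);
    }
  };
  for (const hit of hits) {
    addOnce(hit.prefLabel);
  }
  if (includeAltLabels) {
    for (const hit of hits) {
      addOnce(hit.altLabel);
    }
  }
  return tags;
}
```

If altLabels are offered, it may still be worth storing the concept URI with the tag so that synonyms can be resolved to the same concept later.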

AndyDaniel1 commented 3 years ago

Just a quick note to keep this in mind: The API of the VerbundFDB accepts only one specific controlled vocabulary (possibly the keyword list of FIS education: https://www.fachportal-paedagogik.de/literatur/schlagwortregister.html). If we want to deliver keywords to VerbundFDB, we need to use this list for our keywords OR map our keywords to the controlled list.

AndyDaniel1 commented 1 year ago

We should get back to this. da|ra apparently uses the STW and TheSoz APIs: image

AndyDaniel1 commented 1 year ago

In addition, we should also think about broader classifications: image

AndyDaniel1 commented 1 year ago

https://old.datahub.io/dataset/gesis-thesoz

https://lod.gesis.org/thesoz/de/thesoz/de

https://thesauri.cessda.eu/swagger-ui/index.html#/Vocabulary-specific%20methods/get__vocid__search

tilovillwock commented 10 months ago

The following tasks are necessary to complete this feature:

AndyDaniel1 commented 10 months ago

Use the TheSoz Thesaurus: https://lod.gesis.org/thesoz/de/thesoz/de

The new feature should be implemented in the interfaces of data packages and concepts

tilovillwock commented 10 months ago

Use the TheSoz Thesaurus: https://lod.gesis.org/thesoz/de/thesoz/de

A cursory search of the website did not reveal a well-defined API or a download of the entire dataset. From what I can tell, it should be possible to parse the HTML document and extract the REST URLs for each individual entry. This could prove to be error-prone, though. Maybe we simply extract the strings from the index column on the left?

Screenshot 2023-10-23 at 16 59 11

@AndyDaniel1

AndyDaniel1 commented 10 months ago

Check https://thesauri.cessda.eu/swagger-ui/index.html#/Vocabulary-specific%20methods/get__vocid__search

tilovillwock commented 10 months ago

The Swagger documentation for the other thesaurus resource was helpful, but it still took some trial and error. As far as I can tell, it should be feasible to implement a script that scrapes the entire vocabulary using the following URLs:

# index groups A-Z
curl "https://lod.gesis.org/rest/v1/thesoz/index/?lang=de"
# entries of one index group (set group first, e.g. group=A)
curl "https://lod.gesis.org/rest/v1/thesoz/index/${group}?lang=de"

We should scrape both the default prefLabel and the optional altLabel, if present. Since this is a one-off task, we should investigate whether we can ship the result as a dynamically loadable Angular module, unless its size is prohibitive. As a fallback, we could simply store it in an Elasticsearch index.
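
A rough sketch of the two pure building blocks such a script would need: the per-group URL (mirroring the curl calls above) and a merge step that collects altLabels per concept. The `IndexEntry` shape is an assumption about the group response and has to be checked against real output; `indexGroupUrl`, `mergeEntries` and `VocabularyEntry` are hypothetical names.

```typescript
const THESOZ_INDEX_BASE = "https://lod.gesis.org/rest/v1/thesoz/index/";

/** URL for the entries of one index group, e.g. "A". */
function indexGroupUrl(group: string, lang: string = "de"): string {
  return `${THESOZ_INDEX_BASE}${encodeURIComponent(group)}?lang=${encodeURIComponent(lang)}`;
}

// ASSUMPTION: one entry of a group response; an entry for a synonym carries
// the altLabel together with the prefLabel of the concept it points to.
interface IndexEntry {
  uri: string;
  prefLabel: string;
  altLabel?: string;
}

// Hypothetical target format for the static module or the Elasticsearch index.
interface VocabularyEntry {
  uri: string;
  prefLabel: string;
  altLabels: string[];
}

/** Merges entries from all groups into one record per concept URI. */
function mergeEntries(entries: IndexEntry[]): VocabularyEntry[] {
  const byUri: { [uri: string]: VocabularyEntry } = {};
  const order: string[] = []; // keeps first-seen order stable
  for (const entry of entries) {
    let record = byUri[entry.uri];
    if (!record) {
      record = { uri: entry.uri, prefLabel: entry.prefLabel, altLabels: [] };
      byUri[entry.uri] = record;
      order.push(entry.uri);
    }
    if (entry.altLabel && record.altLabels.indexOf(entry.altLabel) === -1) {
      record.altLabels.push(entry.altLabel);
    }
  }
  return order.map((uri) => byUri[uri]);
}
```

Serializing the merged list to JSON would give a quick way to measure whether the size is acceptable for a lazily loaded module.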

tilovillwock commented 10 months ago

We are going to introduce a dedicated thesoz tag input field that autocompletes to entries that also include the Reference ID from the catalog (e.g. concept_10037561 leads to http://lod.gesis.org/thesoz/concept_10037561).
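
A small sketch of what such a tag could look like in the frontend. `TheSozTag` and `theSozConceptUrl` are hypothetical names, and the `concept_<digits>` pattern is an assumption derived from the single example above.

```typescript
const THESOZ_CONCEPT_BASE = "http://lod.gesis.org/thesoz/";

// Hypothetical tag shape: the label shown to the user plus the catalog reference ID.
interface TheSozTag {
  label: string;
  referenceId: string; // e.g. "concept_10037561"
}

/** Resolves a reference ID to its concept URL in the TheSoz catalog. */
function theSozConceptUrl(referenceId: string): string {
  // ASSUMPTION: all reference IDs look like "concept_" followed by digits.
  if (!/^concept_\d+$/.test(referenceId)) {
    throw new Error(`Unexpected TheSoz reference ID: ${referenceId}`);
  }
  return THESOZ_CONCEPT_BASE + referenceId;
}
```

Storing the reference ID next to the label keeps the tag resolvable even if the label text changes in a later thesaurus version.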

AndyDaniel1 commented 9 months ago

I got new input concerning TheSoz: the mapping between TheSoz and ELSST is much worse than expected. Therefore, we should implement ELSST keyword suggestions for the time being.

AndyDaniel1 commented 7 months ago

Once this issue is completed, proceed with: https://github.com/dzhw/metadatamanagement/issues/3307

AndyDaniel1 commented 1 month ago

grafik grafik

Add an info-i for ELSST stating:

de: "Die Schlagwörter, auf die hier verwiesen werden kann, stammen aus dem European Language Social Science Thesaurus (ELSST) - CESSDA and Service Providers (2023) The European Language Social Science Thesaurus (ELSST) (Version 4). ELSST ist ein breit angelegter, mehrsprachiger Thesaurus für die Sozialwissenschaften. Er ist Eigentum des Consortium of European Social Science Data Archives (CESSDA) und seiner nationalen Dienstleister und wird von diesen herausgegeben. Der Thesaurus besteht aus über 3.300 Konzepten und deckt die wichtigsten sozialwissenschaftlichen Disziplinen ab: Politik, Soziologie, Wirtschaft, Bildung, Recht, Kriminalität, Demografie, Gesundheit, Beschäftigung, Informations- und Kommunikationstechnologie sowie Umweltwissenschaften. ELSST ist unter einer Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/) lizenziert. Weitere Informationen: https://elsst.cessda.eu/"

en: "The tags that can be referenced here are derived from the European Language Social Science Thesaurus (ELSST) - CESSDA and Service Providers (2023) The European Language Social Science Thesaurus (ELSST) (Version 4). ELSST is a broad-based, multilingual thesaurus for the social sciences. It is owned and published by the Consortium of European Social Science Data Archives (CESSDA) and its national Service Providers. The thesaurus consists of over 3,300 concepts and covers the core social science disciplines: politics, sociology, economics, education, law, crime, demography, health, employment, information and communication technology, and environmental science. ELSST is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/). For more information: https://elsst.cessda.eu/"