SSHOC / marketplace-curation

Project to manage scripts and auxiliary data, via Python library and Jupyter notebooks, for the curation of the SSH Open Marketplace

investigating "merge" cases of the keyword vocab #17

Open laureD19 opened 6 months ago

laureD19 commented 6 months ago

see https://docs.google.com/spreadsheets/d/1-Oh9_SxIhfMAT6KNJrMf4LetCpy5s1fHZEyTL__TUVA/edit#gid=883102613

Create and describe the workflow needed to merge two duplicate or near-duplicate concepts in the keyword vocabulary, so that all items previously tagged with the duplicated concepts are tagged with the newly created/merged concept.

The workflow can be described/documented in the tutorial for moderators gDoc: https://docs.google.com/document/d/1xKnwr5uWJW4fFXpsTUuBG0gKIh0bfxJ_GcFmxILzp84/edit#heading=h.oi21o6xwesj2
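As a starting point for that description, the retagging step could be sketched as below. This is only an illustration on in-memory data: the item structure (dicts with `id` and `keywords`) and the function name `merge_concepts` are assumptions made for the example, not the Marketplace API or the curation library.

```python
def merge_concepts(items, duplicates, merged):
    """Retag every item that uses a duplicate concept with the merged one.

    `items` is a hypothetical list of dicts with "id" and "keywords" keys.
    Returns the ids of the items that were changed.
    """
    duplicates = set(duplicates)
    changed = []
    for item in items:
        keywords = item["keywords"]
        if not duplicates.intersection(keywords):
            continue  # item does not use any of the duplicated concepts
        new_keywords = []
        for kw in keywords:
            target = merged if kw in duplicates else kw
            if target not in new_keywords:  # avoid tagging the item twice
                new_keywords.append(target)
        item["keywords"] = new_keywords
        changed.append(item["id"])
    return changed


# Made-up sample data, only to show the behaviour.
items = [
    {"id": 1, "keywords": ["annotation", "corpus"]},
    {"id": 2, "keywords": ["annotating", "Annotating"]},
    {"id": 3, "keywords": ["bibliography"]},
]
changed = merge_concepts(items, ["annotation", "annotating"], "Annotating")
```

Returning the changed ids makes it easy to review which items were touched before the duplicated concepts are removed.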

carikan commented 6 months ago

We can understand the workflow by going through cases.

- If there are two keywords in singular and plural forms, prefer the singular.
- Always prefer the simple version of a term if the term is followed by extra words in brackets. In the example below, prefer the first:

Platform-independent
Platform-independent (Windows and generic installers available)
Platform-independent (java)
Platform-independent (requires Python)
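The bracket rule could be automated roughly like this. It is a minimal sketch that assumes the extra words always sit in one trailing pair of round brackets; `simple_form` is a name invented for the example.

```python
import re

# A trailing "(...)" qualifier, including the whitespace before it.
_QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def simple_form(keyword):
    """Return the keyword without a trailing bracketed qualifier."""
    return _QUALIFIER.sub("", keyword).strip()


variants = [
    "Platform-independent",
    "Platform-independent (Windows and generic installers available)",
    "Platform-independent (java)",
    "Platform-independent (requires Python)",
]
# Map every variant to the form that should be kept.
preferred = {v: simple_form(v) for v in variants}
```

With the four variants above, every entry maps to `Platform-independent`.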

I tended to keep the abbreviations because they are widely used. But we can discuss it.

If a keyword exists in a vocabulary, then similar related keywords (including the same keyword starting with [...]) should also be mapped to it. Example: Annotating => https://vocabs.dariah.eu/tadirah/en/page/annotating. The similar terms annotation and annotating are marked as "merge" and are mapped to the same vocabulary entry.

If there are keywords in adjective and noun forms, then prefer the noun. Example: bibliographic, bibliography

If there are two keywords, one in lowercase and the other starting with a capital letter, prefer the keyword starting with a capital letter.

Many keywords contain a dash "-" between words. If the same keyword exists without the dash, the version without the dash should be preferred.
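The capitalisation and dash rules are mechanical enough to sketch in code. The version below is an assumption-laden illustration: it treats a dash as equivalent to a space when grouping variants, applies the dash rule before the capitalisation rule (the order is not decided in this thread), and uses invented function names and made-up sample keywords. The singular/plural and adjective/noun rules need linguistic knowledge and are left out.

```python
from collections import defaultdict


def group_key(keyword):
    """Variants share a key if they differ only by case or dash vs. space."""
    return " ".join(keyword.lower().replace("-", " ").split())


def preference(keyword):
    """Sort key: lower tuples are preferred.

    No dash beats dash, then a capitalised first letter beats lowercase.
    The keyword itself is the final tie-breaker to keep results stable.
    """
    return ("-" in keyword, not keyword[:1].isupper(), keyword)


def preferred_forms(keywords):
    """Map every keyword to the preferred variant of its group."""
    groups = defaultdict(list)
    for kw in keywords:
        groups[group_key(kw)].append(kw)
    return {
        kw: min(variants, key=preference)
        for variants in groups.values()
        for kw in variants
    }


# Made-up sample keywords, only to show the behaviour.
mapping = preferred_forms(
    ["text-mining", "Text mining", "text mining", "Annotating", "annotating"]
)
```

Here the three "text mining" variants map to `Text mining`, and both "annotating" variants map to `Annotating`.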

carikan commented 5 months ago

Go through the merged keywords and the mapping, and delete the old keywords that are no longer used.
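Finding the candidates for deletion could look roughly like this. Again a sketch on in-memory data: the item structure and the name `unused_keywords` are assumptions for the example, and the result is meant to be reviewed by a moderator before anything is deleted.

```python
def unused_keywords(vocabulary, items):
    """Return vocabulary keywords that no item is tagged with any more."""
    used = {kw for item in items for kw in item["keywords"]}
    return sorted(set(vocabulary) - used)


# Made-up state after a merge: both items already use the merged concept.
vocabulary = ["Annotating", "annotation", "annotating", "bibliography"]
items_after_merge = [
    {"id": 1, "keywords": ["Annotating"]},
    {"id": 2, "keywords": ["Annotating", "bibliography"]},
]
to_delete = unused_keywords(vocabulary, items_after_merge)
```

In this example `to_delete` contains `annotating` and `annotation`, the two keywords left without items after the merge.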