ckan / ideas

[DEPRECATED] Use the main CKAN repo Discussions instead:
https://github.com/ckan/ckan/discussions

Tags - case sensitivity #181

Open Aaron-M opened 8 years ago

Aaron-M commented 8 years ago

When a user enters tags for a dataset, they may start typing and then select an existing tag from the autocompletion list, or they may simply keep typing. Data may also be deposited via the API, in which case tag autocompletion is not used at all. This leads to situations where the same tag is treated as different tags depending on the case used when it was entered.

For example, I have datasets tagged with 'flora' and others tagged with 'Flora', and for the purposes of filtering by tag they are treated as different (the tags list on the left of the screen in the 'Data' view shows them as separate tags).

It would be nice if CKAN could either (a) treat tags case-insensitively, or (b) be configured (via an .ini setting) to standardise tags to, say, lower case or sentence case, converting tags on the fly as they are entered.
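Option (b) could look roughly like the minimal sketch below. It assumes a hypothetical setting named `ckan.tags.case` (CKAN does not ship such an option) that takes `lower` or `sentence`; the dict stands in for the parsed .ini config. Converting the case can turn 'flora' and 'Flora' into the same string, so the sketch also drops duplicates created by the conversion.

```python
# Hypothetical setting name: not an existing CKAN config option.
TAG_CASE_SETTING = "ckan.tags.case"  # e.g. "lower" or "sentence"


def normalize_tag(tag: str, mode: str) -> str:
    """Standardise a single tag's case according to `mode`."""
    tag = tag.strip()
    if mode == "lower":
        return tag.lower()
    if mode == "sentence":
        # str.capitalize() upper-cases the first character and lower-cases the rest
        return tag.capitalize()
    raise ValueError(f"unknown tag case mode: {mode!r}")


def normalize_tags(tags: list[str], mode: str) -> list[str]:
    """Normalise every tag and drop duplicates created by the conversion, keeping first-seen order."""
    seen = set()
    result = []
    for tag in tags:
        norm = normalize_tag(tag, mode)
        if norm and norm not in seen:
            seen.add(norm)
            result.append(norm)
    return result


# Stand-in for the value read from the .ini file.
config = {TAG_CASE_SETTING: "lower"}
print(normalize_tags(["Flora", "flora", " Fauna "], config[TAG_CASE_SETTING]))
# ['flora', 'fauna']
```

Applying the conversion at entry time (in the web form and the API alike) means autocompletion no longer matters for consistency.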

This could tie in with #68 and, ultimately, a tag management UI for cleaning up tags (e.g. managing variants of the same tag, such as plurals or misspellings, and fixing them in bulk rather than one by one).
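For the case-variant part of such a bulk cleanup, one possible approach is sketched below; it does not handle plurals or misspellings. It assumes you can list every (dataset, tag) occurrence as a flat list of tag strings, groups spellings that differ only in case, and picks the most frequently used spelling as the canonical one. The function name `build_merge_map` is hypothetical.

```python
from collections import Counter, defaultdict


def build_merge_map(tag_usages: list[str]) -> dict[str, str]:
    """Map each case variant of a tag to its most frequently used spelling.

    `tag_usages` has one entry per (dataset, tag) occurrence, so the counts
    reflect how often each spelling is used. Only spellings that need
    rewriting appear in the result.
    """
    counts = Counter(tag_usages)
    groups: dict[str, list[str]] = defaultdict(list)
    for spelling in counts:
        groups[spelling.casefold()].append(spelling)

    merge_map = {}
    for spellings in groups.values():
        if len(spellings) < 2:
            continue  # only one spelling in use; nothing to merge
        # Most-used spelling wins; ties are broken by string order so the result is deterministic.
        canonical = min(spellings, key=lambda s: (-counts[s], s))
        for spelling in spellings:
            if spelling != canonical:
                merge_map[spelling] = canonical
    return merge_map


usages = ["flora", "Flora", "flora", "FLORA", "fauna"]
print(build_merge_map(usages))
# {'Flora': 'flora', 'FLORA': 'flora'}
```

A management UI could show this map for review and then rewrite all affected datasets in one pass.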


torfsen commented 8 years ago

I'd also love to have a better tag management system! In particular case-invariance and synonym management would be really nice to have (as you already pointed out).

Ideally, the original tags specified for a dataset would not be changed directly but would be transformed by the tag management system before they are displayed or indexed. That would make it possible, for example, to remove a synonym for a tag from the system.

Perhaps we could simply introduce an appropriate extension interface for this "tag-middleware". An extension could then, for example, lower-case all tags or provide tag synonyms.
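A rough sketch of what such a "tag-middleware" hook might look like is below. `TagTransformer`, `LowerCaseTags`, `SynonymTags` and `tags_for_display` are hypothetical names, not an existing CKAN plugin interface. Transformers run in order over a copy of the stored tags, so the stored tags are never modified; removing a synonym from the mapping changes only what is shown or indexed next time.

```python
from typing import Protocol


class TagTransformer(Protocol):
    """Hypothetical "tag-middleware" hook; not an existing CKAN plugin interface."""

    def transform(self, tags: list[str]) -> list[str]: ...


class LowerCaseTags:
    """Example extension: lower-case all tags."""

    def transform(self, tags: list[str]) -> list[str]:
        return [t.lower() for t in tags]


class SynonymTags:
    """Example extension: replace each synonym with its preferred tag."""

    def __init__(self, synonyms: dict[str, str]):
        self.synonyms = synonyms  # synonym -> preferred tag, editable by an admin

    def transform(self, tags: list[str]) -> list[str]:
        return [self.synonyms.get(t, t) for t in tags]


def tags_for_display(stored_tags: list[str], transformers: list[TagTransformer]) -> list[str]:
    """Run the stored tags through each transformer in order, then de-duplicate."""
    tags = list(stored_tags)  # work on a copy; the stored tags stay as entered
    for transformer in transformers:
        tags = transformer.transform(tags)
    return list(dict.fromkeys(tags))  # drop duplicates, keep first-seen order


stored = ["Flora", "plants", "flora"]
pipeline = [LowerCaseTags(), SynonymTags({"plants": "flora"})]
print(tags_for_display(stored, pipeline))  # ['flora']
print(stored)  # ['Flora', 'plants', 'flora']
```

With the synonym removed (`SynonymTags({})`), the same stored tags would be displayed as `['flora', 'plants']`.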

nibecker commented 8 years ago

Two very good ideas (case insensitivity and synonym management). The first seems right for every use case and should therefore be included in the core, whereas the second (synonym management) is useful to some but not all users and would therefore make a great extension. Maybe it could be based on, or make use of, Fraunhofer's extension for the EDP that handles multilingual tags.