cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
https://datacatalogue.cessda.eu/
Apache License 2.0

Cleaning topic filter #187

Open cessda-bitbucket-importer opened 4 years ago

cessda-bitbucket-importer commented 4 years ago

Original report on BitBucket by Taina Jääskeläinen.


In the detailed study view, CDC wants to show topic terms from vocabularies other than the CESSDA Topic Classification as well. The topic filter, however, should include only terms from the CESSDA vocabulary.

This can be done by:

1) Making the vocab name attribute mandatory in the metadata whenever the topic element is used (DDI Profile and validator issue).

2) Having CDC include in the filter only those terms that have “CESSDA Topic Classification” as the vocab name (CDC issue) and that the validator has confirmed as correct terms. Ideally, an incorrect term would simply be left out of the filter rather than causing the whole dataset to be dropped.

We have to discuss whether this is possible.

In 2020, CDC should drop from the filter those terms whose vocab name is not “CESSDA Topic Classification” or, alternatively, does not include “CESSDA”.
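The two variants of the rule above (exact vocab name, or any vocab name that includes “CESSDA”) can be sketched roughly as follows. This is only an illustration, not CDC code: the `Topic` class, the `filter_topics` function and the `strict` flag are hypothetical names, and comparing the vocab name case-insensitively is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

CESSDA_VOCAB = "CESSDA Topic Classification"


@dataclass
class Topic:
    """Hypothetical stand-in for one topic element in a harvested record."""
    term: str
    vocab: Optional[str] = None  # the vocab name attribute, if present


def filter_topics(topics: List[Topic], strict: bool = True) -> List[Topic]:
    """Return only the topics that may appear in the topic filter.

    strict=True  keeps topics whose vocab name equals CESSDA_VOCAB.
    strict=False keeps topics whose vocab name merely contains "CESSDA".
    Both checks ignore case (an assumption, not a stated requirement).
    """
    kept = []
    for topic in topics:
        name = (topic.vocab or "").strip().casefold()
        if strict:
            matches = name == CESSDA_VOCAB.casefold()
        else:
            matches = "cessda" in name
        if matches:
            kept.append(topic)
    return kept


# Made-up example topics; terms without a matching vocab name are dropped
# from the filter only, not from the record itself.
topics = [
    Topic("Health", "CESSDA Topic Classification"),
    Topic("Hälsa", "CESSDA topical classification"),
    Topic("Economics", "ELSST"),
    Topic("Labour", None),
]
strict_terms = [t.term for t in filter_topics(topics, strict=True)]
loose_terms = [t.term for t in filter_topics(topics, strict=False)]
```

With these sample topics, the strict rule keeps only the first term, while the looser rule also keeps the term with the slightly different vocab name.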

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Initial analysis suggests that very few records (fewer than 2,000) would remain if only those using classifications.vocab="CESSDA topic classification" were admitted.

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Note that some SPs (SND, NSD) use classifications.vocab="CESSDA topical classification" in error.

e.g. https://datacatalogue-dev.cessda.eu/detail?q=%22SND__ext0106%22

e.g. https://datacatalogue-dev.cessda.eu/detail?q=%22NSD__http://nsddata.nsd.uib.no:80/obj/fStudy/NSD2420-1%22

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


@‌TainaFSD Please add issue(s) to https://github.com/cessda/cessda.metadata.officeissues re use of “CESSDA topical classification”

cessda-bitbucket-importer commented 4 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Requires further discussion with User Group

cessda-bitbucket-importer commented 4 years ago

Original comment by Taina Jääskeläinen.


As so few records have the CV name in their metadata (10%?), we may have to do the cleaning another way: include in the filter only those records that have a term found in the topic classification vocabulary, in any language version. Is this possible? It would mean implementing the vocabulary in the system.

cessda-bitbucket-importer commented 3 years ago

Original comment by Taina Jääskeläinen.


Can this now be tested, as there is a CVS API? The vocabulary would be implemented, and if the metadata contains a term from the vocabulary in any language, the system would convert it into the English term for the filter.

In the filter, the terms would be displayed in sentence case (only the first letter capitalised), just as they are now displayed in the detailed study view.

One additional challenge is that the check for whether a term is included in the metadata would need to be case-insensitive: it should find the term regardless of whether it appears in the metadata in all lower case, all upper case or sentence case.
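A minimal sketch of the lookup described above, assuming the vocabulary is already available locally as a mapping from each English term to its labels in other languages. The vocabulary entries, translations and function names here are made up for illustration; they are not taken from the CVS API or from the actual vocabulary.

```python
from typing import Dict, List

# Hypothetical excerpt: English term -> labels in other language versions.
VOCABULARY: Dict[str, List[str]] = {
    "HEALTH": ["Terveys", "Gesundheit"],
    "LABOUR AND EMPLOYMENT": ["Työ ja työllisyys", "Arbeit und Beschäftigung"],
}


def build_lookup(vocabulary: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every label, in any language, to its English term.

    Keys are case-folded so that later lookups ignore case.
    """
    lookup = {}
    for english, labels in vocabulary.items():
        for label in [english, *labels]:
            lookup[label.strip().casefold()] = english
    return lookup


def sentence_case(term: str) -> str:
    """Capitalise only the first letter, as in the detailed study view."""
    return term[:1].upper() + term[1:].lower()


def filter_terms(metadata_terms: List[str], lookup: Dict[str, str]) -> List[str]:
    """Return the English filter terms for the topic terms of one record."""
    result = []
    for raw in metadata_terms:
        english = lookup.get(raw.strip().casefold())
        if english is None:
            continue  # not in the vocabulary: leave it out of the filter
        label = sentence_case(english)
        if label not in result:  # avoid duplicates across languages
            result.append(label)
    return result


lookup = build_lookup(VOCABULARY)
terms = filter_terms(
    ["TERVEYS", "health", "arbeit und beschäftigung", "Unknown topic"], lookup
)
```

Terms that are not in the vocabulary are simply skipped here, so the record itself is not dropped; only its contribution to the filter changes.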

The priority for this task is MUST, as otherwise the important topic filter remains a mess.

cessda-bitbucket-importer commented 3 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


This needs to be done as a pre-processing task (as part of the harvesting process) for performance reasons.
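One way to read this, sketched under assumptions: the vocabulary lookup runs once per record while it is harvested, and the result is stored in a separate field that the filter reads at search time. The record layout and the field names `topics` and `topic_filter` are hypothetical, as is the small lookup table.

```python
from typing import Dict

# Hypothetical lookup: case-folded label in any language -> English filter term.
LOOKUP: Dict[str, str] = {
    "health": "Health",
    "terveys": "Health",
}


def preprocess_record(record: dict, lookup: Dict[str, str]) -> dict:
    """Return a copy of a harvested record with a precomputed filter field.

    The original topic terms are left untouched, so the detailed study view
    can still show terms from other vocabularies.
    """
    filter_terms = []
    for raw in record.get("topics", []):
        english = lookup.get(raw.strip().casefold())
        if english is not None and english not in filter_terms:
            filter_terms.append(english)
    enriched = dict(record)
    enriched["topic_filter"] = filter_terms
    return enriched


# Made-up record, processed once at harvest time rather than per search.
record = {"id": "EXAMPLE__1", "topics": ["TERVEYS", "Local keyword"]}
enriched = preprocess_record(record, LOOKUP)
```

Because the matching is done before indexing, a search request only has to read the stored field and never touches the vocabulary.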

cessda-bitbucket-importer commented 3 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Address in release 2.4

cessda-bitbucket-importer commented 3 years ago

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


Removing myself as assignee of this issue.

cessda-bitbucket-importer commented 2 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Updating the version is preferred to setting the status to on hold.