cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
https://datacatalogue.cessda.eu/
Apache License 2.0

Parse en-GB language codes #219

Closed: cessda-bitbucket-importer closed this issue 2 years ago

cessda-bitbucket-importer commented 4 years ago

Original report on BitBucket by Taina Jääskeläinen.


As some countries use language and country code combinations, can CDC read only the first two characters of the xml:lang attributes and ignore the rest? The two-letter first part is the ISO 639-1 language code.

This would be easier than making SPs change their legacy metadata. EQB also has two elements where this combination is needed, for instance to distinguish between questions asked in the UK and in Australia.

See also #230.

cessda-bitbucket-importer commented 4 years ago

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


I'll take a look.

cessda-bitbucket-importer commented 4 years ago

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


@TainaFSD Do you have an example of a record where this is the case?

cessda-bitbucket-importer commented 4 years ago

Original comment by Taina Jääskeläinen.


I think that ADP uses this format.

cessda-bitbucket-importer commented 3 years ago

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


Given that ADP's OAI-PMH endpoint is not responding, and I can't find another example to work with, I'm putting this on hold for now.

2020-11-30 12:25:51.691 ERROR (LocalHarvesterConsumerService.java:68) - [ADP] ListRecordHeaders failed: eu.cessda.pasc.oci.exception.XMLParseException: Parsing https://www.adp.fdv.uni-lj.si/v0/oai?verb=ListIdentifiers&metadataPrefix=oai_ddi25 failed: eu.cessda.pasc.oci.exception.HTTPException: Server returned 503

cessda-bitbucket-importer commented 2 years ago

Original comment by Taina Jääskeläinen.


Check what ADP's OAI-PMH endpoint currently returns for language.

cessda-bitbucket-importer commented 2 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


@matthew-morris-cessda ADP's endpoint is responding. See https://www.adp.fdv.uni-lj.si/v0/oai?verb=ListIdentifiers&metadataPrefix=oai_ddi25

cessda-bitbucket-importer commented 2 years ago

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


I've decided to truncate language codes like en-GB at the dash, removing the dash and everything after it. This results in en-GB being transformed to en.
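
For readers following along, here is a minimal sketch of the kind of truncation described in this comment. It is not the code from the actual fix: the class name `LanguageCodes` and the method name `stripRegion` are hypothetical, and it assumes the xml:lang value uses a plain hyphen as the separator.

```java
// Hypothetical helper, for illustration only; not taken from the CDC codebase.
final class LanguageCodes {
    private LanguageCodes() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Returns the part of an xml:lang value before the first dash,
     * e.g. "en-GB" becomes "en". Values without a dash are returned
     * unchanged, and null is passed through as null.
     */
    static String stripRegion(String xmlLang) {
        if (xmlLang == null) {
            return null;
        }
        int dash = xmlLang.indexOf('-');
        // indexOf returns -1 when there is no dash, so keep the value as-is.
        return dash >= 0 ? xmlLang.substring(0, dash) : xmlLang;
    }
}
```

Under these assumptions, `LanguageCodes.stripRegion("en-GB")` yields `en`, while a bare code such as `fi` is left untouched. Cutting at the first dash, rather than taking a fixed two characters, also leaves three-letter language codes intact, although whether the real fix handles those is not stated in this thread.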

cessda-bitbucket-importer commented 2 years ago

Original comment by Taina Jääskeläinen.


Issue #334 was marked as a duplicate of this issue.

cessda-bitbucket-importer commented 2 years ago

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


Fixed in <link to pull request removed>.