IQSS / dataverse

Open source research data repository software
http://dataverse.org

Add duplicate title, subject, and description fields for text in multiple languages #4633

Open amberleahey opened 6 years ago

amberleahey commented 6 years ago

Having fields like title and description offer alternative language versions would also be helpful for discovery.

From Julian: From what I can tell so far, the DataCite 3.1 schema lets you specify the language of Title, Subject, and Description with a lang attribute (4.1 adds the xml:lang attribute to Rights) - https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf. The schema says it accepts only IETF BCP 47 and ISO 639-1 language codes. But I don't think Dataverse knows the ISO language codes for the languages it displays in the Citation block (I vaguely remember a comment about this in a GitHub issue or maybe a Google Group post, but I can't find it). The Consorcio Madroño Dataverse does this with the DataCite metadata they publish for each dataset: https://edatos.consorciomadrono.es/api/datasets/export?exporter=oai_datacite&persistentId=doi%3A10.21950/O53TLR
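
To make the DataCite part concrete, here is a minimal sketch of how one title per language could be emitted with an xml:lang attribute. The function name `build_titles` and the sample titles are made up for illustration, the fragment omits the DataCite namespace and the rest of the record, and none of this is taken from the Dataverse exporter code.

```python
import xml.etree.ElementTree as ET

# Qualified name of the standard xml:lang attribute. ElementTree
# serializes this namespace with the reserved "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_titles(titles_by_lang):
    """Build a <titles> element with one <title> child per language code.

    titles_by_lang maps a language code (for example "en" or "fr")
    to the title text in that language. Hypothetical helper, not part
    of Dataverse.
    """
    titles = ET.Element("titles")
    for lang_code, text in titles_by_lang.items():
        title = ET.SubElement(titles, "title", {XML_LANG: lang_code})
        title.text = text
    return titles


if __name__ == "__main__":
    # Illustrative sample data only.
    fragment = build_titles({
        "en": "Household Survey 2017",
        "fr": "Enquête auprès des ménages 2017",
    })
    print(ET.tostring(fragment, encoding="unicode"))
```

The same pattern would presumably apply to subject and description elements. The open question from the paragraph above remains: Dataverse would need a mapping from the language names it displays to ISO 639-1 or BCP 47 codes before it could fill in `lang_code`.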

And most or all of the DDI elements that Dataverse uses can include a lang attribute (http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation_files/schemas/xml_xsd/attributes/lang.html). Looks like it accepts any value for now.

See related ticket https://github.com/IQSS/dataverse/issues/4632 for adding a record language qualifier.

pdurbin commented 6 years ago

@amberleahey I'm not sure if you caught this during the talk by @pengchengluo at the Dataverse Community Meeting the other week (slides at https://schd.ws/hosted_files/dataversecommunitymeeting20/eb/Slides%20-%20Support%20University%20Students%E2%80%99%20Data%20Driven%20Research%20in%20a%20National%20Contest%20with%20PKU%20Open%20Research%20Data%20Platform--v0.3.pdf ), but a while back he implemented the ability to enter metadata in multiple languages. Here are English and Chinese screenshots from http://opendata.pku.edu.cn/dataset.xhtml?persistentId=doi:10.18170/DVN/CX1SM6 for example:

[Screenshots: English and Chinese versions of the dataset page]

pengchengluo commented 6 years ago

Hi @pdurbin and @amberleahey, do you know about CERIF? It is a European standard for Current Research Information Systems (CRIS) and is supported by several commercial and open source CRIS, for example Elsevier Pure.

CERIF supports multiple languages. In its database model, some attributes of an entity are stored with a cfLangCode field. For example, the Project entity has a cfProjTitle attribute, and the cfProjTitle table has a cfLangCode field.

(https://www.eurocris.org/Uploads/Web%20pages/CERIF-1.6/documentation/MImage.html)
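
As a rough illustration of that idea (one title row per project and language code), here is a simplified sketch. The class and function names are hypothetical, the real cfProjTitle table has more to it than this, and the fallback-to-English rule is my own assumption, not something CERIF prescribes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProjTitleRow:
    """Simplified stand-in for one row of a per-language title table."""
    proj_id: str    # which project the title belongs to
    lang_code: str  # plays the role of cfLangCode, e.g. "en" or "zh"
    title: str      # the title text in that language


def title_for(rows: List[ProjTitleRow], proj_id: str,
              lang_code: str, fallback: str = "en") -> Optional[str]:
    """Return the title in lang_code, else in the fallback language, else None."""
    by_lang = {r.lang_code: r.title for r in rows if r.proj_id == proj_id}
    return by_lang.get(lang_code, by_lang.get(fallback))


if __name__ == "__main__":
    # Illustrative sample data only.
    rows = [
        ProjTitleRow("p1", "en", "Open research data platform"),
        ProjTitleRow("p1", "zh", "开放研究数据平台"),
    ]
    print(title_for(rows, "p1", "zh"))
    print(title_for(rows, "p1", "fr"))  # no French row, falls back to English
```

Keeping the language code on the row, instead of adding a separate field per language, means a new language needs no schema change.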

In our Peking University implementation, we added additional fields to the dataverse to store Chinese metadata and added additional metadata blocks for datasets. We didn't support Chinese and English metadata for data files. So we also hope Harvard Dataverse itself can support multilingual metadata for dataverses, datasets, and data files coherently, although it seems this would require a lot of changes, and multilingual information retrieval is also a challenge.

pdurbin commented 6 years ago

@pengchengluo no, I wasn't aware of that standard. @scolapasta and @JayanthyChengan check this out. ^^

cmbz commented 1 month ago

2024/09/16: Reviewed, will investigate further.