Center-for-Digital-Narrative / elmcip

Electronic Literature as a Model of Creativity and Innovation in Practice (ELMCIP) is a collaborative research project funded by Humanities in the European Research Area (HERA) JRP for Creativity and Innovation. Built in Drupal.
https://elmcip.net

OAI-PMH - So what about "Other language" taxonomy term?--Implement New Langs #159

Open steinmb opened 9 years ago

steinmb commented 9 years ago

Follow-up to issue #112, Install and configure CELL OAI-PMH module.

Some of our content is tagged with the taxonomy term http://elmcip.net/category/language/other-indicate-description. These are a bit hard to translate to an ISO 639-1 code or an RFC 3066 tag (https://www.ietf.org/rfc/rfc3066.txt):

<dc:language>en</dc:language>
<dc:language>fr</dc:language>
<dc:language>it</dc:language>
<dc:language/>
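
A minimal sketch of one way to avoid the empty element, assuming the export can look up a code by term name. The mapping table, the term name "Other (indicate in description)", and the function name are hypothetical illustrations, not the CELL OAI-PMH module's actual code:

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from taxonomy term names to ISO 639-1 codes.
TERM_TO_ISO_639_1 = {
    "English": "en",
    "French": "fr",
    "Italian": "it",
    "Croatian": "hr",
    "Welsh": "cy",
}


def dc_language_elements(term_names):
    """Return one <dc:language> element per term that has a known code."""
    elements = []
    for name in term_names:
        code = TERM_TO_ISO_639_1.get(name)
        if code is None:
            # Terms without a code (e.g. the "other" term) are skipped
            # instead of being emitted as an empty <dc:language/>.
            continue
        elements.append(f"<dc:language>{escape(code)}</dc:language>")
    return elements


if __name__ == "__main__":
    tagged = ["English", "French", "Italian", "Other (indicate in description)"]
    print("\n".join(dc_language_elements(tagged)))
```

Skipping unmapped terms is only one option; whether the harvester prefers a missing element over an empty one would need to be checked.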

elmcip commented 9 years ago

Question: does this break an export or something? I was just going through some of the records that have this term. Many are using it incorrectly (for instance, when a work uses "visual language" or "alembic writing"), but it is there to flag that we need to add a language that is not on the list. For example, this work, http://elmcip.net/creative-work/commedia, is in Croatian. We should add Croatian to the list.

I guess the other fix would be to just have the whole ISO 639-1 list or some more complete partial list available? How do other databases tend to handle this?

Another workaround would be to eliminate other language from the list but have some kind of contact link in the info text under the field so people could ask us to add a language for a record.

elmcip commented 9 years ago

Notes for reference -- records where this field is used as intended show that we need to have Croatian, Serbo-Croatian, Welsh.

Not sure how we handle works like this: http://elmcip.net/creative-work/i-love-you that use many languages. Not necessarily useful to single them out. Should this just be "none" with indication in the description and tags that it is multilingual?

I've cleaned out records that were using "other language" as a stand-in for "complicated use of language" -- http://elmcip.net/category/language/other-indicate-description -- the remaining records either use a language that is not on the list or are multilingual.

AlvaroSeica commented 8 years ago

Pasting CELL list:

Achinese Acoli Adangme Adyghe (Adygei) Afro-Asiatic languages Afrihili Afrikaans Ainu Akan Akkadian Albanian Aleut Algonquian languages Southern Altai Amharic Old English (ca.450-1100) Angika Apache languages Arabic Official Aramaic (Imperial Aramaic) Aragonese Armenian Mapudungun (Mapuche) Arapaho Artificial languages Arawak Assamese Asturian (Bable - Leonese - Asturleonese) Athapascan languages Australian languages Avaric Avestan Awadhi Aymara Azerbaijani Banda languages Bamileke languages Bashkir Baluchi Bambara Balinese Basque Basa Baltic languages Beja (Bedawiyet) Belarusian Bemba Bengali Berber languages Bhojpuri Bihari languages Bikol Bini (Edo) Bislama Siksika Bantu languages Tibetan Bosnian Braj Breton Batak languages Buriat Buginese Bulgarian Burmese Blin (Bilin) Caddo Central American Indian languages Galibi Carib Catalan (Valencian) Caucasian languages Cebuano Celtic languages Czech Chamorro Chibcha Chechen Chagatai Chinese Chuukese Mari Chinook jargon Choctaw Chipewyan (Dene Suline) Cherokee Church Slavic (Old Slavonic - Church Slavonic - Old Bulgarian - Old Church Slavonic) Chuvash Cheyenne Chamic languages Coptic Cornish Corsican Creoles and pidgins (English based) Creoles and pidgins (French-based) Creoles and pidgins (Portuguese-based) Cree Crimean Tatar (Crimean Turkish) Creoles and pidgins Kashubian Cushitic languages Welsh Czech Dakota Danish Dargwa Land Dayak languages Delaware Slave (Athapascan) German Dogrib Dinka Divehi (Dhivehi - Maldivian) Dogri Dravidian languages Lower Sorbian Duala Middle Dutch (ca.1050-1350) Flemish Dutch Dyula Dzongkha Efik Egyptian (Ancient) Ekajuk Modern Greek (1453-) Elamite English Middle English (1100-1500) Esperanto Estonian Basque Ewe Ewondo Fang Faroese Persian Fanti Fijian Filipino (Pilipino) Finnish Finno-Ugrian languages Fon French French Middle French (ca.1400-1600) Old French (842-ca.1400) Northern Frisian Eastern Frisian Western Frisian Fulah Friulian Ga Gayo Gbaya Germanic languages Georgian German Geez 
Gilbertese Gaelic (Scottish Gaelic) Irish Galician Manx Middle High German (ca.1050-1500) Old High German (ca.750-1050) Gondi Gorontalo Gothic Grebo Ancient Greek (to 1453) Modern Greek (1453-) Guarani Swiss German (Alemannic - Alsatian) Gujarati Gwich'in Haida Haitian (Haitian Creole) Hausa Hawaiian Hebrew Herero Hiligaynon Himachali languages (Western Pahari languages) Hindi Hittite Hmong (Mong) Hiri Motu Croatian Upper Sorbian Hungarian Hupa Armenian Iban Igbo Icelandic Ido Sichuan Yi (Nuosu) Ijo languages Inuktitut Interlingue (Occidental) Iloko Interlingua (International Auxiliary Language Association) Indic languages Indonesian Indo-European languages Ingush Inupiaq Iranian languages Iroquoian languages Icelandic Italian Javanese Lojban Japanese Judeo-Persian Judeo-Arabic Kara-Kalpak Kabyle Kachin (Jingpho) Kalaallisut (Greenlandic) Kamba Kannada Karen languages Kashmiri Georgian Kanuri Kawi Kazakh Kabardian Khasi Khoisan languages Central Khmer Khotanese (Sakan) Kikuyu (Gikuyu) Kinyarwanda Kirghiz (Kyrgyz) Kimbundu Konkani Komi Kongo Korean Kosraean Kpelle Karachay-Balkar Karelian Kru languages Kurukh Kuanyama (Kwanyama) Kumyk Kurdish Kutenai Ladino Lahnda Lamba Lao Latin Latvian Lezghian Limburgan (Limburger - Limburgish) Lingala Lithuanian Mongo Lozi Luxembourgish (Letzeburgesch) Luba-Lulua Luba-Katanga Ganda Luiseno Lunda Luo (Kenya and Tanzania) Lushai Macedonian Madurese Magahi Marshallese Maithili Makasar Malayalam Mandingo Maori Austronesian languages Marathi Masai Malay Moksha Mandar Mende Middle Irish (900-1200) Mi'kmaq (Micmac) Minangkabau Uncoded languages Macedonian Mon-Khmer languages Malagasy Maltese Manchu Manipuri Manobo languages Mohawk Mongolian Mossi Maori Malay Multiple languages Munda languages Creek Mirandese Marwari Burmese Mayan languages Erzya Nahuatl languages North American Indian languages Neapolitan Nauru Navajo (Navaho) South Ndebele North Ndebele Ndonga Low German (Low Saxon) Nepali Nepal Bhasa (Newari) Nias Niger-Kordofanian 
languages Niuean Dutch (Flemish) Norwegian Nynorsk Norwegian Bokmål Nogai Old Norse Norwegian N'Ko Pedi (Sepedi - Northern Sotho) Nubian languages Classical Newari (Old Newari - Classical Nepal Bhasa) Chichewa (Chewa - Nyanja) Nyamwezi Nyankole Nyoro Nzima Occitan (post 1500) Ojibwa Oriya Oromo Osage Ossetian (Ossetic) Ottoman Turkish (1500-1928) Otomian languages Papuan languages Pangasinan Pahlavi Pampanga (Kapampangan) Panjabi (Punjabi) Papiamento Palauan Old Persian (ca.600-400 B.C.) Persian Philippine languages Phoenician Pali Polish Pohnpeian Portuguese Prakrit languages Old Provençal (to 1500) (Old Occitan (to 1500)) Pushto (Pashto) Reserved for local use Quechua Rajasthani Rapanui Rarotongan (Cook Islands Maori) Romance languages Romansh Romany Romanian (Moldavian - Moldovan) Rundi Aromanian (Arumanian - Macedo-Romanian) Russian Sandawe Sango Yakut South American Indian languages Salishan languages Samaritan Aramaic Sanskrit Sasak Santali Sicilian Scots Selkup Semitic languages Old Irish (to 900) Sign Languages Shan Sidamo Sinhala (Sinhalese) Siouan languages Sino-Tibetan languages Slavic languages Slovak Slovak Slovenian Southern Sami Northern Sami Sami languages Lule Sami Inari Sami Samoan Skolt Sami Shona Sindhi Soninke Sogdian Somali Songhai languages Southern Sotho Spanish (Castilian) Albanian Sardinian Sranan Tongo Serbian Serer Nilo-Saharan languages Swati Sukuma Sundanese Susu Sumerian Swahili Swedish Classical Syriac Syriac Tahitian Tai languages Tamil Tatar Telugu Timne Tereno Tetum Tajik Tagalog Thai Tibetan Tigre Tigrinya Tiv Tokelau Klingon (tlhIngan-Hol) Tlingit Tamashek Tonga (Nyasa) Tonga (Tonga Islands) Tok Pisin Tsimshian Tswana Tsonga Turkmen Tumbuka Tupi languages Turkish Altaic languages Tuvalu Twi Tuvinian Udmurt Ugaritic Uighur (Uyghur) Ukrainian Umbundu Undetermined Urdu Uzbek Vai Venda Vietnamese Volapük Votic Wakashan languages Wolaitta (Wolaytta) Waray Washo Welsh Sorbian languages Walloon Wolof Kalmyk (Oirat) Xhosa Yao Yapese 
Yiddish Yoruba Yupik languages Zapotec Blissymbols (Blissymbolics - Bliss) Zenaga Standard Moroccan Tamazight Zhuang Chuang Chinese Zazaki Zande languages Zulu Zuni No linguistic content Not applicable Code Source Language Other

elmcip commented 7 years ago

Hannah should be able to tackle this when she has time. After we are done with the help text and field stuff.

steinmb commented 7 years ago

Needs to be added to /admin/structure/taxonomy/language

steinmb commented 7 years ago

After they have been added, these nodes https://elmcip.net/category/language/other-indicate-description need to get the right lang. defined.

steinmb commented 7 years ago

@hannahackermans do you know when you have time to address this?

steinmb commented 7 years ago

> I need to add every language in Alvaro's list to https://elmcip.net/admin/structure/taxonomy/language? (unless it is already in there)

Yes. We need to expand the list we currently have.
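
As a rough sketch of the "unless it is already in there" check: given the existing term names and the pasted list split into names, the terms still to be added could be computed as below. The function name and the sample data are hypothetical; the pasted list also repeats some names (for example Czech and French), so repeats are dropped as well.

```python
def missing_terms(existing, candidates):
    """Return candidate names not yet in the vocabulary, in their original order.

    Comparison is case-insensitive, and repeated candidates are returned once.
    """
    seen = {name.strip().casefold() for name in existing}
    missing = []
    for name in candidates:
        cleaned = name.strip()
        key = cleaned.casefold()
        if key and key not in seen:
            seen.add(key)
            missing.append(cleaned)
    return missing


if __name__ == "__main__":
    current = ["English", "French", "Italian"]
    pasted = ["Czech", "French", "Welsh", "Czech", "Croatian"]
    print(missing_terms(current, pasted))
```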

hannahackermans commented 7 years ago

I added the list. I will keep myself assigned for now because there are two more tasks in this issue that I'll do later (I will see when I can, but right now I'm running late for an editorial board meeting I'm supposed to lead):

1) The languages are in alphabetical order now, which is generally good, but there are exceptions, e.g. Old Irish makes more sense at the I, underneath Irish, if we want people to find it. I will change the order of these.
2) I will check the works currently in "other" and assign the right language.

hannahackermans commented 7 years ago

> I will check the works currently in "other" and assign the right language.

The "Other" section is now empty. I assigned the right languages, or created a new record for other-language editions, to the best of my abilities. In some cases "none" or "multiple languages" was the only option.

> The languages are in alphabetical order now, which is generally good, but there are exceptions, e.g. Old Irish makes more sense at the I, underneath Irish, if we want people to find it. I will change the order of these.

I have changed my mind about this one: I think it might be more useful to rename the languages so that they sort correctly in alphabetical order. Otherwise, we would need to place every new language manually. So instead I propose to change, for example, "Old Irish" to "Irish, Old". Is that okay? This would mean that our languages will not be the same as the CELL Project's. Is that a problem?

One new issue that came up while cleaning the "other" section is that we now have so many languages that it is a little impractical to have such a small box in edit. I think it would help if we could make the language box 8 instead of 4 languages long. (I will create a new issue for this as well)

steinmb commented 7 years ago

Suggestion I

Yes, the list to pick from is getting quite long. Perhaps we should consider changing the selection widget? If we change it to, for example, autocomplete, we could perhaps fix both issues. If we do this, though, the language widgets and help text on all content types should be changed. The possible exception is the original language selector, which I would like to keep as it is, since it is used as a binary switch to control the text field visibility. But that probably needs to be tested.

Autocomplete widget

[Screenshot: autocomplete widget]

> This would mean that our languages will not be the same as the CELL Project, is that a problem?

I think it would be an issue. CELL probably uses the name to try to match the language; there is no other unique ID to use. ELMCIP, on the other hand, has a machine identifier (tid, term identifier), but it is only unique within our installation and probably does not make sense to the outside world harvesting our data. Here I might be wrong. Let us hear if @elmcip has something to add.

Suggestion II

Another way to do this is to add another field to the language taxonomy term that we could sort by for those "special" ones.
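
A minimal sketch of that idea, assuming a hypothetical `sort_key` field on each term: the exported name stays untouched (so name matching by a harvester would be unaffected), and only the special cases carry an explicit sort key. The class and field names are illustrative, not Drupal's API.

```python
from dataclasses import dataclass


@dataclass
class LanguageTerm:
    name: str           # name as exported; left unchanged
    sort_key: str = ""  # hypothetical extra field; empty means "sort by name"


def ordered(terms):
    """Sort terms by their explicit sort key, falling back to the name."""
    return sorted(terms, key=lambda term: (term.sort_key or term.name).casefold())


if __name__ == "__main__":
    terms = [
        LanguageTerm("Old Irish", sort_key="Irish, Old"),
        LanguageTerm("Italian"),
        LanguageTerm("Irish"),
        LanguageTerm("Icelandic"),
    ]
    for term in ordered(terms):
        print(term.name)
```

With this data, "Old Irish" lands directly after "Irish" in the list while keeping its original name.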

steinmb commented 7 years ago

BTW, my comment above is probably better suited to issue #331. We could update that issue to better reflect the scope of work.