Closed matyaskopp closed 1 year ago
With my limited knowledge it seems patronymic is more like a surname than like a forename or addName.
I leave it up to you to decide but, yes, I can easily add the type attribute to the three elements.
In Ukrainian, patronymic is part of one's personal name, but it is neither a forename nor a surname. Grammatically, patronymics take the form and function of the adjective, whereas forenames take the form and function of the noun. The latter is also true about most surnames, although some of them are substantivized. However, not all ethnic Ukrainians have patronymics as part of their legal personal name. A few of our ministers were born in the USA or elsewhere and have no patronymic but can have a middle name or a second forename. In a nutshell, I believe the addName element will work for Ukrainian best. Also, I am ok with not differentiating between patronymics and middle names in this category, if it is easier for @matyaskopp .
In Bulgarian, the patronymic is a separate name. In form it is close to a surname: a possessive adjective derived from the name of the parent (in some cases from the name of the mother), and similarly for the family name (in almost all cases derived from the name of one of the grandparents). Now we encode both as a surname element and distinguish them by order: the first surname element is the patronymic. If the order is not good for this purpose, then we could use some attribute. Bulgarian citizens of non-Bulgarian origin may have just two names.
In our view
The trouble with the ParlaMint schema is that we do not require any particular order of name parts, and in the samples (https://clarin-eric.github.io/ParlaMint/#sec-speakers) there is a mixture of orders (surname in first/last position). So it needs to be clarified how the name should be ordered. So I think:
- we should require some order
- add type attribute to allow specifying the type of name (patronym / married / religious)
As for the patronymic, I don't have a strong opinion about which surname/addName we should use. If we use the type attribute, we can probably admit both cases.
we should require some order
I disagree - having specified the type of the name part, there is no necessity to impose an order in XML. It is the job of the rendering software to determine how it should be displayed, i.e. forename first or last.
add type attribute to allow specifying the type of name (patronym / married / religious)
This is not a problem, is the list of above 3 values final? I'd add them only to surname though (see below).
As for the patronymic, I don't have a strong opinion about which surname/addName we should use. If we use the type attribute, we can probably admit both cases
For this, I'd be happier if we decide, because we shouldn't have two ways of representing the same data. I vote for surname.
I also think that the last option is better.
Kiril
I disagree - having specified the type of the name part, there is no necessity to impose an order in XML. It is the job of the rendering software to determine how it should be displayed, i.e. forename first or last.
I am not sure if I agree. The examples in our documentation show <surname><forename> order and do not mention that names of the same type (e.g. surname) should be ordered as they are commonly used (patronymic, then last surname). So it is possible to have this:
<persName>
  <surname>LAST_SURNAME</surname>
  <forename>FORENAME</forename>
  <surname>PATRONYMIC</surname>
</persName>
The order in my example is used in Ukrainian transcriptions: LAST_SURNAME F.P.
Rendering software will not be able to determine the right order of the names. Forename first will be: FORENAME LAST_SURNAME PATRONYMIC if a stable sort is used for sorting forenames and surnames.
This is not a problem, is the list of above 3 values final? I'd add them only to surname though (see below).
I am not sure if it is final (religious should be forename). We can start with surname typing:
- patronym (both for patronymic and matronymic names)
- married for name after marriage
- birth
For this, I'd be happier if we decide, because we shouldn't have two ways of representing the same data. I vote for surname
ok if we add type
Hi All,
The closest tip I see in the TEI is this:
Franklin Delano Roosevelt
In our data we used twice surname instead of twice forename. From my point of view it is more appropriate since both names are possessive adjectives. The same should hold for all Slavic languages:
Росен Асенов Плевнелиев
I suggest we use this option and leave it like this. Otherwise, attributes should be used. But I think we should be as simple and as comprehensible as possible.
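The markup of the two examples in this comment did not survive, so the contrast being described can only be sketched. The fragment below assumes the element choices named in the text (two forename elements in the TEI-style example, two surname elements in the Bulgarian data); any attributes the original examples may have carried are unknown and left out.

```xml
<!-- TEI-style example: given name and middle name both encoded as forename -->
<persName>
  <forename>Franklin</forename>
  <forename>Delano</forename>
  <surname>Roosevelt</surname>
</persName>

<!-- Bulgarian data as described: patronymic and family name both encoded as surname,
     with the patronymic first -->
<persName>
  <forename>Росен</forename>
  <surname>Асенов</surname>
  <surname>Плевнелиев</surname>
</persName>
```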
The examples in our documentation show <surname><forename> order and do not mention anything about that the names of the same type
Yes, but this should change now, so there will not be any names of the same type anymore.
I am not sure if it is final (religious should be forename).
What should "religious" forename be? Like "Father" for priests? Or "St.", if we have saints in ParlaMint? :) Because these could be roleName, if we need them.
We can start with surname typing: patronym (both for patronymic and matronymic names), married for name after marriage, birth
I am only unsure about "birth", as that is the default anyway, so not sure if we should add it at all.
Dear All,
From my point of view it is more appropriate since both names are possessive adjectives. The same should hold for all Slavic languages
For the sake of accuracy, it does not hold for Ukrainian. Morphologically, Ukrainian patronymics derive from male forenames and take the unproductive suffix -ovych (masculine) or -ivna (feminine), e.g. Ivanovych / Ivanivna. However, they are not homonymous to contemporary possessive adjectives derived from forenames, e.g. Ivaniv syn (Ivan's son), Ivanova dochka (Ivan's daughter), which are not used as patronymics in Ukrainian. As for Ukrainian surnames, their derivational patterns are quite diverse. In short, they can be derived from nouns, adjectives, verbs or even verbal phrases. They can also be homonymous to common nouns (e.g. Vovk), adjectives (e.g. Lysyi) and sometimes indeed to patronymics (e.g. Sydorovych).
Can annotating patronymics as surnames complicate search procedures when the data is mounted on a concordancer?
I do believe that adding type might be useful. Esp. wrt those speakers who change their surname in the midst of a term due to marriage/divorce. Or is there a better way to differentiate between old and current names? Also, a few of our now former ministers were born in the USA. They have middle names and no patronymics. Shall we use forename twice in the latter cases?
Thanks, Anna! Then it would not hurt if there are two or more approaches, as long as these are reflected in the documentation.
Also, a few of our now former ministers were born in the USA. They have middle names and no patronymics. Shall we use forename twice in the latter cases?
This is the way I saw they do it.
Of course, Tomaz and Matyas might have better ideas.
Good point about documentation, Petya! Looking forward to contributing to it, once final decisions are made and implemented.
I've now added @type to surname and forename. I discovered the TEI examples, so I followed the naming scheme there. So, e.g., U.S. middle names should be <forename type="middle">. And maybe this is where @matyaskopp got "religious" from, although note that there this type is used on persName.
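As a rough illustration of the typed name parts discussed so far, the two cases might be encoded as below. The element content is placeholder text in the style of the earlier example, and only the values middle and patronym are taken from this thread; no other type values are assumed.

```xml
<!-- Speaker with a middle name and no patronymic -->
<persName>
  <forename>FORENAME</forename>
  <forename type="middle">MIDDLE_NAME</forename>
  <surname>LAST_SURNAME</surname>
</persName>

<!-- Speaker with a patronymic, marked explicitly so that element order carries no meaning -->
<persName>
  <surname>LAST_SURNAME</surname>
  <forename>FORENAME</forename>
  <surname type="patronym">PATRONYMIC</surname>
</persName>
```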
But don't forget that we have temporal attributes on persName, so if somebody changes their name due to e.g. marriage, they should get two names, the first one marked with @to and the second with @from.
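A name change recorded with the temporal attributes could look roughly like this. The xml:id, the dates, and the placeholder name parts are invented for illustration; only the use of @to on the earlier name and @from on the later one comes from the comment above.

```xml
<person xml:id="ExampleSpeaker">
  <!-- Name valid until the change (hypothetical date) -->
  <persName to="2020-05-31">
    <forename>FORENAME</forename>
    <surname>EARLIER_SURNAME</surname>
  </persName>
  <!-- Name valid from the change onwards (hypothetical date) -->
  <persName from="2020-06-01">
    <forename>FORENAME</forename>
    <surname>LATER_SURNAME</surname>
  </persName>
</person>
```

With this approach the old and the current name are told apart by their validity period, so no extra type value is needed just to mark which surname is current.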
The order of the name (patronym surname, forename) in vert files is not good for Ukrainian; it should be surname, forename patronym.
I think the solution is replacing https://github.com/clarin-eric/ParlaMint/blob/9d8ef3805162765fd20282275a65c1a3742a0fcb/Scripts/parlamint-lib.xsl#L300-L340
with:
<!-- Format the name of a person from persName -->
<xsl:function name="et:format-name">
  <xsl:param name="persName"/>
  <xsl:choose>
    <!-- Structured name: non-patronymic surname(s), then forename(s) followed by patronymic(s) -->
    <xsl:when test="$persName/tei:forename[normalize-space(.)] or $persName/tei:surname[normalize-space(.)]">
      <xsl:value-of select="normalize-space(
                            string-join(
                              (
                                string-join($persName/tei:surname[not(@type='patronym')]/normalize-space(.),' '),
                                concat(
                                  string-join($persName/tei:forename/normalize-space(.),' '),
                                  '',' ',
                                  string-join($persName/tei:surname[@type='patronym']/normalize-space(.),' ')
                                )
                              )[normalize-space(.)],
                              ', ' ))"/>
    </xsl:when>
    <xsl:when test="$persName/tei:term">
      <xsl:value-of select="concat('@', $persName/tei:term, '@')"/>
    </xsl:when>
    <!-- Unstructured name: use the text content as is -->
    <xsl:when test="normalize-space($persName)">
      <xsl:value-of select="$persName"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:message select="concat('ERROR: empty persName for ', $persName/@xml:id)"/>
      <xsl:text>-</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
It also fixes this bug (testing forename existence, replacing nonexisting surname) https://github.com/clarin-eric/ParlaMint/blob/9d8ef3805162765fd20282275a65c1a3742a0fcb/Scripts/parlamint-lib.xsl#L326-L328
@TomazErjavec I am not sure about the phase of conversion, can this be included?
now I see the script needs a bit of tuning, missing values can break it...
I think this should work. It isn't easy to test because it needs to see all the data...
<!-- Format the name of a person from persName -->
<xsl:function name="et:format-name">
  <xsl:param name="persName"/>
  <xsl:choose>
    <!-- Structured name: non-patronymic surname(s), then forename(s) followed by patronymic(s).
         The leading '' in each inner sequence guards against missing name parts. -->
    <xsl:when test="$persName/tei:forename[normalize-space(.)] or $persName/tei:surname[normalize-space(.)]">
      <xsl:value-of select="normalize-space(
                            string-join(
                              (
                                string-join(
                                  ('',$persName/tei:surname[not(@type='patronym')]/normalize-space(.)),
                                  ' '),
                                concat(
                                  string-join(
                                    ('',$persName/tei:forename/normalize-space(.)),
                                    ' '),
                                  '',' ',
                                  string-join(
                                    ('',$persName/tei:surname[@type='patronym']/normalize-space(.)),
                                    ' ')
                                )
                              )[normalize-space(.)],
                              ', ' ))"/>
    </xsl:when>
    <xsl:when test="$persName/tei:term">
      <xsl:value-of select="concat('@', $persName/tei:term, '@')"/>
    </xsl:when>
    <!-- Unstructured name: use the text content as is -->
    <xsl:when test="normalize-space($persName)">
      <xsl:value-of select="$persName"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:message select="concat('ERROR: empty persName for ', $persName/@xml:id)"/>
      <xsl:text>-</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
OK, I replaced my function with yours. No idea what $persName/tei:term is supposed to do, I wasn't aware we have terms inside speaker names.
Anyway, conversion to vertical on the first file gives the result as below, first for UA, then for BG. I hope BG will be happy with this as well, if not, @osenova, @KirilSimov, pls. react ASAP!
UA:
speaker_id="АдамІвановичМартинюк.1950" speaker_name="Мартинюк, Адам Іванович"
speaker_id="АллаОлександрівнаАлександровська.1948" speaker_name="Александровська, Алла Олександрівна"
speaker_id="АнатолійАнатолійовичСтепаненко.1963" speaker_name="Степаненко, Анатолій Анатолійович"
speaker_id="АнатолійІвановичМярковський.1961" speaker_name="Мярковський, Анатолій Іванович"
speaker_id="АнатолійКириловичКінах.1954" speaker_name="Кінах, Анатолій Кирилович"
speaker_id="АндрійАнатолійовичКожемякін.1965" speaker_name="Кожем’якін, Андрій Анатолійович"
speaker_id="АндрійМихайловичПавловський.1965" speaker_name="Павловський, Андрій Михайлович"
speaker_id="АрсенійПетровичЯценюк.1974" speaker_name="Яценюк, Арсеній Петрович"
speaker_id="ВалерійОлексійовичБаранов.1957" speaker_name="Баранов, Валерій Олексійович"
speaker_id="ВладиславВалентиновичЛукянов.1964" speaker_name="Лук’янов, Владислав Валентинович"
speaker_id="ВолодимирМихайловичЛитвин.1956" speaker_name="Литвин, Володимир Михайлович"
speaker_id="ВолодимирОлександровичЯворівський.1942" speaker_name="Яворівський, Володимир Олександрович"
speaker_id="ВячеславАнатолійовичКириленко.1968" speaker_name="Кириленко, В’ячеслав Анатолійович"
speaker_id="ГригорійЄвдокимовичСмітюх.1961" speaker_name="Смітюх, Григорій Євдокимович"
speaker_id="ІванОлександровичЗаєць.1952" speaker_name="Заєць, Іван Олександрович"
speaker_id="КатеринаСеменівнаСамойлик.1951" speaker_name="Самойлик, Катерина Семенівна"
speaker_id="КатеринаТимофіївнаВащук.1947" speaker_name="Ващук, Катерина Тимофіївна"
speaker_id="МиколаВолодимировичТоменко.1964" speaker_name="Томенко, Микола Володимирович"
speaker_id="МихайлоВасильовичЧечетов.1953" speaker_name="Чечетов, Михайло Васильович"
speaker_id="ОлегВалерійовичЛяшко.1972" speaker_name="Ляшко, Олег Валерійович"
speaker_id="ОлегОлександровичЗарубінський.1963" speaker_name="Зарубінський, Олег Олександрович"
speaker_id="ОлександрІвановичКузьмук.1954" speaker_name="Кузьмук, Олександр Іванович"
speaker_id="ОлександрМиколайовичБондар.1955" speaker_name="Бондар, Олександр Миколайович"
speaker_id="ПетроМиколайовичСимоненко.1952" speaker_name="Симоненко, Петро Миколайович"
speaker_id="ПетроСтепановичЦибенко.1949" speaker_name="Цибенко, Петро Степанович"
speaker_id="РаїсаМиколаївнаСорочинська-Кириленко.1946" speaker_name="Сорочинська-Кириленко, Раїса Миколаївна"
speaker_id="СергійВолодимировичГордієнко.1957" speaker_name="Гордієнко, Сергій Володимирович"
speaker_id="СергійВолодимировичСас.1957" speaker_name="Сас, Сергій Володимирович"
speaker_id="СпірідонПавловичКілінкаров.1968" speaker_name="Кілінкаров, Спірідон Павлович"
BG:
speaker_id="BorisovBoyko" speaker_name="Методиев Борисов, Бойко"
speaker_id="ChukolovDesislav" speaker_name="Славов Чуколов, Десислав"
speaker_id="DanailovStefan" speaker_name="Ламбов Данаилов, Стефан"
speaker_id="IontchevRumen" speaker_name="Маринов Йончев, Румен"
speaker_id="KalfinIvaylo" speaker_name="Георгиев Калфин, Ивайло"
speaker_id="KanevRadan" speaker_name="Миленов Кънев, Радан"
speaker_id="KardzhalievTuncher" speaker_name="Мехмедов Кърджалиев, Тунчер"
speaker_id="KazakTchetin" speaker_name="Хюсеин Казак, Четин"
speaker_id="KunevaMeglena" speaker_name="Щилиянова Кунева, Меглена"
speaker_id="MestanLyutvi" speaker_name="Ахмед Местан, Лютви"
speaker_id="MikovMihail" speaker_name="Райков Миков, Михаил"
speaker_id="NaydenovAngel" speaker_name="Петров Найденов, Ангел"
speaker_id="PlevnelievRosen" speaker_name="Асенов Плевнелиев, Росен"
speaker_id="RashidovVezhdi" speaker_name="Летиф Рашидов, Вежди"
speaker_id="SiderovVolen" speaker_name="Николов Сидеров, Волен"
speaker_id="SimeonovValeri" speaker_name="Симеонов Симеонов, Валери"
@TomazErjavec & @matyaskopp Many thanks for putting the UA speaker names in the proper order!
Hi, for Bulgarian this is not a good order. We would never start with the middle name, then family name, then given name. Thus, speaker_id="PlevnelievRosen" speaker_name="Асенов Плевнелиев, Росен" should become rather speaker_id="PlevnelievRosen" speaker_name="Плевнелиев, Росен Асенов". Here we have the family name by which we recognize the speaker. Then we can have the given name and the middle (surname). (I give the best order in the given pattern only.) Even better would be just family and given names: speaker_id="PlevnelievRosen" speaker_name=" Плевнелиев, Росен". But maybe for the sake of disambiguation it can follow my first suggested pattern.
Dear All,
Just one example:
Kiril Ivanov Simov - full official name
Kiril Simov - shortened official name
Simov - family name
Simov, Kiril Ivanov - full name if family name has to be first
Simov, Kiril - similar
Kiril - first name
I think there are no other possibilities.
With best regards,
Kiril
Thanks for your comments, @osenova and @KirilSimov. So, it seems, if I understand correctly, that it is impossible to make a country- and language-independent order. Or maybe it is possible, because BG just has two surnames, but UA has one explicitly marked by type="patronym", so maybe this could be used to advantage.
@matyaskopp, given that you wrote the current function, would you also be able to modify it given the above?
This happened because BG data do not encode patronym: https://github.com/clarin-eric/ParlaMint/blob/437c87f41a9880c4a5f3b43922d10b12bbe7a9e8/Samples/ParlaMint-BG/ParlaMint-BG-listPerson.xml#L4033-L4038
<person xml:id="BorisovBoyko">
  <persName>
    <forename>Бойко</forename>
    <surname>Методиев</surname>
    <surname>Борисов</surname>
  </persName>
We can't change the order of surnames because Spanish/Galician/Catalan/... names need to preserve order.
So, if BG needs to have a different order, we need type="patronym". Otherwise, it would be difficult to implement (if Cyrillic, then a different order???).
Another possibility is to store the patronymic name in forename:
<person xml:id="BorisovBoyko">
  <persName>
    <forename>Бойко</forename>
    <forename>Методиев</forename>
    <surname>Борисов</surname>
  </persName>
@KirilSimov, would it be difficult to implement this addition? If the order indicates what is a patronym, then it could be automated.
Dear Tomaž,
The names are very similar between Russian, Ukrainian, Belorussian and Bulgarian (probably Macedonian).
In Bulgarian we call the names:
Licno/sobstveno ime (personal name, forename), prezime (surname), familiya (surname)
Prezime is derived from the father's name, but it can also be derived from the mother's name (if the father is not known, or for other reasons). The translation to English is surname, the same as for family name.
Thus, we could have Kiril Ivanov Simov or Petyr Marijkin Goshev (for the mother case)
For Russian it is similar:
Fyodor Mikhailovich Dostoevsky
Again Mikhailovich is derivational from the father name. The main difference is in usages of the names. In Russian, Ukrainian, Belorussian there are possibilities like:
Fyodor Mikhailovich Mikhailovich Mikhailovich, Fyodor
All of these are not possible in Bulgarian.
Thus, there are two possibilities in my view:
To uses
A new element to be introduced:
In Russian, Ukrainian, Belorussian there are possibilities like: Fyodor Mikhailovich Mikhailovich Mikhailovich, Fyodor
For the sake of accuracy, it is wrong. In uk/ru/be only two orders are acceptable for native speakers: 1.forename 2.patronymic 3.surname or 3.surname 1.forename 2.patronymic (with or without a comma between 3 and 1 in the second variant, depending on the genre / style). However, putting 2 just before 1 (either with or without a comma) or starting this string with 2 is erroneous.
E.g. 1. Taras 2. Hryhorovych 3. Shevchenko or 3. Shevchenko 1.Taras 2. Hryhorovych (in fact, both orders are used in wiki )
Dear Anna,
Thank you very much for correcting me! I took a look at the whole discussion, and the easy solution is for us to also use the patronymic element for Bulgarian names. Then the script will produce the correct order: Borisov, Bojko Metodiev
With best regards,
Kiril
the easy solution is for us to also use the patronymic element for Bulgarian names
Great, @KirilSimov, so can you implement that in your listPerson?
Just one note, it is not <patronymic> but rather <surname type="patronym">.
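For the Bulgarian entry quoted earlier in this thread, the change would presumably amount to the following. This is a sketch, not the actual ParlaMint-BG data, and it uses the value patronym because that is what the formatting function posted above tests for.

```xml
<person xml:id="BorisovBoyko">
  <persName>
    <forename>Бойко</forename>
    <!-- Explicitly typed, so the formatter can move it after the forename -->
    <surname type="patronym">Методиев</surname>
    <surname>Борисов</surname>
  </persName>
</person>
```

With the patronymic marked this way, the formatting function should place the untyped surname first and the patronymic after the forename, which is the "Borisov, Bojko Metodiev" order requested for Bulgarian.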
It would be great to get this soon though, as we are out of time...
@matyaskopp, I just noticed that your code for formatting names doesn't take into account <nameLink>, which is used by ES-CT:
https://github.com/clarin-eric/ParlaMint/blob/19b751a624ac93f92274adb5920b7d38e0d70e45/Samples/ParlaMint-ES-CT/ParlaMint-ES-CT-listPerson.xml#L5-L10
Right now it outputs e.g. in vertical:
speaker_id="AbellaJeannine" speaker_name="Abella Chica, Jeannine"
Would be nice to fix this (soon).
@TomazErjavec, implemented in devel branch
Has this been solved ok now? cf. e.g.
If yes, can somebody pls. close the issue? If not, pls. move it to "Future" milestone. Or let me know, and I will do it.
@TomazErjavec UA data does not seem to be loaded in NoSKETCH, so I can't check it there, but the data in TEITOK seem to be ok.
UA data does not seem to be loaded in NoSKETCH
Sorry, you must've checked just when I was recompiling the corpus, pls. try again.
Closing. If @AnnaParla @osenova @KirilSimov complain, please reopen.
@annaparla @KirilSimov @osenova
I want to discuss with you how we wish to treat patronymic names. TEI (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDPER) allows multiple solutions: <forename>, <addName> and <surname> (and possibly a specification with the attribute type="patronym", which the ParlaMint schema does not allow).
I started with using <forename>: https://github.com/ufal/ParlaMint-UA/blob/main/SampleMetaData/03-ParlaMint-UA/ParlaMint-UA-listPerson.xml
But now I checked ParlaMint-BG, and there <surname> is used: https://github.com/ivo-clark/ParlaMint/blob/data/Data/ParlaMint-BG/ParlaMint-BG.xml
I am not happy with either of these solutions. I believe the best is to use <addName> because it distinguishes the patronymic name.
Or, @TomazErjavec, can we extend the schema with the type attribute to be more specific?