clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Patronymic names #581

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago

@annaparla @KirilSimov @osenova

I want to discuss with you how we wish to treat patronymic names. TEI (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html#NDPER) allows multiple solutions: <forename>, <addName> and <surname> (possibly further specified with the attribute type="patronym", which the ParlaMint schema does not currently allow).

I started with using <forename>: https://github.com/ufal/ParlaMint-UA/blob/main/SampleMetaData/03-ParlaMint-UA/ParlaMint-UA-listPerson.xml

      <persName>
         <forename>Володимир</forename>
         <forename>Олександрович</forename>
         <surname>Зеленський</surname>
      </persName>

But now I checked ParlaMint-BG, and <surname> is used there: https://github.com/ivo-clark/ParlaMint/blob/data/Data/ParlaMint-BG/ParlaMint-BG.xml

<persName xml:lang="bg">
  <forename>Росен</forename>
  <surname>Асенов</surname>
  <surname>Плевнелиев</surname>
</persName>

I am not happy with either of these solutions. I believe the best is to use <addName> because it distinguishes the patronymic name:

      <persName>
         <forename>Володимир</forename>
         <addName>Олександрович</addName>
         <surname>Зеленський</surname>
      </persName>

Or, @TomazErjavec, can we extend the schema with the type attribute:

      <persName>
         <forename>Володимир</forename>
         <addName type="patronym">Олександрович</addName>
         <surname>Зеленський</surname>
      </persName>

to be more specific?
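
To make the benefit of the proposed attribute concrete, here is a minimal Python sketch of how a consumer could pick out the patronym once it is typed. The snippet is the one proposed above; dropping the TEI namespace and the helper names `name_parts` and `patronyms` are assumptions made only for this illustration.

```python
import xml.etree.ElementTree as ET

# Snippet from the proposal above. No TEI namespace is declared here,
# which is a simplification made for this sketch.
PERS_NAME = """
<persName>
  <forename>Володимир</forename>
  <addName type="patronym">Олександрович</addName>
  <surname>Зеленський</surname>
</persName>
"""


def name_parts(pers_name_xml):
    """Return (element name, type attribute or None, text) for each name part."""
    root = ET.fromstring(pers_name_xml)
    return [(child.tag, child.get("type"), (child.text or "").strip())
            for child in root]


def patronyms(pers_name_xml):
    """Return the text of every name part explicitly typed as a patronym."""
    return [text for _tag, kind, text in name_parts(pers_name_xml)
            if kind == "patronym"]


print(name_parts(PERS_NAME))
print(patronyms(PERS_NAME))
```

Without the attribute, the same lookup would have to rely on the element name or on position alone.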

TomazErjavec commented 1 year ago

With my limited knowledge it seems patronymic is more like a surname than like a forename or addName.

I leave it up to you to decide but, yes, I can easily add the type attribute to the three elements.

AnnaParla commented 1 year ago

In Ukrainian, patronymic is part of one's personal name, but it is neither a forename nor a surname. Grammatically, patronymics take the form and function of the adjective, whereas forenames take the form and function of the noun. The latter is also true about most surnames, although some of them are substantivized. However, not all ethnic Ukrainians have patronymics as part of their legal personal name. A few of our ministers were born in the USA or elsewhere and have no patronymic but can have a middle name or a second forename. In a nutshell, I believe the addName element will work for Ukrainian best. Also, I am ok with not differentiating between patronymics and middle names in this category, if it is easier for @matyaskopp .

KirilSimov commented 1 year ago

In Bulgarian, the patronymic is a separate name. In form it is close to a surname: a possessive adjective derived from the name of a parent (in some cases the mother), and similarly for the family name (in almost all cases derived from the name of one of the grandparents). Currently we encode both as surname elements and distinguish them by order: the first surname element is the patronymic. If order is not suitable for this purpose, then we could use an attribute. Bulgarian citizens of non-Bulgarian origin may have just two names.

KirilSimov commented 1 year ago

In our view, multiple <forename> elements are for cases when the second (third, ...) name is considered equal to the personal name. Such cases are very rare in Bulgarian, e.g. Penka Maria Petrova Georgieva or Petar Emil Goshev Mitev.

matyaskopp commented 1 year ago

The trouble with the ParlaMint schema is that we do not require any particular order of name parts, and in the samples (https://clarin-eric.github.io/ParlaMint/#sec-speakers) there is a mixture of orders (surname in first/last position). So it needs to be clarified how the name should be ordered. So I think we should require some order and add a type attribute to allow specifying the type of name (patronym / married / religious).

As for the patronymic, I don't have a strong opinion about which of surname/addName we should use. If we use the type attribute, we can probably admit both cases.

TomazErjavec commented 1 year ago

we should require some order

I disagree - having specified the type of the name part, there is no necessity to impose an order in XML. It is the job of the rendering software to determine how it should be displayed, i.e. forename first or last.

add type attribute to allow specifying the type of name (patronym / married / religious)

This is not a problem, is the list of above 3 values final? I'd add them only to surname though (see below).

As for the patronymic, I don't have a strong opinion about which surname/addName we should use. If we use the type attribute, we can probably admit both cases

For this, I'd be happier if we decide, because we shouldn't have two ways of representing the same data. I vote for surname.

KirilSimov commented 1 year ago

I also think that the last option is better.

Kiril


matyaskopp commented 1 year ago

I disagree - having specified the type of the name part, there is no necessity to impose an order in XML. It is the job of the rendering software to determine how it should be displayed, i.e. forename first or last.

I am not sure if I agree. The examples in our documentation show <surname><forename> order and do not say that names of the same type (e.g. surname) should be ordered as they are commonly used: patronymic, then last surname. So it is possible to have this:

<persName>
  <surname>LAST_SURNAME</surname>
  <forename>FORENAME</forename>
  <surname>PATRONYMIC</surname>
</persName>

The order in my example is used in Ukrainian transcriptions: LAST_SURNAME F.P.

Rendering software will not be able to determine the right order of the names. Forename first will be: FORENAME LAST_SURNAME PATRONYMIC if stable sort is used for forenames and surnames sorting.
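
The effect of a stable sort can be shown with a small Python sketch. It assumes a renderer that only knows the element names and uses a "forename first" ranking; the `rank` table is hypothetical and not part of any ParlaMint script.

```python
# Name parts in the document order of the example above: (element name, text).
parts = [
    ("surname", "LAST_SURNAME"),
    ("forename", "FORENAME"),
    ("surname", "PATRONYMIC"),
]

# "Forename first" rendering: forenames sort before surnames. Python's sorted()
# is stable, so parts with the same key keep their document order.
rank = {"forename": 0, "surname": 1}
forename_first = sorted(parts, key=lambda part: rank[part[0]])

rendered = " ".join(text for _tag, text in forename_first)
print(rendered)  # FORENAME LAST_SURNAME PATRONYMIC
```

The two surnames stay in document order, so the patronymic ends up after the last surname unless the element order or a type attribute says otherwise.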

This is not a problem, is the list of above 3 values final? I'd add them only to surname though (see below).

I am not sure if it is final (religious should be forename). We can start with surname typing: patronym (both for patronymic and matronymic names), married for name after marriage, birth

For this, I'd be happier if we decide, because we shouldn't have 2 way of representing the same data. I vote for surname

ok if we add type

osenova commented 1 year ago

Hi All,

The closest tip I see in the TEI is this:

Franklin Delano Roosevelt

In our data we used twice surname instead of twice forename. From my point of view it is more appropriate since both names are possessive adjectives. The same should hold for all Slavic languages:

Росен Асенов Плевнелиев

I suggest we use this option and leave it like this. Otherwise, attributes should be used. But I think we should be as simple and as comprehensible as possible.

TomazErjavec commented 1 year ago

The examples in our documentation show <surname><forename> order and do not mention anything about that the names of the same type

Yes, but this should change now, so there will not be any names of the same type anymore.

I am not sure if it is final (religious should be forename).

What should a "religious" forename be? Like "Father" for priests? Or "St.", if we have saints in ParlaMint? :) Because these could be roleName, if we need them.

We can start with surname typing: patronym (both for patronymic and matronymic names), married for name after marriage, birth

I am only unsure about "birth", as that is the default anyway, so not sure if we should add it at all.

AnnaParla commented 1 year ago

Dear All,

From my point of view it is more appropriate since both names are possessive adjectives. The same should hold for all Slavic languages

For the sake of accuracy, it does not hold for Ukrainian. Morphologically, Ukrainian patronymics derive from male forenames and take the unproductive suffix -ovych (masculine) or -ivna (feminine), e.g. Ivanovych / Ivanivna. However, they are not homonymous to contemporary possessive adjectives derived from forenames, e.g. Ivaniv syn (Ivan's son), Ivanova dochka (Ivan's daughter), which are not used as patronymics in Ukrainian. As for Ukrainian surnames, their derivational patterns are quite diverse. In short, they can be derived from nouns, adjectives, verbs or even verbal phrases. They can also be homonymous to common nouns (e.g. Vovk), adjectives (e.g. Lysyi) and sometimes indeed to patronymics (e.g. Sydorovych).

Can annotating patronymics as surnames complicate search procedures when the data is mounted on a concordancer?

I do believe that adding type might be useful. Esp. wrt those speakers who change their surname in the midst of a term due to marriage/divorce. Or is there a better way to differentiate between old and current names? Also, a few of our now former ministers were born in the USA. They have middle names and no patronymics. Shall we use forename twice in the latter cases?


osenova commented 1 year ago

Thanks, Anna! Then it would not hurt to have two or more approaches, as long as these are reflected in the documentation.

Also, a few of our now former ministers were born in the USA. They have middle names and no patronymics. Shall we use forename twice in the latter cases?

This is the way I saw they do it.

Of course, Tomaz and Matyas might have better ideas.

AnnaParla commented 1 year ago

Good point about documentation, Petya! Looking forward to contributing to it, once final decisions are made and implemented.

TomazErjavec commented 1 year ago

I've now added @type to surname and forename. I discovered the TEI examples, so I followed the naming scheme there. So, e.g., U.S. middle names should be <forename type="middle">. And maybe this is where @matyaskopp got "religious" from, although note that there this type is used on persName.

But don't forget that we have temporal attributes on persName, so if somebody changes their name due to e.g. marriage, they should get two names, the first one marked with @to and the second with @from.
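
As a rough illustration of the temporal attributes, the Python sketch below selects the name valid on a given day. The person, the names and the dates are invented, the TEI namespace is omitted, and the sketch assumes that @from and @to hold full ISO dates (yyyy-mm-dd), so that plain string comparison orders them correctly.

```python
import xml.etree.ElementTree as ET

# Invented example: one person whose surname changes on 2021-07-01.
PERSON = """
<person>
  <persName to="2021-06-30">
    <forename>Jane</forename>
    <surname>Doe</surname>
  </persName>
  <persName from="2021-07-01">
    <forename>Jane</forename>
    <surname>Roe</surname>
  </persName>
</person>
"""


def surnames_valid_on(person_xml, date):
    """Return the surnames of every persName whose from/to range covers date."""
    person = ET.fromstring(person_xml)
    result = []
    for pers_name in person.findall("persName"):
        # A missing @from or @to is treated as an open-ended range.
        start = pers_name.get("from", "")
        end = pers_name.get("to", "9999-12-31")
        if start <= date <= end:
            result.extend(s.text for s in pers_name.findall("surname"))
    return result


print(surnames_valid_on(PERSON, "2020-01-01"))
print(surnames_valid_on(PERSON, "2022-03-15"))
```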

matyaskopp commented 1 year ago

The order of name parts (patronym surname, forename) in the vert files is not good for Ukrainian; it should be surname, forename patronym.

I think the solution is replacing https://github.com/clarin-eric/ParlaMint/blob/9d8ef3805162765fd20282275a65c1a3742a0fcb/Scripts/parlamint-lib.xsl#L300-L340

with:

  <!-- Format the name of a person from persName -->
  <xsl:function name="et:format-name">
    <xsl:param name="persName"/>
    <xsl:choose>
      <xsl:when test="$persName/tei:forename[normalize-space(.)] or $persName/tei:surname[normalize-space(.)]">
        <xsl:value-of select="normalize-space(
                                    string-join(
                                        (
                                          string-join($persName/tei:surname[not(@type='patronym')]/normalize-space(.),' '),
                                          concat(
                                            string-join($persName/tei:forename/normalize-space(.),' '),
                                            '',' ',
                                            string-join($persName/tei:surname[@type='patronym']/normalize-space(.),' ')
                                            )
                                        )[normalize-space(.)],
                                        ', ' ))"/>
      </xsl:when>
      <xsl:when test="$persName/tei:term">
        <xsl:value-of select="concat('@', $persName/tei:term, '@')"/>
      </xsl:when>
      <xsl:when test="normalize-space($persName)">
        <xsl:value-of select="$persName"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message select="concat('ERROR: empty persName for ', $persName/@xml:id)"/>
        <xsl:text>-</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

It also fixes this bug (testing forename existence, replacing nonexisting surname) https://github.com/clarin-eric/ParlaMint/blob/9d8ef3805162765fd20282275a65c1a3742a0fcb/Scripts/parlamint-lib.xsl#L326-L328

@TomazErjavec I am not sure about the phase of conversion, can this be included?

matyaskopp commented 1 year ago

now I see the script needs a bit of tuning, missing values can break it...

matyaskopp commented 1 year ago

I think this should work. It isn't easy to test because it needs to see all the data...

  <!-- Format the name of a person from persName -->
  <xsl:function name="et:format-name">
    <xsl:param name="persName"/>
    <xsl:choose>
      <xsl:when test="$persName/tei:forename[normalize-space(.)] or $persName/tei:surname[normalize-space(.)]">
        <xsl:value-of select="normalize-space(
                                    string-join(
                                        (
                                          string-join(
                                            ('',$persName/tei:surname[not(@type='patronym')]/normalize-space(.)),
                                            ' '),
                                          concat(
                                            string-join(
                                              ('',$persName/tei:forename/normalize-space(.)),
                                              ' '),
                                            '',' ',
                                            string-join(
                                              ('',$persName/tei:surname[@type='patronym']/normalize-space(.)),
                                              ' ')
                                            )
                                        )[normalize-space(.)],
                                        ', ' ))"/>
      </xsl:when>
      <xsl:when test="$persName/tei:term">
        <xsl:value-of select="concat('@', $persName/tei:term, '@')"/>
      </xsl:when>
      <xsl:when test="normalize-space($persName)">
        <xsl:value-of select="$persName"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message select="concat('ERROR: empty persName for ', $persName/@xml:id)"/>
        <xsl:text>-</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>
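
For readers who do not use XSLT, the ordering logic of the function above can be approximated in Python. This is only a sketch of the same idea, not the project's code: it omits the TEI namespace, the helper names `norm`, `texts` and `format_name` are invented, and unlike the XSLT it also normalizes whitespace in the plain-text fallback.

```python
import xml.etree.ElementTree as ET


def norm(text):
    """Collapse runs of whitespace, roughly like XPath normalize-space()."""
    return " ".join((text or "").split())


def texts(pers_name, tag, patronym=None):
    """Non-empty texts of child elements named tag, optionally filtered by
    whether they carry type="patronym"."""
    result = []
    for element in pers_name.findall(tag):
        is_patronym = element.get("type") == "patronym"
        if patronym is None or patronym == is_patronym:
            value = norm("".join(element.itertext()))
            if value:
                result.append(value)
    return result


def format_name(pers_name):
    """Render 'surnames, forenames patronyms' from a persName element."""
    forenames = texts(pers_name, "forename")
    family = texts(pers_name, "surname", patronym=False)
    patronyms = texts(pers_name, "surname", patronym=True)
    if forenames or family or patronyms:
        groups = [" ".join(family), " ".join(forenames + patronyms)]
        return ", ".join(group for group in groups if group)
    term = pers_name.find("term")
    if term is not None:
        return "@" + "".join(term.itertext()) + "@"
    plain = norm("".join(pers_name.itertext()))
    return plain if plain else "-"


UA = ET.fromstring(
    "<persName><forename>Володимир</forename>"
    "<surname type='patronym'>Олександрович</surname>"
    "<surname>Зеленський</surname></persName>")
print(format_name(UA))
```

With a typed patronym the result is surname, forename patronym; with two untyped surnames both stay in front of the comma in document order.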

TomazErjavec commented 1 year ago

OK, I replaced my function with yours. No idea what $persName/tei:term is supposed to do, I wasn't aware we have terms inside speaker names.

Anyway, conversion to vertical on the first file gives the result as below, first for UA, then for BG. I hope BG will be happy with this as well, if not, @osenova, @KirilSimov, pls. react ASAP!

UA:

speaker_id="АдамІвановичМартинюк.1950" speaker_name="Мартинюк, Адам Іванович"
speaker_id="АллаОлександрівнаАлександровська.1948" speaker_name="Александровська, Алла Олександрівна"
speaker_id="АнатолійАнатолійовичСтепаненко.1963" speaker_name="Степаненко, Анатолій Анатолійович"
speaker_id="АнатолійІвановичМярковський.1961" speaker_name="Мярковський, Анатолій Іванович"
speaker_id="АнатолійКириловичКінах.1954" speaker_name="Кінах, Анатолій Кирилович"
speaker_id="АндрійАнатолійовичКожемякін.1965" speaker_name="Кожем’якін, Андрій Анатолійович"
speaker_id="АндрійМихайловичПавловський.1965" speaker_name="Павловський, Андрій Михайлович"
speaker_id="АрсенійПетровичЯценюк.1974" speaker_name="Яценюк, Арсеній Петрович"
speaker_id="ВалерійОлексійовичБаранов.1957" speaker_name="Баранов, Валерій Олексійович"
speaker_id="ВладиславВалентиновичЛукянов.1964" speaker_name="Лук’янов, Владислав Валентинович"
speaker_id="ВолодимирМихайловичЛитвин.1956" speaker_name="Литвин, Володимир Михайлович"
speaker_id="ВолодимирОлександровичЯворівський.1942" speaker_name="Яворівський, Володимир Олександрович"
speaker_id="ВячеславАнатолійовичКириленко.1968" speaker_name="Кириленко, В’ячеслав Анатолійович"
speaker_id="ГригорійЄвдокимовичСмітюх.1961" speaker_name="Смітюх, Григорій Євдокимович"
speaker_id="ІванОлександровичЗаєць.1952" speaker_name="Заєць, Іван Олександрович"
speaker_id="КатеринаСеменівнаСамойлик.1951" speaker_name="Самойлик, Катерина Семенівна"
speaker_id="КатеринаТимофіївнаВащук.1947" speaker_name="Ващук, Катерина Тимофіївна"
speaker_id="МиколаВолодимировичТоменко.1964" speaker_name="Томенко, Микола Володимирович"
speaker_id="МихайлоВасильовичЧечетов.1953" speaker_name="Чечетов, Михайло Васильович"
speaker_id="ОлегВалерійовичЛяшко.1972" speaker_name="Ляшко, Олег Валерійович"
speaker_id="ОлегОлександровичЗарубінський.1963" speaker_name="Зарубінський, Олег Олександрович"
speaker_id="ОлександрІвановичКузьмук.1954" speaker_name="Кузьмук, Олександр Іванович"
speaker_id="ОлександрМиколайовичБондар.1955" speaker_name="Бондар, Олександр Миколайович"
speaker_id="ПетроМиколайовичСимоненко.1952" speaker_name="Симоненко, Петро Миколайович"
speaker_id="ПетроСтепановичЦибенко.1949" speaker_name="Цибенко, Петро Степанович"
speaker_id="РаїсаМиколаївнаСорочинська-Кириленко.1946" speaker_name="Сорочинська-Кириленко, Раїса Миколаївна"
speaker_id="СергійВолодимировичГордієнко.1957" speaker_name="Гордієнко, Сергій Володимирович"
speaker_id="СергійВолодимировичСас.1957" speaker_name="Сас, Сергій Володимирович"
speaker_id="СпірідонПавловичКілінкаров.1968" speaker_name="Кілінкаров, Спірідон Павлович"

BG:

speaker_id="BorisovBoyko" speaker_name="Методиев Борисов, Бойко"
speaker_id="ChukolovDesislav" speaker_name="Славов Чуколов, Десислав"
speaker_id="DanailovStefan" speaker_name="Ламбов Данаилов, Стефан"
speaker_id="IontchevRumen" speaker_name="Маринов Йончев, Румен"
speaker_id="KalfinIvaylo" speaker_name="Георгиев Калфин, Ивайло"
speaker_id="KanevRadan" speaker_name="Миленов Кънев, Радан"
speaker_id="KardzhalievTuncher" speaker_name="Мехмедов Кърджалиев, Тунчер"
speaker_id="KazakTchetin" speaker_name="Хюсеин Казак, Четин"
speaker_id="KunevaMeglena" speaker_name="Щилиянова Кунева, Меглена"
speaker_id="MestanLyutvi" speaker_name="Ахмед Местан, Лютви"
speaker_id="MikovMihail" speaker_name="Райков Миков, Михаил"
speaker_id="NaydenovAngel" speaker_name="Петров Найденов, Ангел"
speaker_id="PlevnelievRosen" speaker_name="Асенов Плевнелиев, Росен"
speaker_id="RashidovVezhdi" speaker_name="Летиф Рашидов, Вежди"
speaker_id="SiderovVolen" speaker_name="Николов Сидеров, Волен"
speaker_id="SimeonovValeri" speaker_name="Симеонов Симеонов, Валери"

AnnaParla commented 1 year ago

@TomazErjavec & @matyaskopp Many thanks for putting the UA speaker names in the proper order!

osenova commented 1 year ago

Hi, for Bulgarian this is not a good order. We would never start with the middle name, then the family name, then the given name. Thus, speaker_id="PlevnelievRosen" speaker_name="Асенов Плевнелиев, Росен" should rather become speaker_id="PlevnelievRosen" speaker_name="Плевнелиев, Росен Асенов". Here the family name, by which we recognize the speaker, comes first; then come the given name and the middle name (the patronymic surname). (This is the best order within the given pattern only.) Even better would be just the family and given names: speaker_id="PlevnelievRosen" speaker_name="Плевнелиев, Росен". But for the sake of disambiguation it may be better to follow my first suggested pattern.

KirilSimov commented 1 year ago

Dear All,

Just one example:

Kiril Ivanov Simov - full official name
Kiril Simov - shortened official name
Simov - family name

Simov, Kiril Ivanov - full name if the family name has to be first
Simov, Kiril - similar

Kiril - first name

I think there are no other possibilities.

With best regards,

Kiril


TomazErjavec commented 1 year ago

Thanks for your comments, @osenova and @KirilSimov. So it seems, if I understand correctly, that it is impossible to define an order that is independent of country and language. Or maybe it is possible, because BG just has two surnames, while UA has one explicitly marked by type="patronym", so maybe this could be used to advantage. @matyaskopp, given that you wrote the current function, would you also be able to modify it given the above?

matyaskopp commented 1 year ago

This happened because BG data do not encode patronym: https://github.com/clarin-eric/ParlaMint/blob/437c87f41a9880c4a5f3b43922d10b12bbe7a9e8/Samples/ParlaMint-BG/ParlaMint-BG-listPerson.xml#L4033-L4038

   <person xml:id="BorisovBoyko">
      <persName>
         <forename>Бойко</forename>
         <surname>Методиев</surname>
         <surname>Борисов</surname>
      </persName>

We can't change the order of surnames because Spanish/Galician/Catalan/... names need to preserve their order.

So, if BG needs to have a different order, we need type="patronym". Otherwise, it would be difficult to implement (if Cyrillic, then a different order ???).

Another possibility is to store the patronymic name in forename:

   <person xml:id="BorisovBoyko">
      <persName>
         <forename>Бойко</forename>
         <forename>Методиев</forename>
         <surname>Борисов</surname>
      </persName>

TomazErjavec commented 1 year ago

@KirilSimov, would it be difficult to implement this addition? If the order indicates what is a patronym, then it could be automated.
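
A minimal Python sketch of such an automation follows. It assumes the convention described earlier in this thread for the BG data (of two untyped surname elements, the first is the patronymic), omits the TEI namespace, and uses an invented helper name `mark_patronym`. It is deliberately conservative and should not be run on corpora where two surnames mean something else, such as the Spanish ones.

```python
import xml.etree.ElementTree as ET

# Snippet from the BG sample quoted above (namespace omitted for this sketch).
BG_PERS_NAME = """
<persName>
  <forename>Бойко</forename>
  <surname>Методиев</surname>
  <surname>Борисов</surname>
</persName>
"""


def mark_patronym(pers_name):
    """Type the first of exactly two untyped surnames as a patronym.

    Returns True if the element was changed, False if it was left alone.
    """
    surnames = pers_name.findall("surname")
    if len(surnames) == 2 and all(s.get("type") is None for s in surnames):
        surnames[0].set("type", "patronym")
        return True
    return False


pers_name = ET.fromstring(BG_PERS_NAME)
changed = mark_patronym(pers_name)
print(changed, [(s.text, s.get("type")) for s in pers_name.findall("surname")])
```

Names with one surname, or with surnames that already carry a type, are left untouched, so running the sketch twice does not change the result.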

KirilSimov commented 1 year ago

Dear Tomaž,

The names are very similar between Russian, Ukrainian, Belorussian and Bulgarian (probably Macedonian).

In Bulgarian we call the names:

Licno/sobstveno ime (personal name, forename), prezime (surname), familiya (surname)

Prezime is derived from the father's name, but could also be derived from the mother's name (if the father is not known, or for other reasons). Its translation into English is surname, the same as for the family name.

Thus, we could have Kiril Ivanov Simov or Petyr Marijkin Goshev (for the mother case)

For Russian it is similar:

Fyodor Mikhailovich Dostoevsky

Again Mikhailovich is derivational from the father name. The main difference is in usages of the names. In Russian, Ukrainian, Belorussian there are possibilities like:

Fyodor Mikhailovich; Mikhailovich; Mikhailovich, Fyodor

All of these are not possible in Bulgarian.

Thus, there are two possibilities in my view:

  1. To use element in Bulgarian also, but the script has to take the language into account in order to avoid generating Ivanov, Kiril

  2. A new element to be introduced:

KirilIvanovSimov

Probably both are problematic :)

With best regards,

Kiril

AnnaParla commented 1 year ago

In Russian, Ukrainian, Belorussian there are possibilities like: Fyodor Mikhailovich; Mikhailovich; Mikhailovich, Fyodor

For the sake of accuracy, this is wrong. In uk/ru/be only two orders are acceptable to native speakers: 1. forename 2. patronymic 3. surname, or 3. surname 1. forename 2. patronymic (with or without a comma between 3 and 1 in the second variant, depending on the genre / style). However, putting 2 just before 1 (either with or without a comma) or starting the string with 2 is erroneous.
E.g. 1. Taras 2. Hryhorovych 3. Shevchenko or 3. Shevchenko 1. Taras 2. Hryhorovych (in fact, both orders are used in wiki).

KirilSimov commented 1 year ago

Dear Anna,

Thank you very much for correcting me! I took a look at the whole discussion, and the easy solution is for us also to use the patronymic element for Bulgarian names. Then the script will produce the correct order: Borisov, Bojko Metodiev

With best regards,

Kiril


TomazErjavec commented 1 year ago

easy solution is we also to use patronymic element for Bulgarian name

Great, @KirilSimov, so can you implement that in your listPerson? Just one note: it is not <patronymic> but rather <surname type="patronym">. It would be great to get this soon though, as we are out of time...

TomazErjavec commented 1 year ago

@matyaskopp, I just noticed that your code for formatting names doesn't take into account <nameLink> which is used by ES-CT: https://github.com/clarin-eric/ParlaMint/blob/19b751a624ac93f92274adb5920b7d38e0d70e45/Samples/ParlaMint-ES-CT/ParlaMint-ES-CT-listPerson.xml#L5-L10

Right now it outputs e.g. in vertical: speaker_id="AbellaJeannine" speaker_name="Abella Chica, Jeannine"

Would be nice to fix this (soon).
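
One way the missing link could be handled is sketched below in Python. The markup is hypothetical: it is not copied from the linked ES-CT sample and only assumes a <nameLink> sitting between two <surname> elements. The function name `format_with_name_link` is likewise invented, and this is not necessarily how the fix was implemented in the project's XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, NOT copied from the ES-CT sample linked above.
PERS_NAME = """
<persName>
  <surname>Abella</surname>
  <nameLink>i</nameLink>
  <surname>Chica</surname>
  <forename>Jeannine</forename>
</persName>
"""


def format_with_name_link(pers_name):
    """Render 'surnames (with links), forenames patronyms'.

    Surnames and nameLinks are kept in document order, so the link stays
    between the surnames it connects.
    """
    family, forenames, patronyms = [], [], []
    for part in pers_name:
        text = " ".join((part.text or "").split())
        if not text:
            continue
        if part.tag == "forename":
            forenames.append(text)
        elif part.tag == "surname" and part.get("type") == "patronym":
            patronyms.append(text)
        elif part.tag in ("surname", "nameLink"):
            family.append(text)
    groups = [" ".join(family), " ".join(forenames + patronyms)]
    return ", ".join(group for group in groups if group)


formatted = format_with_name_link(ET.fromstring(PERS_NAME))
print(formatted)  # Abella i Chica, Jeannine
```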

matyaskopp commented 1 year ago

@TomazErjavec, implemented in devel branch

TomazErjavec commented 1 year ago

Has this been solved ok now? cf. e.g.

If yes, can somebody pls. close the issue? If not, pls. move it to "Future" milestone. Or let me know, and I will do it.

matyaskopp commented 1 year ago

@TomazErjavec UA data does not seem to be loaded in NoSKETCH [image], so I can't check it there, but the data in TEITOK seem to be OK.

TomazErjavec commented 1 year ago

UA data does not seem to be loaded in NoSKETCH

Sorry, you must've checked just when I was recompiling the corpus, pls. try again.

matyaskopp commented 1 year ago

Closing. If @AnnaParla @osenova @KirilSimov complain, please reopen.