people's names #16

Closed vcvpaiva closed 3 years ago

vcvpaiva commented 3 years ago

This seems the wrong way of dealing with the name "Juan Carlos Salas": Salas is NOT a common noun, and the whole name is not a compound.

sent_id = n01150042
text = Criado por Juan Carlos Salas, o prédio vencedor tem uma aparência escultural e todos os detalhes têm um significado.
text_en = Designed by Juan Carlos Salas, the award-winning building has a sculptural appearance and every detail carries meaning.

1 Criado VERB VBN Aspect=Perf|Gender=Masc|Number=Sing 8 acl
2 por ADP IN 3 case
3 Juan PROPN NNP Gender=Masc|Number=Sing 1 obl
4 Carlos PROPN NNP Gender=Masc|Number=Sing 3 flat:name
5 Salas NOUN NN Gender=Fem|Number=Plur 3 compound Proper=True|SpaceAfter=No

vcvpaiva commented 3 years ago

This also does not seem a good use of compound, though it is more debatable how to deal with a name like "Andre Price III".

sent_id = n01011011
text = Ela matou o Andre Price III ao pressionar a cara dele contra um colchão de ar na sua sala de estar, antes de tentar fazer o mesmo à sua filha, Angel, disse a polícia.
text_en = She killed Andre Price III by pressing his face into an air mattress in her sitting room before trying to do the same to her daughter, Angel, police said.

vcvpaiva commented 3 years ago

'Tom Robinson' shouldn't be a compound.

newdoc id = w01130
sent_id = w01130099
text = Rafferty lançou mais dois álbuns na década de 1990 naquilo que o músico Tom Robinson descreveu mais tarde como "um grande retorno à forma".
text_en = Rafferty released two further albums in the 1990s in what musician Tom Robinson later described as "a major return to form".

1 Rafferty PROPN NNP Gender=Masc|Number=Sing 2 nsubj
2 lançou VERB VBC Aspect=Perf|Mood=Ind|Number=Sing|Person=3|Tense=Past 0 root
15 Tom NOUN NN Gender=Masc|Number=Sing 14 appos Proper=True
16 Robinson PROPN NNP Gender=Masc|Number=Sing 15 compound

vcvpaiva commented 3 years ago

the same for Norma Talmadge in

sent_id = w01136074
text = Joseph Schenck estava interessado em selecionar a sua esposa, Norma Talmadge, para contracenar com Valentino em uma versão de Romeu e Julieta.
text_en = Joseph Schenck was interested in casting his wife, Norma Talmadge, opposite Valentino in a version of Romeo and Juliet.

dan-zeman commented 3 years ago

Apparently the text was pre-tagged automatically and the human annotators did not catch all the errors. Personal names definitely should be PROPN, regardless of whether they look like a common noun in the host language.

The III in Andre Price III is debatable, but compound is wrong. Either it is PROPN and attached via flat(:name) to Andre, or it is ADJ (an ordinal Roman numeral) and attached to Andre as an amod.
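A rough way to hunt for the remaining cases automatically might look like the sketch below. It assumes the treebank is in the standard tab-separated 10-column CoNLL-U layout and that the pre-tagger left `Proper=True` in MISC, as in the excerpts quoted in this thread. The names `parse_misc` and `find_name_errors` are made up for illustration, and the LEMMA and DEPS columns in the sample are filled with `_` because the thread does not show them.

```python
def parse_misc(misc):
    """Turn a MISC string such as 'Proper=True|SpaceAfter=No' into a dict."""
    if misc in ("", "_"):
        return {}
    return dict(item.split("=", 1) for item in misc.split("|") if "=" in item)


def find_name_errors(conllu):
    """Return (id, form, reasons) for tokens that look like mis-annotated name parts.

    Heuristics (assumptions, not an official validation rule):
      * MISC has Proper=True but UPOS is not PROPN
      * the token is a name part (PROPN or Proper=True) attached as 'compound'
    """
    problems = []
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        tok_id, form, upos, deprel = cols[0], cols[1], cols[3], cols[7]
        proper = parse_misc(cols[9]).get("Proper") == "True"
        reasons = []
        if proper and upos != "PROPN":
            reasons.append("Proper=True but UPOS is " + upos)
        if deprel == "compound" and (proper or upos == "PROPN"):
            reasons.append("name part attached as compound")
        if reasons:
            problems.append((int(tok_id), form, reasons))
    return problems


# Tokens 1-5 of n01150042 as quoted in this thread; LEMMA and DEPS are "_" placeholders.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "Criado", "_", "VERB", "VBN", "Aspect=Perf|Gender=Masc|Number=Sing", "8", "acl", "_", "_"],
    ["2", "por", "_", "ADP", "IN", "_", "3", "case", "_", "_"],
    ["3", "Juan", "_", "PROPN", "NNP", "Gender=Masc|Number=Sing", "1", "obl", "_", "_"],
    ["4", "Carlos", "_", "PROPN", "NNP", "Gender=Masc|Number=Sing", "3", "flat:name", "_", "_"],
    ["5", "Salas", "_", "NOUN", "NN", "Gender=Fem|Number=Plur", "3", "compound", "_", "Proper=True|SpaceAfter=No"],
])

for tok_id, form, reasons in find_name_errors(SAMPLE):
    print(tok_id, form, "; ".join(reasons))
```

The output is only a list of candidates for manual review. It cannot decide whether a given token should end up as `flat:name` or, for numerals, as `amod`.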

vcvpaiva commented 3 years ago

Another personal name:

newdoc id = n02076
sent_id = n02076003
text = Adnan Z. Amin, Diretor Geral da Agência Internacional de Energias Renováveis (IRENA) está seguro: "Energia eólica offshore pode se tornar a maior geradora de energia, em uma economia global, que é livre de energia a base de carvão."
text_en = Adnan Z. Amin, General Director of the International Organization for Renewable Energies (IRENA) is certain: "Offshore wind power can become the top power generator in a global economy that is free of coal-based energy."

1 Adnan PROPN NNP Gender=Masc|Number=Sing 18 nsubj
2 Z. NOUN NN Gender=Masc|Number=Sing 1 compound Proper=True

vcvpaiva commented 3 years ago

'Tina Anselmi' should not be a compound in two sentences:

newdoc id = n04001
sent_id = n04001002
text = Tina Anselmi nasceu em 25 de março, 1927 em Castelfranco Veneto; ela cresceu em uma família católica anti-fascista, a qual foi marcada por perseguições ao seu pai militante socialista.
text_en = Tina Anselmi was born on the 25th of March, 1927 in Castelfranco Veneto; she grew up in an anti-fascist Catholic family, which was marked by the persecution of her militant socialist father.

1 Tina NOUN NN Gender=Fem|Number=Sing 3 nsubj Proper=True
2 Anselmi PROPN NNP Gender=Fem|Number=Sing 1 compound


sent_id = n04001013
text = Tendo sempre sido próxima da União Católica, Tina Anselmi ocupava-se particularmente dos direitos dos trabalhadores têxteis e professores.
text_en = Having always been close to the Catholic Union, Tina Anselmi attended particularly to the rights of textile workers and teachers.

vcvpaiva commented 3 years ago

another 2 names (Cristina Cifuentes and Javier Maroto) that shouldn't be compound. Maroto is not ADJ but PROPN

newdoc id = n05001
sent_id = n05001005
text = A presidente da Comunidade de Madrid, Cristina Cifuentes, representa os mais conservadores, enquanto líderes do partido, como o Subsecretário Setorial Javier maroto, representa os mais progressivos.
text_en = The president of the Community of Madrid, Cristina Cifuentes, represents the most conservative, while leaders of the party, such as the Sectorial Under-Secretary, Javier Maroto, represent the most progressive.

9 Cristina PROPN NNP Gender=Fem|Number=Sing 2 appos
10 Cifuentes NOUN NN Gender=Fem|Number=Sing 9 compound Proper=True|SpaceAfter=No
27 Javier PROPN NNP Gender=Masc|Number=Sing 25 appos
28 maroto ADJ JJ Gender=Masc|Number=Sing 27 compound Proper=True|SpaceAfter=No

vcvpaiva commented 3 years ago

Pintado is the name of the person, not an adjective

sent_id = n05001008
text = Durán atua como porta-voz e Ángel Pintado como ministro da fazenda.
text_en = Durán acts as spokesman and Ángel Pintado as treasurer.

6 Ángel PROPN NNP Gender=Masc|Number=Sing 2 conj origdeprel=nsubj
7 Pintado ADJ JJ Gender=Masc|Number=Sing 6 compound Proper=True

vcvpaiva commented 3 years ago

Like the III in Andre Price III and in the names of several kings, we have Jr in

sent_id = w02005029
text = Possui um monumento a Martin Luther King Jr.
text_en = It contains a monument to Martin Luther King, Jr.

5 Martin PROPN NNP Gender=Masc|Number=Sing 3 nmod
6 Luther PROPN NNP Gender=Masc|Number=Sing 5 flat:name
7 King PROPN NNP Gender=Masc|Number=Sing 5 flat:name
8 Jr ADJ JJ Gender=Masc|Number=Sing 5 compound Proper=True|SpaceAfter=No

vcvpaiva commented 3 years ago

another first name only (and very bad translation of 'who', it should be 'que' not 'quem')

sent_id = w05010025
text = Enquanto isso, seu lugar na tribuna foi tomado por Marco Antonio, quem ocupou a posição até dezembro.
text_en = Meanwhile, his place in tribune was occupied by Marco Antonio, who held the position until December.

12 Marco NOUN NN Gender=Masc|Number=Sing 10 obl Proper=True
13 Antonio PROPN NNP Gender=Masc|Number=Sing 12 compound SpaceAfter=No

vcvpaiva commented 3 years ago

Last personal name annotated as "compound": Marco Antônio

sent_id = w05010027
text = Em 1.º de janeiro, 49 a.C., Marco Antônio leu uma declaração de César, na qual o pro-cônsul declarou-se um amigo da paz.
text_en = On the 1st January 49 BC, Marco Antonio read a declaration from Caesar in which the proconsul declared himself a friend of peace.

9 Marco NOUN NN Gender=Masc|Number=Sing 11 nsubj Proper=True
10 Antônio PROPN NNP Gender=Masc|Number=Sing 9 compound

arademaker commented 3 years ago

I feel bad about keeping III as ADJ in Andre Price III but Jr as flat:name to ... King Jr... but I am missing a stronger argument to change it.

vcvpaiva commented 3 years ago

well, there are all the other kings that I haven't written explicitly yet, e.g. Ramses II, Tutmes III, Amintas I
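To see how consistently these suffixes are treated, one option is to group every "name + III / II / Jr" pair by the UPOS and deprel currently on the suffix. The sketch below is only an illustration: `is_name_suffix` and `group_suffix_annotations` are hypothetical helpers, the input is a simplified list of (form, upos, deprel) triples rather than real CoNLL-U, and the `Ramses II` row carries an invented annotation purely to show a second group. The Roman-numeral pattern will also match ordinary capitalised strings such as `MI` or `DI`, so the result is a review list, not a verdict.

```python
import re
from collections import defaultdict

# Strict upper-case Roman numerals (I .. MMMCMXCIX); the lookahead rejects the empty string.
ROMAN = re.compile(
    r"^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)
# Generational suffixes to look for; extend as needed (assumption).
GENERATIONAL = {"Jr", "Jr."}


def is_name_suffix(form):
    """True for tokens like 'III' or 'Jr' that may close a personal name."""
    return form in GENERATIONAL or bool(ROMAN.match(form))


def group_suffix_annotations(tokens):
    """Group 'name suffix' pairs by the (UPOS, deprel) annotated on the suffix.

    tokens: list of (form, upos, deprel) triples in sentence order.
    Only suffixes that directly follow a PROPN token are considered.
    """
    groups = defaultdict(list)
    for prev, cur in zip(tokens, tokens[1:]):
        form, upos, deprel = cur
        if is_name_suffix(form) and prev[1] == "PROPN":
            groups[(upos, deprel)].append(prev[0] + " " + form)
    return dict(groups)


tokens = [
    # From w02005029 as quoted in this thread.
    ("Martin", "PROPN", "nmod"),
    ("Luther", "PROPN", "flat:name"),
    ("King", "PROPN", "flat:name"),
    ("Jr", "ADJ", "compound"),
    # Hypothetical annotation, invented for illustration only.
    ("Ramses", "PROPN", "nsubj"),
    ("II", "ADJ", "amod"),
]

for (upos, deprel), names in group_suffix_annotations(tokens).items():
    print(upos, deprel, names)
```

More than one group in the output means the suffixes are annotated inconsistently, which is the situation described for III versus Jr above.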