clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Update README.md #723

Closed calzada closed 1 year ago

calzada commented 1 year ago

@matyaskopp @osenova @maciej-ogrodniczuk: please update the branch if you consider it appropriate.

Update ParlaMint-ES.v-3.0 readme.md

matyaskopp commented 1 year ago

@calzada I quickly went through the README. It can be completed, and the obvious untruths can be fixed too. The README is submitted in the same way as the data, so please follow the contributing guidelines in CONTRIBUTING.md

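Untruths:

  • stanza was not used for annotations
  • corpus specific metadata section is misleading https://github.com/calzada/ParlaMint/tree/calzada-patch-1-1/Data/ParlaMint-ES#corpus-specific-metadata
  • the conversion to TEI was done at the end of your pipeline, I am not aware of any other quality control
  • I was engaged only in the last step (government members gathering, conversion to TEI, and linguistic annotations), nothing else

Possible extension:

  • government members' acquisition
  • you can explain why the chairman's speeches are not affiliated with the exact person

If you want to include original ECPC XML it would probably be better to describe the conversion to this format first and next to describe the conversion to ParlaMint TEI:

  • a process of conversion
  • what has not been encoded (change party to group, constituencies of all MPs)
  • what has been newly added (government members, linguistic annotations, coalition/opposition, ...)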

calzada commented 1 year ago

Dear Matyáš, I answer below.

On Tuesday, 8 August 2023, Matyáš Kopp @.***> wrote:

@calzada I quickly went through the README. It can be completed, and the obvious untruths can be fixed too. The README is submitted in the same way as the data, so please follow the contributing guidelines in CONTRIBUTING.md https://github.com/clarin-eric/ParlaMint/blob/main/CONTRIBUTING.md

Untruths: NOT UNTRUTHS ;-)

  • stanza was not used for annotations - Stanza was not used for annotations in ParlaMint-ES.v3.0, but it was used for ParlaMint-ES-2.1. At any rate, I was waiting to check when you finished the annotation. So I will now just say that ParlaMint-ES.v3.0 was annotated with UDPipe.
  • corpus specific metadata section is misleading https://github.com/calzada/ParlaMint/tree/calzada-patch-1-1/Data/ParlaMint-ES#corpus-specific-metadata
  • the conversion to TEI was done at the end of your pipeline, I am not aware of any other quality control. YES, MONICA REVISED OUR XML FILES TO MAKE SURE CERTAIN MISTAKES WERE ERADICATED. IN FACT, MONICA USED CHATGPT TO THAT EFFECT.
  • I was engaged only in the last step (government members gathering, conversion to TEI, and linguistic annotations), nothing else. WELL, YOU DID A LOT OF WORK.

Possible extension:

  • government members' acquisition: WHAT DO YOU MEAN BY THIS?
  • you can explain why the chairman's speeches are not affiliated with the exact person: OK I WILL EXPLAIN THIS.

If you want to include original ECPC XML it would probably be better to describe the conversion to this format first and next to describe the conversion to ParlaMint TEI:

  • a process of conversion
  • what has not been encoded (change party to group, constituencies of all MPs). WELL, IT IS A SHAME, BECAUSE WE DID HAVE ALL THIS INFORMATION, BUT WE USED TOMAŽ ERJAVEC'S CONVERSION SINCE TIME WAS TIGHT. THE NEXT VERSION WILL INCLUDE THIS EASILY.
  • what has been newly added (government members, linguistic annotations, coalition/opposition, ...): GOVERNMENT MEMBERS WERE ALREADY ADDED IN PARLAMINT-ES-2.1. LINGUISTIC ANNOTATIONS WERE ADDED IN PARLAMINT-ES-2.1. COALITION/OPPOSITION WAS THERE IN PARLAMINT-ES-2.1. SO THESE ARE NOT NEW INFORMATION ITEMS???


calzada commented 1 year ago

Final question. I do not have GitHub Desktop here, which is why I am having problems updating the documentation. I cannot work the way I normally do, since I only have a tablet. I have to update the document online, but then I am forced to open a pull request. Is this all right? Does anyone check my request? Best for now, mc

matyaskopp commented 1 year ago

  • stanza was not used for annotations
  • Stanza was not used for annotations in ParlaMint-ES.v3.0, but it was used for ParlaMint-ES-2.1. At any rate, I was waiting to check when you finished the annotation. So I will now just say that ParlaMint-ES.v3.0 was annotated with UDPipe.

Not only UDPipe, but also NameTag. See: https://github.com/matyaskopp/ParlaMint/blob/6fa360b0d7986319a93e3f801ecbe6ea3d880038/Data/ParlaMint-ES/ParlaMint-ES.ana.xml#L149-L158

         <appInfo>
            <application ident="UDPipe" version="2">
               <label>UDPipe 2 (spanish-ancora-ud-2.10-220711 model)</label>
               <desc xml:lang="en">POS tagging, lemmatization and dependency parsing done with UDPipe 2 (<ref target="http://ufal.mff.cuni.cz/udpipe/2">http://ufal.mff.cuni.cz/udpipe/2</ref>) with spanish-ancora-ud-2.10-220711 model</desc>
            </application>
            <application ident="NameTag" version="2">
               <label>NameTag 2 (spanish-conll-200831 model)</label>
               <desc>Name entity recognition done with NameTag 2 (<ref target="http://ufal.mff.cuni.cz/nametag/2">http://ufal.mff.cuni.cz/nametag/2</ref>) with spanish-conll-200831 model.</desc>
            </application>
         </appInfo>
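
To check which tools a header declares without reading the XML by eye, something like the following could be used. It is only a minimal sketch: the `APP_INFO` string is a shortened, namespace-free copy of the excerpt above, and `list_annotation_tools` is a hypothetical helper name. The real ParlaMint-ES.ana.xml is a TEI document, so its elements would normally carry the TEI namespace and the tag lookup would need adjusting.

```python
import xml.etree.ElementTree as ET

# Shortened, namespace-free copy of the <appInfo> excerpt quoted above
# (assumption: the real file wraps this in a namespaced TEI header).
APP_INFO = """
<appInfo>
  <application ident="UDPipe" version="2">
    <label>UDPipe 2 (spanish-ancora-ud-2.10-220711 model)</label>
  </application>
  <application ident="NameTag" version="2">
    <label>NameTag 2 (spanish-conll-200831 model)</label>
  </application>
</appInfo>
"""


def list_annotation_tools(xml_text):
    """Return (ident, version, label) for every <application> element."""
    root = ET.fromstring(xml_text)
    tools = []
    for app in root.iter("application"):
        label = app.findtext("label", default="").strip()
        tools.append((app.get("ident"), app.get("version"), label))
    return tools


for ident, version, label in list_annotation_tools(APP_INFO):
    print(f"{ident} {version}: {label}")
```

The README can then name every tool the list returns, rather than UDPipe alone.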

You can also insert the LINDAT acknowledgement to fulfil the terms of use of the LINDAT tools:

[The work described herein] has [also]* been using [data/tools/services]* provided by 
the LINDAT/CLARIAH-CZ Research Infrastructure (https://lindat.cz), supported by 
the Ministry of Education, Youth and Sports of the Czech Republic (Project No. LM2023062).

  • the conversion to TEI was done at the end of your pipeline, I am not aware of any other quality control.

YES, MONICA REVISED OUR XML FILES TO MAKE SURE CERTAIN MISTAKES WERE ERADICATED. IN FACT, MONICA USED CHATGPT TO THAT EFFECT.

That sounds interesting. It can be mentioned in the documentation, ideally supported by an example where ChatGPT helps. Please also highlight that a human, not the AI, has the final word, so that you did not introduce more noise into the data.

  • government members' acquisition:

WHAT DO YOU MEAN BY THIS?

That the complete information about the members of the government is not present in CD format:

I used wget to download the wiki pages and a script, gov-wiki2tei.pl, to extract the information from the HTML tables into TEI.
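
For the documentation, the idea of that step can be illustrated roughly as follows. This is not gov-wiki2tei.pl (which is a Perl script) but a simplified Python sketch of the same kind of transformation. The table layout, the sample row, the class `TableRows` and the function `rows_to_tei` are all assumptions made for illustration, and the output is only TEI-like, not the full ParlaMint encoding.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_to_tei(rows):
    """Turn (name, role, from, to) rows into a TEI-like <listPerson>."""
    list_person = ET.Element("listPerson")
    for name, role, start, end in rows:
        person = ET.SubElement(list_person, "person")
        ET.SubElement(person, "persName").text = name
        ET.SubElement(person, "affiliation",
                      {"role": role, "from": start, "to": end})
    return list_person


# Hypothetical stand-in for a downloaded wiki page (placeholder data).
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Role</th><th>From</th><th>To</th></tr>
  <tr><td>Example Person</td><td>minister</td><td>2020-01-13</td><td>2021-07-12</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
header, *data = parser.rows  # the first row holds the column names
tei = rows_to_tei(data)
print(ET.tostring(tei, encoding="unicode"))
```

Real wiki tables are messier (merged cells, footnotes, links inside cells), so the extracted rows would still need checking by hand.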

  • what has not been encoded (change party to group, constituencies of all MPs ).

WELL, IT IS A SHAME, BECAUSE WE DID HAVE ALL THIS INFORMATION, BUT WE USED TOMAŽ ERJAVEC'S CONVERSION SINCE TIME WAS TIGHT. THE NEXT VERSION WILL INCLUDE THIS EASILY.

I am not sure it is easy, because you should also include the relationship between the party and the parliamentary group. The parliamentary groups also need a full definition (not only an abbreviation). We will see what you can do in the next version.

  • what has been newly added (government members, linguistic annotations, coalition/opposition, ...):

GOVERNMENT MEMBERS WERE ALREADY ADDED IN PARLAMINT-ES-2.1. LINGUISTIC ANNOTATIONS WERE ADDED IN PARLAMINT-ES-2.1. COALITION/OPPOSITION WAS THERE IN PARLAMINT-ES-2.1. SO THESE ARE NOT NEW INFORMATION ITEMS???

Sorry, I meant the difference between the original ECPC XML and ParlaMint-ES. And I haven't seen government members in ParlaMint-ES 2.1 (no prime minister, no ministers).

Final question. I do not have GitHub Desktop here, which is why I am having problems updating the documentation. I cannot work the way I normally do, since I only have a tablet. I have to update the document online, but then I am forced to open a pull request. Is this all right? Does anyone check my request?

Update the README in the place where you did it now. I will insert it together with this pull request: https://github.com/clarin-eric/ParlaMint/pull/692

TomazErjavec commented 1 year ago

I am closing this pull request; I think it is no longer relevant.