calzada / PARLAMINT-ES-MC


Fix root file #3

Closed TomazErjavec closed 3 years ago

TomazErjavec commented 3 years ago

The file https://github.com/calzada/PARLAMINT-ES-MC/blob/master/bin/ParlaMint-template.xml is the template file for your corpus root file - right now it has lots of place-holders (i.e. things simply copied from the -SI corpus, which are wrong for the -ES corpus), would it be possible for you to correct it a bit?

I wasn't really involved in specifying this for Slovenian, so I might not be able to answer questions, but I hope the various corpora on the ParlaMint GitHub will be of help.

calzada commented 3 years ago

To fix my root file I need to adapt the following tags to the Spanish case

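 <meeting n="7" corresp="#DZ" ana="#parla.term #DZ.7">7. mandat</meeting>
 <meeting n="8" corresp="#DZ" ana="#parla.term #DZ.8">8. mandat</meeting>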

(So far) We have speeches for legislatures (mandat/MANDATE) 12, 13 and 14. In our XML this is tagged like this:

XII XIII XIV

How do I write this for the Spanish case:

 <meeting n="7" corresp="#DZ" ana="#parla.term #DZ.7">7. mandat</meeting>

TomazErjavec commented 3 years ago

Something like "XII mandato" I guess. Note that you will have something like:

 <meeting n="12" corresp="#FP" ana="#parla.term #FP.12">XII mandato</meeting>

and you will need to define your FP (now "federal_parliament", which is long and ugly) and FP.12 in the taxonomy at https://github.com/calzada/PARLAMINT-ES-MC/blob/4dd4ac5ecc244f81beb7f23415c68e9654842792/bin/ParlaMint-template.xml#L214-L229
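The pattern in the `<meeting>` line above can be sketched in Python. This is only an illustration, not part of the ParlaMint tooling: the helper names `to_roman` and `meeting_element` are hypothetical, the organisation identifier is passed in as a parameter because it is still under discussion at this point in the thread, and the TEI namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET


def to_roman(number: int) -> str:
    """Convert a small positive integer (1-89) to a Roman numeral."""
    numerals = [(50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
                (5, "V"), (4, "IV"), (1, "I")]
    result = ""
    for value, symbol in numerals:
        while number >= value:
            result += symbol
            number -= value
    return result


def meeting_element(term: int, org_id: str, label: str = "mandato") -> str:
    """Build one <meeting> element for a parliamentary term.

    org_id is whatever identifier the corpus defines for the parliament
    (hypothetical here); the element points at it and at '<org_id>.<term>'.
    """
    element = ET.Element("meeting", {
        "n": str(term),
        "corresp": f"#{org_id}",
        "ana": f"#parla.term #{org_id}.{term}",
    })
    element.text = f"{to_roman(term)} {label}"
    return ET.tostring(element, encoding="unicode")


for term in (12, 13, 14):
    print(meeting_element(term, "FP"))
```

Whatever identifier is chosen, the same string has to appear in `corresp`, in `ana`, and in the place where the identifier is defined, which is why the sketch derives all three from one parameter.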

calzada commented 3 years ago

Excellent. I will read this tomorrow. But, basically you are telling me I can use whichever LETTERS I WANT, PROVIDED I DO IT PROPERLY. So rather than FP (we do not have a federal parliament actually), I could say NP (for National Parliament) or maybe L (for legislature). Is my understanding correct?

I sent you (and Maciej and Petya) an email that I think is possibly for them. But I attached you all the same in case you need to be acquainted with my questions.

Tomaz, thanks a bunch for your greeeeeeat help. When the Coronavirus is over, you have to visit us in Spain. Best mc

TomazErjavec commented 3 years ago

you are telling me I can use whichever LETTERS I WANT, PROVIDED I DO IT PROPERLY.

Yes, these are just identifiers, and can be whatever you want. But it is nice if they make some sense, e.g. "NP" rather than "adfadf".

So rather than FP (we do not have a federal parliament actually), I could say NP (for National Parliament) or maybe L (for legislature). Is my understanding correct?

Yes, NP is fine.

I sent you (and Maciej and Petya) an email that I think is possibly for them. But I attached you all the same in case you need to be acquainted with my questions.

I'm not sure why you did that: Maciej and Petya are not very familiar with the corpus. Also, we've just established issues here as the means of communication, so emails and attachments now just confuse the issue. So, let's see what they answer, but for my part, I will respond to issues (and commits!) here, not to emails.

calzada commented 3 years ago

Dear Tomaz, I am really sorry to have bothered you. I did not know the procedure, as I joined the team later. I asked Petya and Maciej because I understood that at least Petya is responsible for the BG file I copied. Finally, good news: I have many more validated docs, from 2016 onwards. I am aiming at 2015 onwards since this is what the rest are doing. My task was for 2019 and 2020. Best for now, and I really would like to apologise for my wrong-doing. Mc

(Reply sent by email to Tomaž Erjavec's comment of Sun, 7 Mar 2021, 14:41.)

TomazErjavec commented 3 years ago

I asked Petya and Maciej because I understood at least Petya is responsible for the BG file I copied.

Ah, I see. But actually, it is Kiril that did the BG corpus. Sorry from my side, if I was a bit brisk, it's just that yours is just one of the corpora, and I am finding it difficult already to keep up with where various developers report their problems, so I would like to keep things as simple as possible, i.e. 1 channel for 1 corpus.

***Indeed. You were not brisk. YOU ARE TOTALLY RIGHT. I TOTALLY UNDERSTAND YOUR POSITION. And thanks for your help. I DO APPRECIATE YOUR LOAD AND THE QUALITY OF YOUR HELP. I will do my best to catch up as efficiently as possible. TOMAZ, LET ME REPEAT AGAIN THAT YOU ARE TOO GOOD TO BE TRUE (this is not the first time I tell you this). And if I do things wrongly, please let me know. I am here to produce efficient work and learn.

It would be a good idea if you looked not only at BG, but also at some other corpora (CZ, PL, SI, IS are all essentially finished), so you also see the differences between them - BG is just one particular case.

*** I will do so right away. Off I go... after I finish this email.

Nice to hear you have further texts, pls. feel free to put them in the CD folder (main branch).

Shall I just dump them there or do you want me to create a subfolder for the new files?

As for your questions from the mail, I answer them here:

Before starting (and for consistency reasons), let me tell you, I have put all Spanish versions before the English versions (if this is not what I had to do, let me know).

This is fine.

*** Excellent

(Tomaz, my work is in the adapting root branch in Parlamint-ES-MC).

You mean that the template is committed to the main branch? That is perfect, thanks.

*** Well, I put it in a provisional/secondary branch ("adapting root") because it is not finished. Can you see this secondary branch?

  1. Regarding TITLE: Since we are funded by the Spanish Ministry of Science and Innovation, could I have this as title? <title type="main" xml:lang="es">Corpus ECPC-CD/ParlaMint-ES, con intervenciones del Congreso de los Diputados de España [ParlaMint SAMPLE]</title>

No, the main title is the same for all the languages, so it should stay the way it is - it would be nice if you also translated it to Spanish though. However, you can put whatever you want in the sub-title.

*** Excellent. I will check other languages to see TEI format of subheadings. Let's see if they have it.

  2. Regarding MANDATES: To fix my root file I need to adapt the following tags to the Spanish case: <meeting n="7" corresp="#DZ" ana="#parla.term #DZ.7">7. mandat</meeting><meeting n="8" corresp="#DZ" ana="#parla.term #DZ.8">8. mandat</meeting>

Yes, we already discussed this.

*** Indeed, we did.

3.- FUNDING <funder> Since we are funded by the Ministry (and we are aided by CLARIN), Can I do this?

<funder>
<orgName xml:lang="es">CLARIN infraestructura de investigación científica</orgName>
<orgName xml:lang="en">The CLARIN research infrastructure</orgName>
</funder>
<funder>
<orgName xml:lang="es">Ministerio de Ciencia e Innovación</orgName>
<orgName xml:lang="en">Ministry of Science and Innovation from Spain</orgName>
</funder>

Sure. Have a look at some other corpora, they have similar situations.

*** Excellent

4.- EXTENT Do you need the unit to be the speech? Our unit is the day session and I have reflected this in the following way:

Don't worry about this, I can do automatically at the end.

*** Excellent, excellent, excellent

5.- Regarding Project Desc Could we add information about our project (since we are funded by the Spanish Ministry of Science and Innovation), e.g. by having two projectDesc elements?

So far the schema allowed only one projectDesc, but I've just changed it to allow more than one (it makes sense), so, yes, you can put 2 projectDescs (and don't forget to pull the new commit, before pushing, so you don't get conflicts!).

*** Indeed. I am the Queen of GitHub conflicts.

6.- On terms: I have defined terms in Spanish, but I am quoting definitions from a prestigious source. Could I do it this way:

<catDesc xml:lang="es"><term>LEGISLATURA</term>: "periodo temporal para el que es elegida la Cámara, desde las elecciones o sesión constitutiva (según secto de la doctrina) hasta la disolución anticipada o por fecha de expiración del mandato". (Diccionario panhispánico del español jurídico).</catDesc>

Well, have a look at how the others did it, e.g. https://github.com/clarin-eric/ParlaMint/blob/ca5a45ff5263da4d9b050388807a4706f16fff24/ParlaMint-IS/ParlaMint-IS.xml#L123

So, "Legislature", not all caps, no quotes, and maybe a somewhat shorter description (or you can leave it). So, something like:

<catDesc xml:lang="es"><term>Legislatura</term>: periodo temporal para el que es elegida la Cámara, desde las elecciones o sesión constitutiva (según secto de la doctrina) hasta la disolución anticipada o por fecha de expiración del mandato (Diccionario panhispánico del español jurídico).</catDesc>

And you should also do the translation to English.
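A check like the following could flag categories that still lack a description in one of the languages. It is a rough sketch under assumptions: the function name `missing_translations` is hypothetical, the sample is a cut-down fragment without the TEI namespace, and the English text of the second category is the only wording taken from this thread (its Spanish line is a placeholder).

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def missing_translations(xml_text, required=("es", "en")):
    """Map each category's xml:id to the required languages it lacks."""
    root = ET.fromstring(xml_text)
    report = {}
    for category in root.iter("category"):
        present = {desc.get(XML_NS + "lang")
                   for desc in category.findall("catDesc")}
        absent = [lang for lang in required if lang not in present]
        if absent:
            report[category.get(XML_NS + "id")] = absent
    return report


# Illustrative fragment: the first category has Spanish only.
SAMPLE = """<taxonomy>
  <category xml:id="parla.term">
    <catDesc xml:lang="es"><term>Legislatura</term>: periodo temporal para el que es elegida la Cámara</catDesc>
  </category>
  <category xml:id="example.chair">
    <catDesc xml:lang="es"><term>Presidencia</term>: (placeholder)</catDesc>
    <catDesc xml:lang="en"><term>Chair</term>: chairperson of a meeting</catDesc>
  </category>
</taxonomy>"""

print(missing_translations(SAMPLE))
```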

*** Excellent

  7. On terms: avoiding sexism. Could we have this for the Chair (instead of current wording)? <catDesc xml:lang="en"><term>Chair</term>: chairperson of a meeting</catDesc>

Sorry, but this should be the same in all the corpora, so, at least for now, pls. leave it as it is. If you want, you can of course post an issue to ParlaMint GitHub, and then we could change it for all the corpora. (although I wonder if there is ever a chairwoman at any of the meetings! We could check..).

*** No probs.

8.- I have used a mark like <!-- I HAVE NOT DONE ANYTHING FROM HERE ONWARDS BECAUSE I AM UNSURE ABOUT WHAT YOU WANT BELOW -->

OK.

Could you let me know what to do next?

Well, pls. fix the stuff above, and then continue from your mark. Once we have the root teiHeader done, there is also the more complicated situation with the component headers, as well as other stuff. But we will get to that! One step at a time...

*** Excellent. No time to looseeeee.

A BIG THANKS, TOMAZ, FOR YOUR WORK AND ALSO FOR YOUR PATIENCE. mc

calzada commented 3 years ago

Q&A

[Check "adapting_root" BRANCH in folder bin/Parlamint-template-ES.xml]

  1. Regarding mandates: Is this alright? (I have defined everything below.) Minutes from the National Congress in Spain, terms 11, 12, 13, 14 (2016 - 2020)
        <meeting n="11" corresp="#CD" ana="#parla.term #CD.11">Term 11</meeting>
        <meeting n="12" corresp="#CD" ana="#parla.term #CD.12">Term 12</meeting>
        <meeting n="13" corresp="#CD" ana="#parla.term #CD.13">Term 13</meeting>
        <meeting n="14" corresp="#CD" ana="#parla.term #CD.14">Term 14</meeting>

I opted for the English word "Term 11". Shall I write the Spanish equivalent? Is this OK?
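Since the `corresp` and `ana` values are pointers to identifiers defined elsewhere in the root file, a small script can list pointers that have no matching `xml:id`. The sketch below is an assumption-laden illustration, not the project's validator: the function name `dangling_pointers` is hypothetical, the `<org>`, `<event>` and `<category>` layout is only a stand-in for wherever the identifiers are actually defined, and the sample deliberately contains one pointer to an undefined identifier.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def dangling_pointers(xml_text):
    """Return 'attribute=#target' strings whose target has no xml:id."""
    root = ET.fromstring(xml_text)
    defined = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for el in root.iter():
        for attr in ("corresp", "ana"):
            # Both attributes may hold several space-separated pointers.
            for pointer in el.get(attr, "").split():
                if pointer.startswith("#") and pointer[1:] not in defined:
                    missing.append(f"{attr}={pointer}")
    return missing


# Illustrative fragment; "#DZ" is intentionally left undefined.
SAMPLE = """<root>
  <org xml:id="CD">
    <event xml:id="CD.13"/>
    <event xml:id="CD.14"/>
  </org>
  <category xml:id="parla.term"/>
  <meeting n="13" corresp="#CD" ana="#parla.term #CD.13">Term 13</meeting>
  <meeting n="14" corresp="#DZ" ana="#parla.term #CD.14">Term 14</meeting>
</root>"""

print(dangling_pointers(SAMPLE))  # ['corresp=#DZ']
```

This kind of check is useful after copying a template from another corpus, because leftover identifiers from the source corpus are still well-formed XML and are easy to overlook by eye.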

  1. María CALZADA PÉREZ

    I wrote my surnames in capitals because in Spain we have two surnames (so that people can distinguish between names and surnames).

  2. http://hdl.handle.net/11356/1388

    (Will I get my own handle? Is this my own handle for ES?)

  3. Can I do this?

    This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

  4. I assume we will change this when the corpus is available:

  5. I assume we will do this when things are ready?

  6. Under this tag, shall we translate all tags or shall we pick those only applicable to our case?
  7. I take this is the date of creation of the National Assembly. In our case:
  8. Political Parties & people

Do I do this manually or are these generated with our XML tags?

TomazErjavec commented 3 years ago

[Check "adapting_root" BRANCH in folder bin/Parlamint-template-ES.xml]

It would be better if you just did it on the main branch; if something goes wrong we can always check out a previous commit. I did pull this branch, but got into some weird conflicts...

Regarding mandates: Is this alright? (I have defined everything below.)

<title type="sub" xml:lang="en">Minutes from the National Congress in Spain, terms 11, 12, 13, 14 (2016 - 2020)</title> Term 11 Term 12 Term 13 Term 14

Not sure what those "Term 11 Term 12 Term 13 Term 14" are doing there, but otherwise ok. Well, I'd just write "terms 11-14", but that is a detail.

I opted for the English word "Term 11". Shall I write the Spanish equivalent? Is this OK?

I guess Term is ok, but don't really know. Have a look at the other corpora.

María CALZADA PÉREZ I wrote my surnames in capitals because in Spain we have two surnames (so that people can distinguish between names and surnames).

Well, it will look ugly and different from all the rest. They have two surnames also in other countries.

http://hdl.handle.net/11356/1388 (Will I get my own handle? Is this my own handle for ES?)

No, this is the handle for the complete ParlaMint.

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. I assume we will change this when the corpus is available:

Not sure where this came from, but all the other corpora have

<p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>

and the idea is that this is the licence it will be distributed under, just like our V1 corpora at http://hdl.handle.net/11356/1345

Under this tag, shall we translate all tags or shall we pick those only applicable to our case?

This I don't understand.

I take this is the date of creation of the National Assembly.

Yes, I think so, maybe have a look at the others.

In our case: Political Parties & people Do I do this manually or are these generated with our XML tags?

The actual lists of org and person are generated.

PS: better than to write numbers, just quote what you are referring to. and use backticks around xml tags.

calzada commented 3 years ago

Dear Tomaz, I just finished Parlamint-ES.xml. Could you have a look? PLEASE NOTICE it is /bin/ParlaMint-template-ES.xml. When you uploaded your version I had already finished a lot of work. Would you be so kind as to let me know whether it is ok, and would you rename it and put it in the master branch? Also... what else needs doing? Finally, regarding my new files, shall I just dump them in the CD folder or do you want me to create a subfolder for the new files? Again, a big thanks. mc

TomazErjavec commented 3 years ago

OK, thanks for https://github.com/calzada/PARLAMINT-ES-MC/blob/master/bin/ParlaMint-template-ES.xml.

Your filename is more sensible anyway. But the file is not XML, i.e. doesn't validate. Not to worry, I will fix it, but don't you have an XML editor? If not, get Oxygen editor (https://www.oxygenxml.com/download.html) for 1 month for free, by that time we will have hopefully finished! :)
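Without an XML editor, a quick first check is whether the file parses at all. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `well_formedness_error` and the sample strings are hypothetical. Note the limits: it only tests well-formedness (balanced tags, quoted attributes and so on), not validity against the ParlaMint schema.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return the parser's error message, or None if the text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Illustrative strings: the first lacks its closing </funder> tag.
BROKEN = "<funder><orgName>Ministerio de Ciencia e Innovación</orgName>"
FIXED = BROKEN + "</funder>"

for name, text in (("broken", BROKEN), ("fixed", FIXED)):
    problem = well_formedness_error(text)
    print(name, "->", problem if problem else "well-formed")
```

For a file on disk, `ET.parse(path)` raises the same `ParseError`, and the message includes the line and column where parsing stopped.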

Not sure I will look at the file immediately though.

PS: removed the hidden Mac files for the repo, you, of course, can still have them! Excellent. I now know what .gitignore is!!! Cheers.

calzada commented 3 years ago

Excellent. Thanks sooo much. more tomorrow. Best for now, mc

TomazErjavec commented 3 years ago

Great, you did a lot of work! I modified some things in https://github.com/calzada/PARLAMINT-ES-MC/blob/master/bin/ParlaMint-template-ES.xml, I hope nothing that will irritate you very much, but also left some comments (search for "ET") with what still needs to be done - but only small things, it is almost finished, very nice!

calzada commented 3 years ago

More than Excellent. I will refine the .xml and then will come back for the next steps.

Where do you want the new files I have finished?? CD folder or a subfolder??

Best for now,

mc

TomazErjavec commented 3 years ago

I will refine the .xml and then will come back for the next steps.

OK, no great hurry, I have other things on my plate now..

Where do you want the new files I have finished?? CD folder or a subfolder??

CD folder, master branch.

calzada commented 3 years ago

I have tried to validate the root file. I put the root file in a folder with validate-parlamint.pl and validate-parlamint.xsl and got a problem with the namespace and with the text to be included. I suppose you have already worked on my root file, which is why I only have these two problems. Now, do I need to do anything, or will this be generated when we add the files? Best mc

TomazErjavec commented 3 years ago

have tried to validate root file.

You mean https://github.com/calzada/PARLAMINT-ES-MC/blob/master/ParlaMint/ParlaMint-ES.xml ?

That one validates rather nicely, the log is in https://github.com/calzada/PARLAMINT-ES-MC/blob/master/log.txt

You don't need to do it really, I will do it.

I suppose you have already worked on my root file.

Yes, of course, in 48febfd, you can see the reference to it above.

Now, do I need to do anything

If you finished with the root file, then, for the moment, no. You can relax :)

calzada commented 3 years ago

Wow. Great Sensei!! Best for now!!

(Reply sent by email to Tomaž Erjavec's comment of Tue, 9 Mar 2021, 14:25.)

TomazErjavec commented 3 years ago

And this is done.