calzada / PARLAMINT-ES-MC

Urgent questions while annotating #16

Closed calzada closed 3 years ago

calzada commented 3 years ago

Dear Tomaz,

I hope you are doing great.

Some questions/comments on annotation we need to know to try to be ready for Wednesday:

Words by Luciana:

1) I didn't indent the files, but as far as I know this could be done automatically by Tomaz. But please let me know if that would be a constraint for him.

3) Another issue I encountered is the NER annotation, for example: <name type="PER"> </name> <w lemma="Félix" msd="UPosTag=PROPN" xml:id="ParlaMint-ES_2020-02-26-CD200226.u17.1.3"> Félix </w> <name type="E-PER">``</name> <w join="right" lemma="Cucurull" msd="UPosTag=PROPN" xml:id="ParlaMint-ES_2020-02-26-CD200226.u17.1.4"> Cucurull </w>

The tag </name> should close after "Cucurull" (THIS IS ACTUALLY THE SURNAME), and the <name type="E-PER"> </name> shouldn't be here, according to the Polish annotation. The E in E-PER means it's the end of the named entity. But I'd like to know if we could leave it that way?

Best for now

P.S. I hope I did the <> correctly.

TomazErjavec commented 3 years ago

I didn't indent the files, but as far as I know this could be done automatically by Tomaz. But please let me know if that would be a constraint for him.

I can do it, not a problem.

Another issue I encountered is the NER annotation... The tag </name> should close after "Cucurull" (THIS IS ACTUALLY THE SURNAME), and the <name type="E-PER"> </name> shouldn't be here, according to the Polish annotation. This E in E-PER means it's the end of the NER. But I'd like to know if we could leave it that way?

No, I'm afraid not. It should be:

<name type="PER">
  <w lemma="Félix" msd="UPosTag=PROPN" xml:id="ParlaMint-ES_2020-02-26-CD200226.u17.1.3"> Félix </w>
</name>

P.S. I hope I did the <> correctly.

I think so, except the

<name type="E-PER">``</name>

is strange.

calzada commented 3 years ago

Dear Tomaz,

We are annotating NER and UD with Stanza and I want to comment on a couple of issues and ask you two questions.

1) As with the Italians and the French, we have problems with the enclitics. In our case, the problem arises when there are two or more enclitics (for instance, "habértelas", which is haber + te + las). Normally what it does is stick the first enclitic to the verb (as if it were part of the lemma, i.e. haberte + las). We are leaving this as it is for the time being, because in Spanish we have different types of verbs (reflexive, pronominal, etc.) and it will need manual refinement.

2) I am uploading one .ana.xml file for you to get from this repo. It is in the Parlamint folder. Could you have a look and let us know if it looks alright? We still have a couple of issues to solve (a minor thing with the NER tagging, for instance), but will do so in the coming days.

3) We still have some files with errors (not many). We will deal with them in the coming days.

QUESTIONS:

1) We are unsure about <note>xxx</note>. How should we place it in the .ana.xml file? What is the format? Do you have any examples? We cannot seem to locate any. We have the problem that some of these notes are in the middle of speeches (and not in between speeches). The .ana.xml file we uploaded has examples of <note>xxx</note>.

2) Regarding notes, we JUST realised (after our meeting today) that some notes (those within segments) have been annotated. We had not realised this before. Look:

From the xml: <note>Aplausos.-Rumores.-Una señora diputada: ¡Qué nivel!</note>

In the -ana.xml:

<w join="right" lemma="aplauso" msd="UPosTag=PROPN" xml:id="ParlaMint-ES_2015-02-24-CD150224.u25.2.6">Aplausos</w> <w lemma=".-Rumores.-Una" msd="UPosTag=PROPN" xml:id="ParlaMint-ES_2015-02-24-CD150224.u25.2.7">.-Rumores.-Una</w>

So, our question is: do we have to annotate everything again? (No problem: we will split the work and be ready asap.) Or is there a better solution?

IF WE DO NOT HAVE TO ANNOTATE AGAIN, COULD YOU GIVE US UNTIL THE BEGINNING OF THE WEEK SO THAT WE ARE READY? IS THERE ANYTHING ELSE YOU WANT FROM US RIGHT NOW?

Finally, please note that the best part of all of this work is thanks to Luciana de Macedo!!! Would it be possible to make her name explicit in the TEI mark-up? Best for now and I hope you are doing fine. mc

TomazErjavec commented 3 years ago

As with the Italians and the French, we have problems with the enclitics. In our case, the problem arises when there are two or more enclitics (for instance, "habértelas", which is haber + te + las). Normally what it does is stick the first enclitic to the verb (as if it were part of the lemma, i.e. haberte + las). We are leaving this as it is for the time being, because in Spanish we have different types of verbs (reflexive, pronominal, etc.) and it will need manual refinement.

OK, if I understand correctly, you are not quite happy with the way these words are analysed but there is not much that you/we can do. So be it then. But I should note that the French and Italians just had problems (disagreements) with the way ParlaMint proposes to encode such syntactic words (i.e. clitics), not with the actual processing.

I am uploading one .ana.xml file for you to get from this repo. It is in the Parlamint folder. Could you have a look and let us know if it looks alright? We still have a couple of issues to solve (a minor thing with the NER tagging, for instance), but will do so in the coming days. We still have some files with errors (not many). We will deal with them in the coming days.

I moved the file into the newly created ParlaMint.ana directory, as well as removing the space with which the filename started. I can't really validate, as somehow all tag names are here lower case, instead of camel case, as they should be, e.g. teiheader instead of teiHeader etc. Can you pls. correct this? Also, for the proper validation I will need the root .ana file (so, ParlaMint-ES.ana.xml), do you know how to make it? (it is the same as the ParlaMint-ES.xml file, but with added things, like the UD-SYN taxonomy, prefixDef, appInfo, and you can copy most of the stuff from existing ParlaMint .ana roots on GitHub).
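One possible cause of the lower-cased tag names, offered here only as an assumption, is that the files passed through an HTML parser at some point: the checking script quoted later in this thread opens files with BeautifulSoup's 'html.parser'. The sketch below uses only the Python standard library to show that HTML parsing folds tag names to lower case while XML parsing keeps them. SAMPLE and the two helper functions are illustrative names, not part of any ParlaMint tooling.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# A tiny, made-up fragment with camel-case TEI element names.
SAMPLE = "<TEI><teiHeader><fileDesc/></teiHeader></TEI>"


class TagCollector(HTMLParser):
    """Collect start-tag names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already converted to lower case.
        self.tags.append(tag)


def tags_via_html_parser(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


def tags_via_xml_parser(text):
    # An XML parser is case-sensitive and keeps names as written.
    return [element.tag for element in ET.fromstring(text).iter()]


if __name__ == "__main__":
    print(tags_via_html_parser(SAMPLE))
    print(tags_via_xml_parser(SAMPLE))
```

If this is what happened, reading and writing the files with an XML-aware parser would avoid the problem at the source.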

But looking at the linguistic annotation it looks pretty good at first glance, just one comment:

We are unsure about <note>xxx</note>. How should we place it in the ana.xml file. What is the format?

Exactly as it was in the original, unannotated corpus.

Do you have any examples? We cannot seem to locate any.

There are lots of them in https://github.com/clarin-eric/ParlaMint/ just do grep '<note' */*.ana.xml (or grep '<note' */*/*.ana.xml) there. Assuming you are on Linux, which all cool people are:) But, specifically: https://github.com/clarin-eric/ParlaMint/blob/7f69eae1aeeda71143cb75c27bc83bcb14ea13cd/ParlaMint-CZ/ParlaMint-CZ_2013-11-25-ps2013-001-01-001-001.ana.xml#L236
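For anyone not on Linux, the grep search above can be approximated in Python. This is a minimal sketch that assumes a local checkout of the ParlaMint repository; find_note_lines and its corpus_root argument are hypothetical names chosen for this example.

```python
from pathlib import Path


def find_note_lines(corpus_root):
    """Yield (path, line number, stripped line) for lines containing '<note'.

    Searches every *.ana.xml file below corpus_root, at any depth.
    """
    for path in sorted(Path(corpus_root).glob("**/*.ana.xml")):
        with open(path, encoding="utf-8") as handle:
            for number, line in enumerate(handle, start=1):
                if "<note" in line:
                    yield path, number, line.strip()


if __name__ == "__main__":
    # "." stands in for the directory holding the ParlaMint-XX folders.
    for path, number, line in find_note_lines("."):
        print(f"{path}:{number}: {line}")
```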

We have the problem that some of these notes are in the middle of speeches (and not in between speeches). The .ana.xml file we uploaded has examples of <note>xxx</note>.

They can be in the middle, no problem.

Finally, please note that the best part of all of this work is thanks to Luciana de Macedo!!! Would it be possible to make her name explicit in the TEI mark-up?

Sure, but this will apply to the .ana corpus, which we don't have yet. Pls. remind me once we get there, or, given that Luciana will be making this corpus, she can do it herself, like

            <respStmt>
               <persName>María Calzada Pérez</persName>
               <resp xml:lang="en">Data retrieval and conversion to XML</resp>
            </respStmt>
            <respStmt>
               <persName>Tomaž Erjavec</persName>
               <resp xml:lang="en">Conversion to ParlaMint TEI</resp>
            </respStmt>
            <respStmt>
               <persName>Luciana de Macedo</persName>
               <resp xml:lang="en">Linguistic annotation</resp>
            </respStmt>
calzada commented 3 years ago

A FINAL THING:

Regarding notes, we JUST realised (after our meeting today) that some notes (those within segments) have been annotated. We had not realised this before. Look:

From the xml: <note>Aplausos.-Rumores.-Una señora diputada: ¡Qué nivel!</note>

In the -ana.xml:

<w join="right" lemma="aplauso" msd="UPosTag=PROPN" xml:id="ParlaMint-ES_2015-02-24-CD150224.u25.2.6">Aplausos</w> <w lemma=".-Rumores.-Una" msd="UPosTag=PROPN" xml:id="ParlaMint-ES_2015-02-24-CD150224.u25.2.7">.-Rumores.-Una</w>

So, our question is: do we have to annotate everything again? (No problem: we will split the work and be ready asap.) Or is there a better solution?

Best

mc

calzada commented 3 years ago

Finally, thanks for your informative answer. I will try to do the root file and Luciana will check these final issues.

Best and thanks for your help... ALWAYS.

mc

TomazErjavec commented 3 years ago

Do we have to annotate everything again

I'm afraid so, yes. Notes should not be annotated.
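A rough way to spot notes that were tokenised by mistake is to compare the note texts of the un-annotated file with those of its .ana.xml counterpart. The sketch below is not part of the ParlaMint tooling; missing_notes and the helper names are hypothetical, and it assumes both files are well-formed XML held in memory as strings.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def note_texts(xml_text):
    """Return the text content of every <note> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        "".join(element.itertext()).strip()
        for element in root.iter()
        if local_name(element.tag) == "note"
    ]


def missing_notes(plain_xml, ana_xml):
    """List notes of the plain file that have no matching <note> in the .ana file."""
    remaining = Counter(note_texts(ana_xml))
    missing = []
    for text in note_texts(plain_xml):
        if remaining[text] > 0:
            remaining[text] -= 1
        else:
            missing.append(text)
    return missing
```

An empty result means every note survived as a note; anything listed was probably swallowed into the token stream or altered.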

calzada commented 3 years ago

No probs. I have just identified the part I need to add to the .ana.xml root. As soon as I have a draft I will share it with you. Thanks Tomaz Best mc

calzada commented 3 years ago

Dear Tomaz,

RE: JUST CHECKING ON HOW TO DEAL WITH NOTES. In a nutshell, in Parlamint-ES we have notes in the middle of speeches. We want to confirm the decision we have taken.

MORE INFORMATION RIGHT BELOW:

[EMAIL FROM LUCIANA DE MACEDO]

We'd like to know how to annotate when notes happen like the following (boldtyped), in which <note>'s are between sentences, rather than segments:

<seg xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1">Señor Pezzi, retire usted esa bandera.<note>Protestas.-Aplausos</note> Le llamo al orden por primera vez.<note>El señor Pezzi Cereto continúa exhibiendo la bandera de la Comunidad Autónoma de Andalucía</note> Señor Pezzi, le llamo al orden por segunda vez.<note>Protestas.-Aplausos</note> La próxima vez va usted a la calle.<note>El señor Pezzi Cereto recoge la bandera.-Protestas.-Un señor diputado: ¡Fuera, hombre!</note> Continúe, señor presidente.</seg>

My suggestion (for the first <note> of the above segment) would be as follows, but I'd like to make sure we're on the right track before the reannotation process:

<u ana="#chair" xml:id="ParlaMint-ES_2015-02-24-CD150224.u24">
  <seg xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1">
    <s xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1.1">
      <name type="PER">
        <w lemma="señor" msd="UPosTag=NOUN|Gender=Masc|Number=Sing" xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1.1">Señor</w>
        <w join="right" lemma="Pezzi" msd="UPosTag=PROPN" xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1.2">Pezzi</w>
      </name>
      <pc msd="UPosTag=PUNCT|PunctType=Comm" xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1.3">,</pc>
      <w lemma="retirar" msd="UPosTag=VERB|Mood=Imp|Number=Sing|Person=3|VerbForm=Fin" xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1.4">retire</w>
      <w lemma="tú" msd="UPosTag=PRON|Case=Acc,Nom|Number=Sing|Person=2|Polite=Form|PronType=Prs" xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1.5">usted</w>
      <w lemma="ese" msd="UPosTag=DET|Gender=Fem|Number=Sing|PronType=Dem" xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1.6">esa</w>
      <w join="right" lemma="bandera" msd="UPosTag=PROPN" xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1.7">bandera</w>
      <pc msd="UPosTag=PUNCT|PunctType=Peri" xml:id="ParlaMint-ES_2015-02-24-CD150224.u24.1.8">.</pc>
      <note>Protestas.-Aplausos</note>
      <linkGrp targFunc="head argument" type="UD-SYN"> [dependency relation here]</linkGrp>
    </s>

We used the script below (in case Tomaz wants to check).

import os

from bs4 import BeautifulSoup

# Collect the un-annotated XML files from every ParlaMint* directory.
dirs = [d for d in os.listdir('.') if 'ParlaMint' in d and '.xml' not in d]
files = [d + '/' + i for d in dirs for i in os.listdir(d)]
files = [i for i in files if 'ana' not in i and 'xml' in i]

def test_file(filename):
    # Print the text of every <seg> that contains a <note>.
    with open(filename, 'r') as f:
        parsed = BeautifulSoup(f.read(), 'html.parser')
    for seg in parsed.find_all('seg'):
        if '<note>' in str(seg):
            print(seg.text)

for file in files:
    print(file)
    test_file(file)

TomazErjavec commented 3 years ago

Well, the principle is that notes can appear anywhere, even in the middle of the sentences. But, if possible, it is nicer to "lift" them out of the containing elements, if they appear at the edge of the element. In other words, it is formally ok what you propose, but would be nicer if the note were in between two sentences.
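The "lifting" described above could be automated roughly as follows. This is only a sketch under stated assumptions: a note counts as being at the edge of a sentence when nothing but <linkGrp> follows it inside <s>, and lift_trailing_notes is a hypothetical helper, not an existing ParlaMint script.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def lift_trailing_notes(root):
    """Move <note> elements that close an <s> to just after that <s>.

    Returns the number of notes moved. Notes in the middle of a
    sentence are left where they are.
    """
    moved = 0
    # Snapshot the elements first, because the tree is edited in the loop.
    for parent in list(root.iter()):
        index = 0
        while index < len(parent):
            child = parent[index]
            if local_name(child.tag) == "s":
                # Ignore <linkGrp> when deciding what is at the edge.
                content = [c for c in child if local_name(c.tag) != "linkGrp"]
                trailing = []
                while content and local_name(content[-1].tag) == "note":
                    trailing.insert(0, content.pop())
                for offset, note in enumerate(trailing, start=1):
                    child.remove(note)
                    parent.insert(index + offset, note)
                    moved += 1
                index += len(trailing)
            index += 1
    return moved
```

Applied to the proposed markup, the note after "bandera." would end up between the first and second <s> of the segment, which is the arrangement described as nicer.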

calzada commented 3 years ago

Thanks for the fast reply. You certainly deserve this!!:

https://www.youtube.com/watch?v=wo_K4KMwhLo

I am working on the .ana.xml root. My methodology is this: I have compared the BG file with mine and copied and pasted the information about NER and UD-SYN (appearing BETWEEN this: <term>COVID</term>: COVID subcorpus, from 2019-11-01 onwards</catDesc> </category> </taxonomy> and this: <profileDesc>). I am now comparing carefully to check that the UD-SYN categories are applicable to our case (with the aid of this: https://universaldependencies.org/treebanks/es_ancora/index.html and our .ana.xml file).

Am I on the right track?

No words as usual to thank you!!!

Best mc

TomazErjavec commented 3 years ago

https://www.youtube.com/watch?v=wo_K4KMwhLo

Nice!

Am I on the right track?

Absolutely :)

Take care & I have to go cook lunch now.

calzada commented 3 years ago

ON .ANA.XML ROOT:

Tomaz, I am about to finish the ana.xml root. A couple of things:

1) In the root file there are cases that do not appear in our sample file (i.e. acl:relcl; aux:pass; csubj:pass; discourse; expl; goeswith; nsubj:pass; orphan; reparandum).

I am particularly concerned about reparandum (overridden disfluency; here used for program mistakes).

2) In the root file I forked from Parlamint-BG, I have added: dep = dependency (for cases when unable to determine a more precise dependency because of:

IS OUR DEP THEIR REPARANDUM?

expl:impers expl:pass expl:pv list

3) I just need to clarify <appInfo></appInfo> and the .ana.xml root file will be ready. I will tell you when I THINK IT IS READY. It will be later tonight since I am talking to Luciana at 19.00 hours.

Best for now, mc

TomazErjavec commented 3 years ago

In the root file there are cases that do not appear in our sample file

But they might appear in the complete corpus? For now, I would just leave everything there, it is easy to remove them at the end, when we can do a statistic on what relations you actually have in the full corpus.
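The statistic mentioned above could be gathered with a short script like the one below. It assumes, as in published ParlaMint .ana files, that each dependency is a <link> element whose ana attribute names the relation (for example ana="ud-syn:nsubj"); the thread itself does not show this markup, so treat it as an assumption. relation_counts is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def relation_counts(xml_text):
    """Count the values of the ana attribute on all <link> elements."""
    counts = Counter()
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) == "link" and element.get("ana"):
            counts[element.get("ana")] += 1
    return counts
```

Summing these counters over all component files would show which taxonomy categories are never used and could be dropped from the root file at the end.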

I am particularly concerned about reparandum (overridden disfluency; here used for program mistakes).

Don't be worried - we used this for the annotation pipeline mistakes in the Slovenian corpus, everybody else just copied this..

dep = dependency

OK, others have this as well. You just leave dep, it is maybe a better choice than reparandum anyway.

I will tell you when I THINK IT IS READY.

OK. Pls. put it in the ana directory.

calzada commented 3 years ago

Dear Tomaz,

Please find Parlamint/Parlamint-ES.ana.xml. I think this is ready.

Notice my methodology:

1) I used as main example Parlamint-BG.ana.xml (since this is what I did for Parlamint-ES.xml).
2) I used Parlamint-GB.ana.xml, which is rather complete (BG sometimes does not have proper rewording of abbreviations).
3) I used https://universaldependencies.org/treebanks/es_ancora/index.html (referring to the Spanish treebank we used for UD).
4) I checked all UD SYNT cases in our small sample of .ana.xml files.
5) I kept all categories, even if they were not present in our sample (to be on the safe side).
6) I added categories that are especially active in Spanish, such as expletive:impers or expletive:pv (for example).
7) I added the dep category.

All in all, Parlamint-ES.ana.xml is the same as Parlamint-ES.xml, with the addition of lines 23-26 (to add Luciana de Macedo's participation) and lines 407-670 (for NER and UD). Please check <appInfo> (lines 660-669) because I am unsure of the explanation. This is what I wrote:

Tokenisation, POS tagging, NER and dependency parsed using Stanza, a Python NLP language analysis package https://stanfordnlp.github.io/stanza/.
        <application ident="UD_Spanish-AnCora">
           <label>UD Spanish AnCora</label>
           <desc xml:lang="en">For UD, the Spanish AnCora Treebank was used <ref target="https://github.com/UniversalDependencies/UD_Spanish-AnCora">https://github.com/UniversalDependencies/UD_Spanish-AnCora</ref>.</desc>
        </application>
     </appInfo>

If I made mistakes, let me know and I will work on them.

As we speak, we are re-annotating the corpus (to fix the <note> bug). We will be ready in a couple of days.

A million thanks AS USUAL for your help.

mc

TomazErjavec commented 3 years ago

Hi, thanks for your work, very nice!

I corrected the root file a bit (and moved it to where it belongs, also did some file housekeeping), and now it looks quite fine, no errors as far as I can see!

To really tell, I will need some annotated files. For now, I resurrected the old ParlaMint-ES_2017-01-31-CD170131.ana.xml file and corrected it a bit (just for fun..), but it still has lots of errors, the most common one is like:

ParlaMint-ES_2017-01-31-CD170131.ana.xml:98:146: error: ID "ParlaMint-ES_2017-01-31-CD170131.u1.1.1" has already been defined
ParlaMint-ES_2017-01-31-CD170131.ana.xml:97:57: error: first occurrence of ID "ParlaMint-ES_2017-01-31-CD170131.u1.1.1"
ParlaMint-ES_2017-01-31-CD170131.ana.xml:114:151: error: ID "ParlaMint-ES_2017-01-31-CD170131.u1.2.1" has already been defined

etc. So, many IDs are defined more than once, so you need to watch out for that.
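Duplicate IDs of this kind can be caught before validation with a few lines of Python. This is a minimal sketch using only the standard library; duplicate_ids is a hypothetical helper, not the validator that produced the messages above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:id under the fixed XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(xml_text):
    """Return the sorted xml:id values that occur more than once."""
    counts = Counter(
        element.get(XML_ID)
        for element in ET.fromstring(xml_text).iter()
        if element.get(XML_ID) is not None
    )
    return sorted(value for value, count in counts.items() if count > 1)
```

In the sample markup proposed earlier in this thread, the first <s> and its first <w> share the ID ending in u24.1.1, which is the pattern this check would flag.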

Now awaiting the new batch of .ana files (no need to put all of them on Git, just a few for testing).

Best, Tomaž

calzada commented 3 years ago

Dear Tomaz,

Minor issue (from Luciana de Macedo):

Could you ask Tomaz where and how the tag "pb" should be placed when it's within the sentence, such as "más de 122.000 millones de euros de esta manera."?

We can't seem to find a solution here. I haven't seen any tags within sentences in other languages.

From: https://github.com/calzada/PARLAMINT-ES-MC/blob/master/ParlaMint/ParlaMint-ES_2015-02-24-CD150224.xml#L129

MY SUGGESTIONS: 1) Treat them as notes. 2) Delete them; as far as I gather they are page numbers that are unnecessary for ParlaMint. 3) Leave them in the middle of the text (without annotation).

Sorry to bother you Tomaz. We are really getting there. Best mc

TomazErjavec commented 3 years ago

how the tag "pb" should be placed when it's within the sentence

Ideally it should be placed exactly as it is already placed in the un-annotated version, so, option 3.

I haven't seen any tags within sentences in other languages.

The CZ uses them, cf. eg. https://github.com/clarin-eric/ParlaMint/blob/3ee80827f9b2b24c4a742b1e6eee85dcf90856f6/ParlaMint-CZ/ParlaMint-CZ_2013-11-25-ps2013-001-01-001-001.ana.xml#L123

TomazErjavec commented 3 years ago

The CZ uses them

Sorry, you are right, that is not in the middle of the sentence. But otherwise, as I wrote above.

calzada commented 3 years ago

Dear Tomaz, I hope you will forgive me. But I have given irrevocable orders that the <pb/> tags are erased. So we will not have them in the .ana.xml version. I thought it was most sensible because they are of no use. Please, please forgive me. I thought it was a good order since this will allow us to have everything ready by tomorrow and Luciana is really about to give birth. Best for now, mc

TomazErjavec commented 3 years ago

OK, no problem. Better to finish while still able, I agree!

calzada commented 3 years ago

Grrrreat!! Best for now, mc

On Tue, 6 Apr 2021 at 14:43, Tomaž Erjavec @.***> wrote:

OK, no problem. Better to finish while still able, I agree!

calzada commented 3 years ago

Tomaz, I just uploaded annotated files. I uploaded all of them in case you need them. They need to be validated now. What we did with is that we deleted those in the middle of text. Please, let me know if there is anything else that needs doing. I also uploaded the scripts that were used for annotation (and bug sorting). They are in the bin directory. And I owe a great thanks to Luciana de Macedo. Best for now, mc

TomazErjavec commented 3 years ago

I just uploaded annotated files. I uploaded all of them in case you need them. They need to be validated now.

Great, thanks. One error that prevents further validation is that the files are not well-formed XML: all component files should end with </TEI> and not </tei> as they do now.
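A mismatch such as <TEI> closed by </tei> can be detected before upload with a plain well-formedness check. The sketch below uses Python's standard XML parser; well_formedness_error is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


if __name__ == "__main__":
    # XML names are case-sensitive, so the second document is rejected.
    print(well_formedness_error("<TEI><text/></TEI>"))
    print(well_formedness_error("<TEI><text/></tei>"))
```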

Another rather simple change to make is that all top-level IDs should end with ".ana", like:

<TEI ana="#reference" xml:id="ParlaMint-ES_2015-01-20-CD150120.ana" xml:lang="es" xmlns="http://www.tei-c.org/ns/1.0">

and that the "stamp" in the title should be [ParlaMint.ana], e.g.:

<title type="main" xml:lang="en">Spanish parliamentary corpus ParlaMint-ES, Plenary session 237 (2015-01-20) [ParlaMint.ana]</title>
<title type="main" xml:lang="es">Corpus parlamentario en español ParlaMint-ES, Sesión plenaria núm. 237. Sesión extraordinaria (2015-01-20) [ParlaMint.ana]</title>

Can you fix this or should I?

What we did with is that we deleted those in the middle of text.

I guess you mean <pb/>? If so, it would be better to delete all of them; deleting only some just introduces noise. But, again, it is simple to delete all of them, and I could do that (when I find the time, which is a bit of a problem right now...)
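Deleting every <pb/> consistently can be sketched as below. The one subtlety with ElementTree is that the text following a removed element is stored on that element as its tail, so it has to be re-attached or it would be lost. remove_page_breaks is a hypothetical helper, not the script that was actually used.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def remove_page_breaks(root):
    """Delete all <pb> elements below root, keeping the text that follows them."""
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) != "pb":
                continue
            index = list(parent).index(child)
            tail = child.tail or ""
            if index == 0:
                # No previous sibling: the text joins the parent's own text.
                parent.text = (parent.text or "") + tail
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
            removed += 1
    return removed
```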

I also uploaded the scripts that were used for annotation (and bug sorting). They are in the bin directory.

Nice, but they won't really help me, as I don't speak Python; also, I would not know how to run them (i.e. there is no Makefile or similar).

calzada commented 3 years ago

Dear Tomaz, Thank you so much as usual. If it is not too much of a hassle, could you do these final issues. Luciana was having a check today and I fear it might be a better idea. If you cannot fit it in, I can try to do those issues I can fix myself and the rest I can try to find someone else. but my preference, of course, is that if you could fit it, you would be the best. What do you say? I know you are already overworked, and I feel ashamed to ask you this. Best for now, mc

TomazErjavec commented 3 years ago

could you do these final issues.

OK, will do!

calzada commented 3 years ago

I have no words to thank you enough. What can I do to thank you??? Best for now, mc

calzada commented 3 years ago

Tomaz, I had to upload the files again because some were missing. Now they are all there and you can proceed with the minor fixing and validation when you can. Best, mc

TomazErjavec commented 3 years ago

had to upload the files again because some were missing.

One is still missing, i.e. ParlaMint-ES_2020-11-18-CD201118-bis.ana.xml Is it still possible to add it?

calzada commented 3 years ago

Dear Tomaz, How are you? I hope you are more than fine. I have just asked Luciana. I know she had some problems with 4 files so maybe something happened with this one. As soon as she replies (let's keep our fingers crossed), I will keep you posted. Is there anything you need from our side? And btw, THANK YOU SO MUCH FOR YOUR WORK ALWAYS. best mc

TomazErjavec commented 3 years ago

How are you?

Well, working the whole day, but then, it was my choice :)

As soon as she replies (let's keep our fingers crossed), I will keep you posted.

OK, great. But if she won't manage, I will just delete this file, it won't be the end of the world..

Is there anything you need from our side?

Just the file. Or confirmation that it is unavailable.

Best, Tomaž

calzada commented 3 years ago

Tomaz,

Luciana thought the file was a duplicate (quite a logical deduction anyway) and she did not annotate it. She will annotate it and send it to me asap.

Regarding your words, indeed, we do appreciate all the work you produce. But I am sure you will be taking good care of yourself as well. At any rate, I should especially apologise for giving you so much work.

When the coronavirus is over, you are invited to Spain. I have a flat in Valencia (I live in Castellón but Valencia is close by) and there you can relax with your family. You can visit the city and take an urban bus that will take you by the sea. During the weekend I will invite you to paella with my partner.

Let's hope this Coronavirus finishes soon.

I am also growing increasingly intrigued by Ljubljana because of you and will visit your country in the future. I will of course not disturb you in the least, because I do realise you are a major researcher.

Best for now, and I will be sending you this final file.

best

mc

calzada commented 3 years ago

Done!! File uploaded. Best and thanks to Luciana. Grrrrrreat Luciana! mc

TomazErjavec commented 3 years ago

Dear @calzada, thanks for your bidirectional invite, both sound nice, esp. if there is life after Corona! Got the file, and substituted your .ana files with mine, eXtra polished, 03ef75b :) I made the "final" pipeline for all corpora, and the "final" ES sample is on ParlaMint. Pls. also see the conversion log, there are by now some classic UD errors, but nothing to lose sleep over. Luciana is indeed greeeeaaat - and you, of course, with your careful root file! Making the .ana was really unproblematic. Well, actually it was, but only because I found my own bugs, from when I was making the .tei version.... But I did find one mistake: 0b3a40e :)

Anyway, ES is now also in the concordancer: Info or maybe covid.

So, with a good conscience, we can now close this issue!

But I'm sure we can come up with some more 👍