DARIAH-ERIC / lexicalresources

Data space of the DARIAH Lexical Resources Working Group
https://dariah-eric.github.io/lexicalresources/
BSD 2-Clause "Simplified" License

Missing element: <authority> (header: publicationStmt) #171

Closed. anacastrosalgado closed this issue 1 year ago

anacastrosalgado commented 1 year ago

element "authority" contained by header: publicationStmt is missing

         <!-- [...] -->
         <publicationStmt>
            <publisher>MorDigital Project (PTDC/LLT-LIN/6841/2020)</publisher>
            <pubPlace>Lisboa</pubPlace>
            <date>2021-2023</date>
            <authority role="sponsor">FCT – Fundação para a Ciência e Tecnologia</authority>
            <availability>
               <licence target="https://creativecommons.org/licenses/by/4.0/">
                  <p>Creative Commons Attribution 4.0 International (CC BY 4.0)</p>
               </licence>
            </availability>
         </publicationStmt>
         <!-- [...] -->

element "authority" not allowed here; expected element "availability", "date", "idno", "pubPlace" or "ref"

ttasovac commented 1 year ago

Ana, authority is not missing, it's just that the content model of publicationStmt requires that things go in a certain order: just put authority after publisher and before pubPlace, and you'll be good to go. If that fixes things, please close this issue. Otherwise, let me know.
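
The ordering rule described above can be checked outside the schema with a few lines of Python. This is only a sketch under a simplifying assumption: that every agency element (publisher, distributor, authority) must come before the detail elements (pubPlace, date, availability, and so on). The function name misplaced_agencies and the shortened sample are made up for illustration; the project's RELAX NG schema remains the authoritative check.

```python
import xml.etree.ElementTree as ET

# Assumption: these three elements are the "agency" part of publicationStmt;
# every other child is treated as a "detail" element.
AGENCY = {"publisher", "distributor", "authority"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def misplaced_agencies(publication_stmt):
    """Return the agency elements that appear after a detail element."""
    seen_detail = False
    misplaced = []
    for child in publication_stmt:
        name = local_name(child.tag)
        if name in AGENCY:
            if seen_detail:
                misplaced.append(name)
        else:
            seen_detail = True
    return misplaced


SAMPLE = """<publicationStmt xmlns="http://www.tei-c.org/ns/1.0">
  <publisher>MorDigital Project (PTDC/LLT-LIN/6841/2020)</publisher>
  <pubPlace>Lisboa</pubPlace>
  <date>2021-2023</date>
  <authority role="sponsor">FCT</authority>
</publicationStmt>"""

print(misplaced_agencies(ET.fromstring(SAMPLE)))  # ['authority']
```

Moving authority directly after publisher makes the function return an empty list, which matches the fix suggested in the reply.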

anacastrosalgado commented 1 year ago

We changed this on one of our calls. It's different now and validated. I will close this issue, but if you can, take a look at the comments ( <!--).

         <!-- [...] -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <teiHeader>
      <fileDesc>
         <titleStmt>
            <title xml:lang="en" type="full">Morais Dictionary (1st ed., 1789): digital edition</title>
         </titleStmt>
         <editionStmt>  
            <edition>MorDigital Project (PTDC/LLT-LIN/6841/2020)</edition>
            <respStmt>
               <resp>Principal researcher</resp>
               <persName>Rute Costa</persName>
               <orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
            </respStmt>
            <respStmt>
               <resp>Team</resp>
               <persName>Rute Costa</persName>
               <orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <persName>Sara Carvalho</persName>
               <orgName n="1">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <orgName n="2">CLLC, Centro de Línguas, Literaturas e Culturas da Universidade de Aveiro</orgName>
               <persName>Ana Salgado</persName>
               <orgName n="1">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <orgName n="2">Academia das Ciências de Lisboa, Instituto de Lexicologia e Lexicografia da Língua Portuguesa</orgName>
               <persName>Bruno Almeida</persName>
               <orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <persName>Margarida Ramos</persName>
               <orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <persName>Raquel Silva</persName>
               <orgName n="1">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <orgName n="2">VOH.CoLAB, Value for Health CoLAB</orgName>
               <persName>Alexandre Carreira</persName>
               <orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <persName>Joana Oliveira</persName>
               <orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <persName>Fahad Khan</persName>
               <orgName>Istituto Di Linguistica Computazionale ‘A. Zampolli’</orgName>
               <persName>Laurent Romary</persName>
               <orgName>Inria-ALMAnaCH Lab</orgName>
               <persName>Mohamed Khemakhem</persName>
               <orgName>Inria-ALMAnaCH Lab + Université Grenoble Alpes</orgName>
               <persName>Toma Tasovac</persName>
               <orgName n="1">DARIAH-EU, Digital Research Infrastructure for the Arts and Humanities</orgName>
               <orgName n="2">BCDH, Belgrade Center for Digital Humanities</orgName>
            </respStmt>
            <respStmt>
               <resp>Consultants</resp>
               <persName>Maria Filomena Gonçalves</persName>
               <persName>Jorge Gracia</persName>
            </respStmt>
            <respStmt>
               <resp>OCR tasks done by</resp>
               <persName>Alexandre Carreira</persName>
               <persName>Margarida Ramos</persName>
               <persName>Joana Oliveira</persName>
            </respStmt>
            <respStmt>
               <resp>XML encoding by</resp>
               <persName>Ana Salgado</persName>
               <persName>Bruno Almeida</persName>
               <persName>Toma Tasovac</persName>
            </respStmt>
         </editionStmt>
         <publicationStmt>
            <authority role="sponsor">FCT – Fundação para a Ciência e Tecnologia</authority>
            <pubPlace>Lisboa</pubPlace>
            <date>2021-2023</date>
            <availability>
               <licence target="https://creativecommons.org/licenses/by/4.0/">
                  <p>Creative Commons Attribution 4.0 International (CC BY 4.0)</p>
               </licence>
            </availability>
         </publicationStmt>
         <sourceDesc>
            <biblStruct>
               <monogr>
                  <title level="m" type="main">Diccionario da lingua portugueza composto pelo padre
                     D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva,
                     natural do Rio de Janeiro</title>
                  <title level="m" type="sub">A – K</title>
                  <author>
                     <persName>
                        <forename>António de</forename>
                        <surname>Morais Silva</surname>
                     </persName>
                  </author>
                  <imprint>
                     <pubPlace>Lisboa</pubPlace>
                     <publisher>Officina de Simão Thaddeo Ferreira</publisher>
                     <pubPlace>Lisboa</pubPlace>
                     <date>1789</date>
                     <note>Com Licença da Real Meza da Comissão Geral, sobre o Exame, e Censura dos
                        Livros.</note>
                     <!-- Please confirm distributor: describes the store where the dictionary was for sale. -->
                     <distributor>Vende-ſe na loja de Borel Borel, e Companhia, quaſi defronte da
                        Igreja nova de Noſſa Senhora dos Martyres, na eſquina.</distributor>
                  </imprint>
                  <extent>Tomo primeiro</extent>
                  <extent>752 pp.</extent>
               </monogr>
            </biblStruct>
            <biblStruct>
               <monogr>
                  <title level="m" type="main">Diccionario da lingua portugueza composto pelo padre
                     D. Rafael Bluteau, reformado, e accrescentado por Antonio de Moraes Silva,
                     natural do Rio de Janeiro</title>
                  <title level="m" type="sub">L – Z</title>
                  <author>
                     <persName>
                        <forename>António de</forename>
                        <surname>Morais Silva</surname>
                     </persName>
                  </author>
                  <imprint>
                     <publisher>Officina de Simão Thaddeo Ferreira</publisher>
                     <pubPlace>Lisboa</pubPlace>
                     <date>1789</date>
                  </imprint>
                  <extent>Tomo segundo</extent>
                  <extent>541 pp.</extent>
               </monogr>
            </biblStruct>
         </sourceDesc>
      </fileDesc>
      <encodingDesc>
         <projectDesc>
            <!-- To be approved. -->
            <p>We already had access to OCR’ed versions of the dictionary editions at the beginning
               of the project. These files needed to be post-corrected. For this, we decided to use
               ABBYY FineReader.</p>
         </projectDesc>
         <editorialDecl>
            <!-- To be approved. -->
            <p>Original spelling and typography is retained.</p>
            <p>Errors found in original OCR were all controlled.</p>
         </editorialDecl>
         <!-- Hierarchical usage labels: includes only Medicine domain label -->
         <classDecl>
            <taxonomy xml:id="domain">
               <category xml:id="domain.medical_and_health_sciences">
                  <catDesc xml:lang="en">Medical_and Health Sciences</catDesc>
                  <catDesc xml:lang="pt">Ciências Médicas e da Saúde</catDesc>
                  <category xml:id="domain.medical_and_health_sciences.medicine">
                     <!-- In MORAIS, Med./(t.)Medico -->
                     <catDesc xml:lang="en">Medicine</catDesc>
                     <catDesc xml:lang="pt">Medicina</catDesc>
                  </category>
               </category>
            </taxonomy>
         </classDecl>
      </encodingDesc>
      <profileDesc>
         <langUsage>
            <language role="objectLanguage" ident="pt">Portuguese</language>
            <language role="workingLanguage" ident="en">English</language>
         </langUsage>
      </profileDesc>
   </teiHeader>
   <text>
      <body>
         <!-- Different sections start here -->
         <div type="section" n="1">
            <p>Foi taxado eſte Livro em papel a dous mil reis. Meza 8 de Junho de 1789.</p>
            <p><hi rend="italic">Com tres rubricas.</hi></p>
         </div>
         <!-- [...] -->
      </body>
   </text>
</TEI>

daliboris commented 1 year ago

I was thinking about repetition and economy. Would it be possible to use the following notation as well?

I know that TEI Lex-0 tries to have all the information (about the entry) in one place (in the case of sharing), but in the case of the header, this principle is still followed even if referencing with @sameAs is used.

   <editionStmt>  
    <edition>MorDigital Project (PTDC/LLT-LIN/6841/2020)</edition>
    <respStmt>
     <resp>Principal researcher</resp>
     <persName>Rute Costa</persName>
     <orgName xml:id="org.nova-clunl">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
    </respStmt>
    <respStmt>
     <resp>Team</resp>
     <persName>Rute Costa</persName>
     <orgName sameAs="#org.nova-clunl" />
     <persName>Sara Carvalho</persName>
     <orgName n="1" sameAs="#org.nova-clunl" />
     <orgName n="2">CLLC, Centro de Línguas, Literaturas e Culturas da Universidade de Aveiro</orgName>
     <persName xml:id="per.ana-salgado">Ana Salgado</persName>
     <orgName n="1" sameAs="#org.nova-clunl" />
     <orgName n="2">Academia das Ciências de Lisboa, Instituto de Lexicologia e Lexicografia da Língua Portuguesa</orgName>
     <persName xml:id="per.bruno-almeida">Bruno Almeida</persName>
     <orgName sameAs="#org.nova-clunl" />
     <persName xml:id="per.margarida-ramos">Margarida Ramos</persName>
     <orgName sameAs="#org.nova-clunl"/>
     <persName>Raquel Silva</persName>
     <orgName n="1" sameAs="#org.nova-clunl" />
     <orgName n="2">VOH.CoLAB, Value for Health CoLAB</orgName>
     <persName xml:id="per.alexandre-carreira">Alexandre Carreira</persName>
     <orgName sameAs="#org.nova-clunl" />
     <persName xml:id="per.joana-oliveira">Joana Oliveira</persName>
     <orgName sameAs="#org.nova-clunl" />
     <persName>Fahad Khan</persName>
     <orgName>Istituto Di Linguistica Computazionale ‘A. Zampolli’</orgName>
     <persName>Laurent Romary</persName>
     <orgName xml:id="org.inria-almanach-lab">Inria-ALMAnaCH Lab</orgName>
     <persName>Mohamed Khemakhem</persName>
     <!-- Why not 
      <orgName sameAs="#org.inria-almanach-lab" />
      <orgName>Université Grenoble Alpes</orgName>
      ?
     -->
     <orgName>Inria-ALMAnaCH Lab + Université Grenoble Alpes</orgName>
     <persName xml:id="per.toma-tasovac">Toma Tasovac</persName>
     <orgName n="1">DARIAH-EU, Digital Research Infrastructure for the Arts and Humanities</orgName>
     <orgName n="2">BCDH, Belgrade Center for Digital Humanities</orgName>
    </respStmt>
    <respStmt>
     <resp>Consultants</resp>
     <persName>Maria Filomena Gonçalves</persName>
     <persName>Jorge Gracia</persName>
    </respStmt>
    <respStmt>
     <resp>OCR tasks done by</resp>
     <persName sameAs="#per.alexandre-carreira" />
     <persName sameAs="#per.margarida-ramos" />
     <persName sameAs="#per.joana-oliveira" />
    </respStmt>
    <respStmt>
     <resp>XML encoding by</resp>
     <persName sameAs="#per.ana-salgado" />
     <persName sameAs="#per.bruno-almeida" />
     <persName sameAs="#per.toma-tasovac" />
    </respStmt>
   </editionStmt>

daliboris commented 1 year ago

And just to be sure: notice the typo in catDesc

 <!-- typo in English: Medical_and >> Medical and  -->
 <catDesc xml:lang="en">Medical and Health Sciences</catDesc>

laurentromary commented 1 year ago

I must say I don't like the flat list under <respStmt> because the resulting information is completely unstructured: you need to rely on XML order to associate the right affiliation with the right name. I would have as many <respStmt> as need be.
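
To make the point about relying on XML order concrete, here is a rough Python sketch that splits a flat respStmt into one respStmt per person. It assumes that every orgName belongs to the nearest preceding persName, which is exactly the implicit convention being criticised. The function name split_flat_resp_stmt is hypothetical, the sample omits the TEI namespace and shortens the names for brevity, and the resp text is copied unchanged, so a real conversion would still have to supply a proper role for each person.

```python
import xml.etree.ElementTree as ET


def split_flat_resp_stmt(resp_stmt):
    """Return one new respStmt per persName found in a flat respStmt.

    Assumption: each orgName is the affiliation of the persName before it.
    """
    resp_text = resp_stmt.findtext("resp", default="")
    result = []
    current = None
    for child in resp_stmt:
        if child.tag == "persName":
            # A new person starts a new respStmt with a copy of the resp text.
            current = ET.Element("respStmt")
            ET.SubElement(current, "resp").text = resp_text
            current.append(child)
            result.append(current)
        elif child.tag == "orgName" and current is not None:
            current.append(child)
    return result


FLAT = """<respStmt>
  <resp>Team</resp>
  <persName>Sara Carvalho</persName>
  <orgName n="1">NOVA CLUNL</orgName>
  <orgName n="2">CLLC</orgName>
  <persName>Bruno Almeida</persName>
  <orgName>NOVA CLUNL</orgName>
</respStmt>"""

for stmt in split_flat_resp_stmt(ET.fromstring(FLAT)):
    print(stmt.findtext("persName"), [o.text for o in stmt.findall("orgName")])
# Sara Carvalho ['NOVA CLUNL', 'CLLC']
# Bruno Almeida ['NOVA CLUNL']
```

Once each person has a separate respStmt, the link between name and affiliation no longer depends on sibling order inside one long list.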

bansp commented 1 year ago

I second what Laurent says and note that "Team" is not the proper value for <resp>, which should "contain a phrase describing the nature of a person's intellectual responsibility, or an organization's role in the production or distribution of a work" [spec] whereas here it's a heading.

anacastrosalgado commented 1 year ago

And just to be sure: notice the typo in catDesc

 <!-- typo in English: Medical_and >> Medical and  -->
 <catDesc xml:lang="en">Medical and Health Sciences</catDesc>

Fixed. Thanks. <catDesc xml:lang="en">Medical and Health Sciences</catDesc>

anacastrosalgado commented 1 year ago

@laurentromary and @bansp OK. But it is enough to give just the role of each researcher, right? Example:

         <editionStmt>  
            <edition>MorDigital Project (PTDC/LLT-LIN/6841/2020)</edition>
            <respStmt>
               <resp>Responsible researcher</resp>
               <persName>Rute Costa</persName>
               <orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
            </respStmt>
            <respStmt>
               <resp>Responsible co-researcher</resp>
               <persName>Sara Carvalho</persName>
               <orgName n="1">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <orgName n="2">CLLC, Centro de Línguas, Literaturas e Culturas da Universidade de Aveiro</orgName>
            </respStmt>
[...]
anacastrosalgado commented 1 year ago

Help needed :) @ttasovac // @laurentromary What do you think?

laurentromary commented 1 year ago

Can you put together your question again? I am not sure what you wanted to ask. Thanks!

anacastrosalgado commented 1 year ago

@laurentromary 1st question: is it enough to give just the role of each researcher? Example:

         <editionStmt>  
            <edition>MorDigital Project (PTDC/LLT-LIN/6841/2020)</edition>
            <respStmt>
               <resp>Responsible researcher</resp>
               <persName>Rute Costa</persName>
               <orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
            </respStmt>
            <respStmt>
               <resp>Responsible co-researcher</resp>
               <persName>Sara Carvalho</persName>
               <orgName n="1">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
               <orgName n="2">CLLC, Centro de Línguas, Literaturas e Culturas da Universidade de Aveiro</orgName>
            </respStmt>
[...]
anacastrosalgado commented 1 year ago

@laurentromary 2nd question

If I understood correctly, Boris suggests using @sameAs instead of repeating all the information. For me, it's clearer to spell out all the data. @daliboris Correct me if I'm wrong.

laurentromary commented 1 year ago

We need to weigh the added value of using such an attribute (with the overhead of extra processing) against simply duplicating the information.

bansp commented 1 year ago

@laurentromary and @bansp OK. But it is enough to give just the role of each researcher, right? Example:

It is enough to put one, but you can add more. It's all up to you and the structure of responsibility in your project.

A minor note is that the default semantics within respStmt is that you link the type of responsibility to agents that bear that type of responsibility, thus:

responsibility A:
                type(s) : name(s)

responsibility B:
                type(s) : name(s)

This means that the organisation(s) you mention are treated as agents of responsibility -- so, to me,

<respStmt>
               <resp>Responsible researcher</resp>
               <persName>Rute Costa</persName>
               <orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
</respStmt>

says that the role of "responsible researcher" (principal researcher?) was performed by (1) Rute Costa and (2) NOVA CLUNL.

In other words, the 'org' information is, by default, further information about the type of responsibility, rather than information about the person mentioned above.

I'm not sure if people often override the default semantics in the way you have suggested. I am not at this moment able to point at such a case, but that doesn't need to mean much. Recall that the header is where you want to provide a specific bit of information in a specific place. Those organisations might be funders, or maybe affiliations that go with names are better stated in, say, the project description section. I'd advise consulting headers produced by some well-established academia-based projects.

Arguably, you could try

<respStmt>
               <resp>Responsible researcher</resp>
               <name><persName>Rute Costa</persName> (<orgName>NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>)</name>
</respStmt>

... but I'm not entirely sure how kosher that is. There's an implication that an agent's name consists of a personal name and the name of their institution(s). That might be acceptable, I guess.

daliboris commented 1 year ago

Sorry, my mistake regarding the @sameAs attribute: if we use this attribute, the content of the two elements should be identical. If we want just a virtual copy of the content, the @copyOf attribute is needed.

See https://tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SAIE:

The sameAs attribute may be used to document the fact that two elements have identical content. It may be regarded as a special kind of link. It should only be attached to an element with identical content to that which it targets, or to one the content of which clearly designates it as a repetition, such as the word repeat or bis in the representation of the chorus of a song, the second time it is to be sung. The relation specified by the sameAs attribute is symmetric: if a chorus is repeated three times and each repetition bears a sameAs attribute indicating the first occurrence of the element concerned, it is implied that each chorus is identical, and there is no need for the first occurrence to specify any of its copies.

The copyOf attribute is used in a similar way to indicate that the content of the element bearing it is identical to that of another. The difference is that the content is not itself repeated. The effect of this attribute is thus to create a virtual copy of the element indicated.

There are two possible ways of encoding:

  1. more convenient for @anacastrosalgado, I hope (may be prone to errors):
<orgName xml:id="org.nova-clunl">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
....
<persName>Rute Costa</persName>
 <orgName sameAs="#org.nova-clunl">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
  2. less typing, more postprocessing (may be prone to errors too :-):
<orgName xml:id="org.nova-clunl">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
....
<persName>Rute Costa</persName>
 <orgName copyOf="#org.nova-clunl" />
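
The "more postprocessing" in the second option could look roughly like the Python sketch below, which fills every element carrying @copyOf with a copy of its target's content. The function name expand_copy_of is made up for illustration, the sample omits the TEI namespace, and the sketch assumes that all targets are local "#id" references within the same document.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def expand_copy_of(root):
    """Replace each copyOf reference with a real copy of the target content."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    for el in list(root.iter()):
        ref = el.get("copyOf")
        if ref is None:
            continue
        target = by_id.get(ref.lstrip("#"))
        if target is None:
            raise KeyError("copyOf target not found: " + ref)
        el.text = target.text
        for child in target:
            el.append(copy.deepcopy(child))
        del el.attrib["copyOf"]
    return root


DOC = """<editionStmt>
  <respStmt>
    <resp>Principal researcher</resp>
    <persName>Rute Costa</persName>
    <orgName xml:id="org.nova-clunl">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
  </respStmt>
  <respStmt>
    <resp>XML encoding by</resp>
    <persName>Bruno Almeida</persName>
    <orgName copyOf="#org.nova-clunl"/>
  </respStmt>
</editionStmt>"""

expanded = expand_copy_of(ET.fromstring(DOC))
for org in expanded.iter("orgName"):
    print(org.text)
```

Both orgName elements print the same name after expansion, so downstream tools see the full affiliation without it being typed twice.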

bansp commented 1 year ago

I'm guessing that one of the pertinent questions is, which is more harmful: more bytes used and a potential typo in a name of an institution OR a typo in the reference identifier (they aren't simple, but then, some editors help filling the values out). I guess that the answers may vary.

daliboris commented 1 year ago

If we use the @sameAs attribute, we can check (via XSLT, XQuery, etc.) whether there is a typo in the repeated name.
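
The comment mentions XSLT or XQuery; the same check is sketched here in Python instead, purely as an illustration. It compares the whitespace-normalised text of every element carrying @sameAs with the text of its target and reports any difference. The function name same_as_mismatches is hypothetical, the sample omits the TEI namespace, and the repeated name contains a deliberate typo ("Linguistica" without the accent).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def normalized_text(el):
    """Return all text inside el with runs of whitespace collapsed."""
    return " ".join("".join(el.itertext()).split())


def same_as_mismatches(root):
    """List (reference, problem) pairs for sameAs elements that do not match."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    problems = []
    for el in root.iter():
        ref = el.get("sameAs")
        if ref is None:
            continue
        target = by_id.get(ref.lstrip("#"))
        if target is None:
            problems.append((ref, "missing target"))
        elif normalized_text(el) != normalized_text(target):
            problems.append((ref, "content differs"))
    return problems


DOC = """<respStmt>
  <persName>Rute Costa</persName>
  <orgName xml:id="org.nova-clunl">NOVA CLUNL, Centro de Linguística da Universidade NOVA de Lisboa</orgName>
  <persName>Bruno Almeida</persName>
  <orgName sameAs="#org.nova-clunl">NOVA CLUNL, Centro de Linguistica da Universidade NOVA de Lisboa</orgName>
</respStmt>"""

print(same_as_mismatches(ET.fromstring(DOC)))
# [('#org.nova-clunl', 'content differs')]
```

A check like this catches both kinds of error raised in the thread: a typo in the repeated name and a typo in the reference identifier, which shows up as a missing target.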

anacastrosalgado commented 1 year ago

I tried to consult other headers, but none solved our issues. The headers that I found are very concise. I have to think.