erc-dharma / project-documentation

DHARMA Project Documentation
Creative Commons Attribution 4.0 International

People names in teiHeader #242

Open michaelnmmeyer opened 8 months ago

michaelnmmeyer commented 8 months ago

There are no definite conventions for indicating people's roles. We have <respStmt>, <editor>, <principal>, <author>, etc. depending on files. In particular, critical editions (DHARMA_CritEd) do not follow the conventions adopted in inscriptions (DHARMA_INS).

I suggest we use a single notation for indicating people's roles. For instance, to indicate editors, we could either use <respStmt> or <editor>, but not both, to avoid confusion.

Inscriptions generally use <respStmt>. However, the value of respStmt/resp varies widely between files. Sometimes it contains phrases ("EpiDoc encoding", etc.), sometimes full sentences, capitalized or not. People treat it as free text (this is its purpose, according to the TEI documentation), so I cannot process it mechanically. When people use several <resp> for a single person, it is also not clear how I should merge the values (separate them with semicolons? with periods? capitalize or not? etc.)

The TEI documentation suggests adding a resp/@ref attribute to indicate people's roles in a machine-readable way. See the note at https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-resp.html. From what I have seen so far, at least two distinctions could be useful for search and display:

  1. The person who is encoding/editing the inscription vs. a source/past edition from which this person is drawing. I mean the distinction between Salomé Pichon and George Cœdès, for instance.
  2. The person who is doing the core editing work vs. someone that contributed ideas, a collaborator, etc.

It would also be helpful to constrain the structure of <respStmt> in such a way that the same notation is always used for representing the same data. Compare the following. A, B and C represent the same data; D is a bit different but is theoretically possible.

(A)

<respStmt>
  <resp>EpiDoc Encoding</resp>
  <persName ref="part:sapi">
     <forename>Salomé</forename>
     <surname>Pichon</surname>
  </persName>
</respStmt>
<respStmt>
  <resp>intellectual authorship of edition</resp>
  <persName ref="http://viaf.org/viaf/66465311">
     <forename>George</forename>
     <surname>Cœdès</surname>
  </persName>
  <persName ref="part:sapi">
     <forename>Salomé</forename>
     <surname>Pichon</surname>
  </persName>
</respStmt>

(B)

<respStmt>
  <resp>EpiDoc Encoding</resp>
  <persName ref="part:sapi">
     <forename>Salomé</forename>
     <surname>Pichon</surname>
  </persName>
</respStmt>
<respStmt>
  <resp>intellectual authorship of edition</resp>
  <persName ref="http://viaf.org/viaf/66465311">
     <forename>George</forename>
     <surname>Cœdès</surname>
  </persName>
</respStmt>
<respStmt>
  <resp>intellectual authorship of edition</resp>
  <persName ref="part:sapi">
     <forename>Salomé</forename>
     <surname>Pichon</surname>
  </persName>
</respStmt>

(C)

<respStmt>
  <resp>EpiDoc Encoding</resp>
  <resp>intellectual authorship of edition</resp>
  <persName ref="part:sapi">
     <forename>Salomé</forename>
     <surname>Pichon</surname>
  </persName>
</respStmt>
<respStmt>
  <resp>intellectual authorship of edition</resp>
  <persName ref="http://viaf.org/viaf/66465311">
     <forename>George</forename>
     <surname>Cœdès</surname>
  </persName>
</respStmt>

(D)

<respStmt>
  <resp>EpiDoc Encoding</resp>
  <resp>intellectual authorship of edition</resp>
  <persName ref="part:sapi">
     <forename>Salomé</forename>
     <surname>Pichon</surname>
  </persName>
  <persName ref="http://viaf.org/viaf/66465311">
     <forename>George</forename>
     <surname>Cœdès</surname>
  </persName>
</respStmt>
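
As a rough illustration of how the variants could be reduced to form A mechanically, here is a minimal Python sketch; it is not the project's actual tooling. The function name `group_by_resp` is hypothetical. The sketch assumes un-namespaced fragments and a @ref on every <persName>, and it assumes that every person in a <respStmt> holds every <resp> in it, which is only one possible reading of form D.

```python
import xml.etree.ElementTree as ET


def group_by_resp(fragment):
    """Map each <resp> text to the @ref values of the persons credited with it.

    Hypothetical helper: assumes un-namespaced elements and a @ref on
    every <persName>.
    """
    # Wrap the fragment so that several sibling <respStmt> parse as one document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    grouped = {}
    for stmt in root.iter("respStmt"):
        refs = [pers.get("ref") for pers in stmt.findall("persName")]
        for resp in stmt.findall("resp"):
            label = " ".join((resp.text or "").split())  # collapse whitespace
            bucket = grouped.setdefault(label, [])
            # Assumption: every person in a statement holds every <resp> in it.
            for ref in refs:
                if ref not in bucket:
                    bucket.append(ref)
    return grouped


# Form C from the examples above, with the name parts omitted for brevity.
FORM_C = """
<respStmt>
  <resp>EpiDoc Encoding</resp>
  <resp>intellectual authorship of edition</resp>
  <persName ref="part:sapi"/>
</respStmt>
<respStmt>
  <resp>intellectual authorship of edition</resp>
  <persName ref="http://viaf.org/viaf/66465311"/>
</respStmt>
"""

for label, refs in group_by_resp(FORM_C).items():
    print(label, refs)
```

Each key of the result corresponds to one <respStmt> in form A, so forms A, B and C yield the same grouping, up to the order of the persons.
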

danbalogh commented 8 months ago

I'll respond to this one in more detail (probably tomorrow), but before finalising the encoding, we really need to work out the kinds of responsibilities we wish to indicate in the files. See my notes in the EGD Leftovers doc, under the heading Authorship and responsibility.

danbalogh commented 8 months ago

A few more comments. On the roles we should distinguish, see my proposals in the EGD Leftovers doc, under Authorship and responsibility. I should point out here that according to our long-standing (but never more than half-baked) conventions, an original editor need only be credited in the responsibility statement if the work on the DHARMA digital edition involved very little or no scholarly effort. Earlier editors are always credited in the bibliography and are relevant to the resp stmt only when they are (almost) alone responsible for the content of an edition.

On creating a reference list for responsibilities: this sounds like a very good idea and should in my opinion be implemented.

On the structure of the responsibility statement: form A looks best to me, but I have a suspicion that form B has been suggested somewhere (was it perhaps in XML comments in one of our template files?). I dislike C, and D looks uninterpretable to me. (Are both persons understood to be responsible for both roles? Or if they are to be read respectively, then how do we record one person in more than one role?)

And @michaelnmmeyer , I think you assigned the PIs and me to your original post with all the issues, but haven't assigned anyone to the separately created issues. I'm assigning myself as I look at each, but I feel it should be you who assigns the PIs to all of these issues.

michaelnmmeyer commented 8 months ago

Thank you for the details.

About the responsibilities list, I forgot to mention that we have a few uses of titleStmt/author to designate the Indian author of a text (Śāntarakṣita, Vararuci, etc.).

OK for using form A. I will do that automatically; no manual intervention is necessary.

danbalogh commented 8 months ago

Thanks, but I think before making any global automated changes, we/you should wait until Arlo, and perhaps another PI or two, have had the opportunity to think about this and contribute. Especially when it affects things that are governed by guidelines other than the EGD, as I don't know the kind of responsibilities relevant to critical editions.

arlogriffiths commented 8 months ago

Thanks both of you.

I have now gone through Dan's proposals in the EGD Leftovers doc, under Authorship and responsibility. I think I find myself in full agreement, but what is still lacking is a concrete suggestion for how to express the ideas in TEI. I suppose we will need to keep using <respStmt> to express the relevant distinctions and will need to constrain the contents that encoders may give to <resp> in order for @michaelnmmeyer to be able to process the information.

Since the DHARMA_CritEd files are vastly fewer than the DHARMA_INS files, if (as @michaelnmmeyer seems to imply) it is desirable to follow the same scheme in both contexts, it will be easier to adjust the former to the latter (once we have our act together for how to encode <respStmt>) than vice versa. We will need to modify the EGC once the revision of the EGD is completed.

danbalogh commented 8 months ago

For clarification, the "roles" I outline in the Leftovers are meant to correspond to responsibilities in the respStmt; responsibilities may be expressed simply by the role label or in a slightly more verbose form as we do now. Thus, for the roles that I feel pretty sure we need, we could go with:

| author | intellectual authorship of the edition |
| curator | curation of the edition |
| encoder | EpiDoc encoding |
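
To make the correspondence concrete, here is a minimal Python sketch of a lookup from <resp> wording to role label. The names `ROLE_BY_RESP` and `role_for` are hypothetical, and matching on lower-cased, whitespace-normalised text is an assumption rather than an agreed convention.

```python
# Verbose <resp> wording -> short role label, taken from the three pairs above.
ROLE_BY_RESP = {
    "intellectual authorship of the edition": "author",
    "curation of the edition": "curator",
    "epidoc encoding": "encoder",
}


def role_for(resp_text):
    """Return the role label for a <resp> value, or None if it is not recognised."""
    key = " ".join(resp_text.split()).lower()  # ignore case and extra whitespace
    if key in ROLE_BY_RESP.values():  # the bare role label is accepted as well
        return key
    return ROLE_BY_RESP.get(key)


print(role_for("EpiDoc Encoding"))
print(role_for("curator"))
print(role_for("intellectual authorship of edition"))
```

The last call returns None because the wording lacks "the": existing variants like this one would have to be either added to the table or corrected in the files.
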

But there's still the whole blue section (blue meaning awaiting input and finalisation) on additional roles that we might want to consider, e.g. revision, ingestion from another corpus (=curation?), principal investigator of task force, TEI data manager, metadata collector, etc.

manufrancis commented 5 months ago

Dear Dan, thanks for this.

Let me below list the different types of editions I have so far encoded and see if I follow you correctly.

  1. a reviewed/updated/first edition by myself based on visual documentation (photos, or a published estampage), autopsy, etc. I am: author, encoder, but not curator?
  2. a reviewed/updated/first edition by Dr. Vijayavenugopal based on visual documentation (photos, or a published estampage), autopsy, etc., but encoded by myself (I encode them based on images and text files, but there are so many mistakes, typos, letters unread but clear on photos, etc. that I have spent considerable time reviewing these editions). Dr. Vijayavenugopal is: author? I am: author, curator, encoder?
  3. a digital edition based solely on a printed edition (SII for instance) encoded by myself. The SII editor is: author? I am: author, curator, encoder?

danbalogh commented 4 months ago

Dear @manufrancis, in the scheme I have proposed (which does not have to be implemented - I'll be the happiest if someone comes up with a better one and tells me what to write up for the EGD), your types would be as follows:

  1. You are the author. There is no encoder and no curator. Both encoder and curator are applicable only if the encoding/curation was done by someone other than the author.
  2. It will never be possible to formulate fully objective and fully correct rules for complex situations. Which of the following three alternatives you choose must be determined by your judgement of how the credit is to be shared.
     A) It seems to me that in this case Vijayavenugopal and you should both be named as authors, because it is a collaborative edition by the two of you, and both of you are Dharma contributors. In this case, you could also be listed as encoder to show that of the two authors, you alone did the encoding.
     B) If you feel that "collaborative" does not describe the situation correctly, then by the rules I suggest, you alone are the author and Vijayavenugopal is not to be mentioned in the responsibility statement (but may be credited in the prose bibliography where you introduce the present edition). This is analogous to how your re-collation and improvement of someone else's published edition would be treated. In this case there is no separate encoder (or curator).
     C) As a third alternative, you could list Vijayavenugopal as author and yourself as curator. My rules are not clear on whether an encoder should be separately recorded in this case; by my logic, the curator is by default also the encoder unless a different encoder is recorded, but if accepted, this may need to be made explicit in the rules. Or we could say that you should be listed as encoder to make it explicit that you alone did the encoding.
  3. If your digital edition is a "facsimile" of the one in SII, then the SII editor is the author and you are the encoder. If you have enhanced the SII edition (as per my definition of "enhancement"), then you are the curator and there is no encoder (since the curator is by default also the encoder unless the latter is separately named). If the SII edition came with a facsimile which you collated (even if this did not result in a revision of the text), then you are the author and the SII editor is named only in the bibliography.
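
The defaulting logic described above could be sketched as follows in Python. This is only an illustration of the "by my logic" reading, which is flagged above as not yet settled; the function name `effective_encoders` and its parameters are hypothetical.

```python
def effective_encoders(authors, curators=(), encoders=()):
    """Return the people understood to have done the encoding.

    Assumed rule: an explicitly recorded encoder wins; otherwise the curator
    is the encoder by default; otherwise the author(s) did the encoding.
    """
    if encoders:
        return list(encoders)
    if curators:
        return list(curators)
    return list(authors)


# Case 1: a single author, with no curator and no encoder recorded.
print(effective_encoders(["Francis"]))
# Case 2.A: two authors, one of whom is also recorded as encoder.
print(effective_encoders(["Vijayavenugopal", "Francis"], encoders=["Francis"]))
# Case 2.C: author plus curator, with no encoder recorded.
print(effective_encoders(["Vijayavenugopal"], curators=["Francis"]))
```

Under this reading all three calls resolve to the same encoder. A different default would only change the order of the fallbacks.
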

manufrancis commented 4 months ago

Thanks for these clarifications, dear Daniel!!!

danbalogh commented 4 months ago

Most welcome. Do you have an opinion or preference for case 2 above? The "rules" are not finished and need to be discussed, so I'm happy to pick this up.

manufrancis commented 4 months ago

For Vijayavenugopal's editions I would opt for 2.A most of the time, when I consider that my amount of reviewing/correcting/improving is substantial. Other editions by Vijayavenugopal are fine as received, in which case I would opt for 2.C.

danbalogh commented 4 months ago

Thanks, that makes sense. Do you think that if there is an author and a curator, then the latter is by default also the encoder unless the encoder is explicitly named? Or should "the author and the curator" be the encoder by default, so that if the curator alone did the encoding, then this must be recorded explicitly?

arlogriffiths commented 4 months ago

> Thanks, but I think before making any global automated changes, we/you should wait until Arlo, and perhaps another PI or two, have had the opportunity to think about this and contribute. Especially when it affects things that are governed by guidelines other than the EGD, as I don't know the kind of responsibilities relevant to critical editions.

Sorry, I never had the chance to reply. So far we are not really trying to take the EGC model into account, so I am not sure we need to in the present discussion either. Anyhow, I am optimistic that whatever we decide in this discussion will be fairly easy to implement also in a future revision of EGC.

arlogriffiths commented 4 months ago

Ah, now I see I did respond on that issue on 22 Dec. ...

manufrancis commented 4 months ago

I guess there are many different cases.

By "curator" do you also mean the "reviewer" (e.g. X is author and encoder, then Y reviews the edition for DHARMA)?

In my view, at the risk of having a complex respStmt, we should have all responsibilities clearly indicated (even when only a single person is involved):

  1. Editor (of printed edition)
  2. Editor (of digital edition)
  3. Encoder (of digital edition)
  4. Reviewer/Curator (of digital edition)

As for display:

Editor(s): AB (printed edition), CD (digital edition)
Encoder(s): CD
Reviewer(s): EF
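
A minimal Python sketch of how four explicit slots could produce such a display. The class name `Responsibilities` and its field names are hypothetical, and putting each heading on its own line is an assumption about the layout.

```python
from dataclasses import dataclass, field


@dataclass
class Responsibilities:
    """One slot per responsibility listed above; every slot is filled explicitly."""
    print_editors: list = field(default_factory=list)
    digital_editors: list = field(default_factory=list)
    encoders: list = field(default_factory=list)
    reviewers: list = field(default_factory=list)

    def display(self):
        # Editors of both kinds share one heading and are told apart by a suffix.
        editors = [f"{name} (printed edition)" for name in self.print_editors]
        editors += [f"{name} (digital edition)" for name in self.digital_editors]
        rows = [
            ("Editor(s)", editors),
            ("Encoder(s)", self.encoders),
            ("Reviewer(s)", self.reviewers),
        ]
        # Headings with nobody recorded are left out of the display.
        return "\n".join(f"{head}: {', '.join(names)}" for head, names in rows if names)


print(Responsibilities(["AB"], ["CD"], ["CD"], ["EF"]).display())
```

Keeping every slot filled, even redundantly, also makes it straightforward to count per person and per role later.
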

danbalogh commented 4 months ago

No, "revision" is listed under "anticipated future roles" in a part of my recommendation that is just a sketch. Indeed there are many different cases and what we should be aiming for is a comprehensive framework that will cover all of those tolerably well, without the need to create specific rules for every possible scenario (which nobody would have the time to create, and which nobody would read and memorise).

manufrancis commented 4 months ago

OK. In case 2.C above ("As a third alternative, you could list Vijayavenugopal as author and yourself as curator"), I see myself as a kind of reviewer. I understand now that "curator" covers such a responsibility in this case. Thus fine for me, if I understood correctly.

danbalogh commented 4 months ago

The way I see it, the work of a "curator" is done prior to the creation of a DHARMA digital edition and uses as raw material a non-digital or non-DHARMA edition. Conversely, "reviewing" as I would define it is done at a later stage, taking an already existing DHARMA digital edition as raw material.

manufrancis commented 4 months ago

Hmm ...

You defined 2.C above as follows: "C) As a third alternative, you could list Vijayavenugopal as author and yourself as curator. My rules are not clear on whether an encoder should be separately recorded in this case; by my logic, the curator is by default also the encoder unless a different encoder is recorded, but if accepted, this may need to be made explicit in the rules. Or we could say that you should be listed as encoder to make it explicit that you alone did the encoding."

But you now say that "the work of a 'curator' is done prior to the creation of a DHARMA digital edition and uses as raw material a non-digital or non-DHARMA edition. Conversely, 'reviewing' as I would define it is done at a later stage, taking an already existing DHARMA digital edition as raw material."

This does not seem to fit the fact that my curator's work was not done "prior to the creation of a DHARMA digital edition" but in the process of doing it (but whatever) and that I used as raw material either a printed edition by VVG (revised in the form of digital text file by VVG) or a digital text file by VVG (but whatever) ...

In my view, at the risk of complexity, we should have slots for all kinds of responsibilities in the respStmt, and all of them should be filled in even if the same person has the same role (I might be interested to know how many inscriptions I authored, how many I encoded, how many I curated).

Thus:

Further details could be made explicit if necessary in the epigraphical lemma.

NB: also linked to the respStmt is the copyright stated in publicationStmt, which I suppose would be owned by all persons mentioned in the respStmt.

danbalogh commented 4 months ago

Thanks for engaging with this.

To be frank, I do not see the first part of your comment as productive; you seem to be finding fault with details of my wording instead of trying to get my point. By "prior to the creation of a DHARMA digital edition" I meant "not posterior to the creation". Normally, curation would of course take place in the process of a Dharma edition's creation. My purpose in saying "prior to" was to distinguish it from revision, which I would define as working to improve (or just check) an already existing Dharma edition, i.e. a posterior thing. I do not understand at all what fine distinction you are picking up (and then dismissing by "but whatever") when you say that your use of A) a printed edition by VVG (revised in the form of digital text file by VVG) or B) a digital text file by VVG does not fit my words "uses as raw material a non-digital or non-DHARMA edition". Neither A, nor B is a digital edition, and neither A nor B is a DHARMA edition.

The second part of the comment, on the other hand, is just the kind of thing that I've been looking forward to. Almost anything is acceptable to me that the PIs can agree on. I have pasted your comment into the Leftovers doc. I think further discussion should take place there, where my current proposal is written out in detail and all the previous discussion is there.

The copyright issue is I'm afraid more complicated than that; I am far from sure that someone who did nothing but encode a previous edition should share the copyright, and Arlo and I have long agreed that if a previous edition of an inscription exists, and you encoded it while collating with visual material, then copyright is yours alone, not shared with the previous editor. Please see the previous discussion in the Leftovers. Again, I'm not saying that it has to be the way I say, only that it has to be some way.

manufrancis commented 4 months ago

Glad that at least a part of my comment seems to be useful ;-) I put this on the list of tasks ahead.

manufrancis commented 3 months ago

Following up on https://github.com/erc-dharma/project-documentation/issues/299# and looking again at the EGD Leftovers doc, under Authorship and responsibility, I subscribe fully, like Arlo above, to Dan's "Latest proposal (October 2023)" and the triad "Author / Curator / Encoder".

So @michaelnmmeyer please implement this as soon as you find time to do so.

danbalogh commented 3 months ago

@manufrancis, thanks, I appreciate that, but that outline is not finished and not ready to be implemented. In particular, the bits in blue are incomplete and undecided, and definitely need input from you and @arlogriffiths before we can do anything about them. There's also the issue of differentiating more responsibilities (redundantly as the case may be), as suggested by you above. I'm not against that; we need to discuss.

manufrancis commented 3 months ago

@danbalogh OK. Fine. I will look more closely at the leftovers on these issues in the coming days/weeks.

@michaelnmmeyer Please wait until we fix the remaining issues while you finish writing your PhD.

manufrancis commented 3 months ago

@danbalogh @arlogriffiths

I have carefully read the leftovers on these issues and added my comments.

danbalogh commented 3 months ago

Thank you!

manufrancis commented 3 months ago

Thanks to you, Daniel, in the first place.

You have seen that my hesitation is mostly about resp for the commentary div. As I see it, whatever we decide there has no implication for the content of <respStmt>. So, once we have agreed on the content of <respStmt>, we could maybe already move ahead with <respStmt>. I will now react to your further comments.