biblicalhumanities / Nestle1904

Greek New Testament, edited by Eberhard Nestle, published in 1904 by the British and Foreign Bible Society. Transcription by Diego Santos, morphology by Ulrik Sandborg-Petersen, markup by Jonathan Robie.

Lemmatization Discrepancies #13

Open jcuenod opened 4 years ago

jcuenod commented 4 years ago

I noticed that in a number of places there appears to be a mismatch between the lemma and the word in the text. I think this probably arises as a result of a text-critical difference:

Rev 8:6 has αὑτοὺς but the lemma ἑαυτοῦ (sblgnt actually has these same issues). SBL TC note:

αὑτοὺς WH ] αὐτοὺς Treg NIV; ἑαυτοὺς RP

That said, this is not always the case (e.g. Mark 9:16), but a number of the instances I checked have this issue.

I suspect that you could find these by looking for the Strong's number 848. I came across this by comparing Dodson's Strong's numbers to Nestle 1904's.
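A minimal sketch of that search, assuming the morphology has already been parsed into rows. The `Token` fields and the sample rows are hypothetical and invented for illustration; they do not reflect the repository's actual file layout or data.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    """One morphology row. Field names are hypothetical, not the repository's schema."""
    ref: str      # verse reference
    form: str     # surface form as printed in the text
    lemma: str    # lemma assigned by the morphology
    strongs: int  # Strong's number assigned to the token


def with_strongs(tokens: Iterable[Token], number: int = 848) -> List[Token]:
    """Return the tokens tagged with the given Strong's number, for manual review."""
    return [t for t in tokens if t.strongs == number]


# Invented rows for illustration only; they are not copied from the data files.
SAMPLE = [
    Token("example 1", "αὑτοὺς", "ἑαυτοῦ", 848),
    Token("example 2", "λόγος", "λόγος", 3056),
    Token("example 3", "αὑτοῦ", "αὐτός", 848),
]

for token in with_strongs(SAMPLE):
    print(token.ref, token.form, token.lemma)
```

Each row this prints still needs a human check, since the filter only narrows the list of candidates and does not decide which lemma is right.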

jonathanrobie commented 4 years ago

I have fixed this for Nestle 1904. SBLGNT does have these same issues ... in spades, I think ... but without Strong's numbers. I will open a separate issue for that.

@jcuenod @emg @rkjtan OK to close this for Nestle 1904?

jtauber commented 4 years ago

Why lemmatize the token with rough breathing to one with smooth breathing? Lexicons usually treat them as separate lexemes.

It is better to lemmatise αὑτούς, etc as ἑαυτοῦ than αὐτός

jtauber commented 4 years ago

Also see https://github.com/morphgnt/sblgnt/issues/22 (these were all fixed in 2015)

jonathanrobie commented 4 years ago

I would like to live in a world where (1) I do not maintain this morphology, (2) SBLGNT and Nestle1904 morphologies are maintained in sync, and (3) morphology is automatically merged into the trees.

jonathanrobie commented 4 years ago

It is better to lemmatise αὑτούς, etc as ἑαυτοῦ than αὐτός

That's what we had to start with. I can revert the changes I made if we can come to agreement on this issue. Can you provide a little more justification?

@jcuenod @emg @rkjtan let's see if y'all and James can agree on what should happen here.

jtauber commented 4 years ago

αὑτούς (notice the rough breathing) is not the same word as αὐτούς. The former is an Attic contraction of ἑαυτούς and is hence normally lemmatized as ἑαυτοῦ.
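Because the two forms differ only in the breathing mark, they are easy to confuse by eye but simple to tell apart in code. The following is a rough sketch, not part of either project's tooling: it decomposes a word with Unicode NFD and looks for the combining rough or smooth breathing. The function names and the `αυτ` prefix check are assumptions made for illustration.

```python
import unicodedata
from typing import Optional

ROUGH = "\u0314"   # COMBINING REVERSED COMMA ABOVE (rough breathing)
SMOOTH = "\u0313"  # COMBINING COMMA ABOVE (smooth breathing)
REFLEXIVE_LEMMA = "ἑαυτοῦ"
AUT_PREFIX = "\u03b1\u03c5\u03c4"  # the letters alpha, upsilon, tau without diacritics


def breathing(word: str) -> Optional[str]:
    """Return 'rough' or 'smooth' for the first breathing mark found, else None."""
    for ch in unicodedata.normalize("NFD", word):
        if ch == ROUGH:
            return "rough"
        if ch == SMOOTH:
            return "smooth"
    return None


def strip_marks(word: str) -> str:
    """Remove all combining marks (accents, breathings) and lowercase the result."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def expected_lemma(form: str) -> Optional[str]:
    """Suggest the reflexive lemma for rough-breathing forms that begin with
    alpha-upsilon-tau. Returns None (no opinion) for everything else."""
    if breathing(form) == "rough" and strip_marks(form).startswith(AUT_PREFIX):
        return REFLEXIVE_LEMMA
    return None


for form in ["αὑτούς", "αὐτούς"]:
    print(form, breathing(form), expected_lemma(form))
```

The check is deliberately narrow: it has no opinion on smooth-breathing forms, and a coronis in a crasis form would also be reported as a smooth breathing, so it is only a first-pass filter.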

jonathanrobie commented 4 years ago

αὑτούς (notice the rough breathing) is not the same word as αὐτούς.

That's the part I knew.

The former is an Attic contraction of ἑαυτούς and is hence normally lemmatized as ἑαυτοῦ.

That's the part I didn't know. And it also makes it visually easier to distinguish. But both @jcuenod and I tripped over this one, so I'm guessing it will surprise others as well. Still, it looks like reverting this is the right thing to do?

rkjtan commented 4 years ago

jcuenod rightly notes that there is a text critical issue in some cases, like αὑτοὺς WH ] αὐτοὺς Treg NIV; ἑαυτοὺς RP. However, I agree with James that when it is the rough breathing, it is normally lemmatized as ἑαυτοῦ.

jcuenod commented 4 years ago

Ahh, interesting, that makes sense @jtauber, thanks.

jonathanrobie commented 4 years ago

So I am not sure if I should (1) simply revert all changes for this issue, or (2) revert all changes, then make a few different changes. If I do the revert, can @rkjtan and @jtauber make any additional changes that are needed?

On Thu, May 14, 2020 at 10:38 AM James Cuénod notifications@github.com wrote:

Ahh, interesting, that makes sense @jtauber, thanks.


emg commented 4 years ago

I agree with @jtauber and @rkjtan : αὑτούς needs to be lemmatized as ἑαυτοῦ.

The Nestle 1904 morphology originated with some work that I did – I forget who asked me to do it. You say, @jonathanrobie, that you would like to not maintain the morphology. I might be able to do it, though I would need to ensure that my other commitments can support this simultaneously.

Can we agree for now that I can go in and fix things like this, @jonathanrobie? If so, how do you want me to do it in practical terms? With a pull request or just via plain commits?

emg commented 4 years ago

As for maintaining the Nestle 1904 morphology and SBLGNT in sync, how do we ensure this, @jtauber? Would you like to have a yarn with me on Skype or email?

jonathanrobie commented 4 years ago

I agree with @jtauber and @rkjtan : αὑτούς needs to be lemmatized as ἑαυτοῦ.

The Nestle 1904 morphology originated with some work that I did – I forget who asked me to do it. You say, @jonathanrobie, that you would like to not maintain the morphology. I might be able to do it, though I would need to ensure that my other commitments can support this simultaneously.

Can we agree for now that I can go in and fix things like this, @jonathanrobie? If so, how do you want me to do it in practical terms? With a pull request or just via plain commits?

Yes, you absolutely can. We should discuss whether you fix the trees at the same time so they stay in sync. I'd suggest pull requests if you anticipate controversy or a need for coordination and mutual understanding, plain commits for bug fixes.