CopticScriptorium / misc-development

Miscellaneous issues and items under development. Park your stuff here if you don't know what else to do with it.

more tokenizer issues #1

Closed ctschroeder closed 9 years ago

ctschroeder commented 10 years ago

Here are the tokenizer issues Beth Platte documented this summer, in addition to the fi and theta issues already mentioned.

General impressions: clauses with conversions, especially relative conversions, tended not to be tokenized and sometimes were not bound.

Not tokenized

AP 22

ⲛ̄ⲧⲕ̄ⲟⲩⲕⲱⲥⲙⲓⲕⲟⲛ ⲉϣⲱⲡⲉ ⲙ̄ⲡⲉⲓⲙⲁ

AP 26

ϫⲉ(ⲁϥϫⲟⲟⲥ ⲛ̄ⲉⲛⲥⲩⲛⲕⲗⲏⲧⲓⲕⲟⲥ ⲉⲁϥⲡⲟⲧⲁⲥⲥⲉ ϩⲛⲕⲟⲩⲓ ⲙⲁⲩⲁⲁϥ ϩⲛⲟⲩⲙⲛⲧⲁⲡⲟⲧⲁⲕⲧⲓⲕⲟⲥ ⲛⲧⲉⲡⲉⲑⲃⲃⲓⲟ ϩⲛⲛⲉⲧⲟⲩⲁⲁⲃ ⲧⲙⲛⲧⲥⲩⲛⲕⲗⲏⲧⲓⲕⲟⲥ ⲁⲕⲥⲟⲣⲙⲉⲥ ⲙⲛⲧⲙⲟⲛⲁⲭⲟⲥ

1 Cor 1

1:2 ⲛⲧ|ⲁ|ⲩ| should be genitive preposition ⲛⲧⲁ|ⲩ
1:3 ⲛⲏⲧⲛ not tokenized
1:4 ⲉⲛⲧⲁⲩⲧⲁⲁⲥ not tokenized
1:5 ⲁ|ⲧⲉⲧⲛ|ⲣⲣⲙⲙⲁⲟ ⲣ and ⲣⲙⲙⲁⲟ not tokenized
1:6 ⲉⲛⲧⲁ ⲧ|ⲙⲛⲧⲙⲛⲧⲣⲉ ⲉⲛⲧⲁ treated as a separate word, not tokenized; ⲙⲛⲧ and ⲙⲛⲧⲣⲉ not tokenized
1:7 ⲉⲧⲙⲧⲣⲉⲧⲛϣⲱⲱⲧ not tokenized
1:7 ⲉⲧⲉⲧⲛϭⲱϣⲧ not tokenized
1:7 ⲙⲡϭⲱⲗⲡ not tokenized
1:8 ⲉⲧ|ⲛⲁ|ⲧⲁϫⲣⲉ|ⲧⲏⲩⲧⲛ ⲧⲏⲩⲧⲛ not bound
1:8 ⲉⲧⲉⲧⲛϫⲏⲕ not tokenized
1:9 ⲉⲛⲧ|ⲁ|ϥ|ⲧⲉϩⲙ|ⲧⲏⲩⲧⲛ not tokenized; ⲧⲏⲩⲧⲛ not bound
1:10 ⲙⲡ|ⲓ|ϣⲁϫⲉ should be tokenized ⲙ|ⲡⲓ|ϣⲁϫⲉ
1:10 ⲛⲧⲉⲧⲙⲡⲱⲣϫ not tokenized
1:10 ⲛⲧⲉⲧⲛϣⲱⲡⲉ should be bound
1:10 ⲉ|ⲧⲉⲧⲛ|ⲥⲃⲧⲱⲧ not tokenized
1:10 ⲙⲛϯⲅⲛⲱⲙⲏ not tokenized
1:11 ⲉⲧⲃⲉ|ⲧⲏⲩⲧⲛ not bound
1:11 ϩⲓ|ⲧⲟⲟⲩⲧⲟⲩ not bound; also should be ϩⲓ|ⲧⲟⲟⲧⲟⲩ
1:11 ⲛⲛⲁⲭⲗⲟⲏ not tokenized
1:12 ⲙⲡⲁⲓ not tokenized
1:12 ⲁⲛⲅ|ⲡⲁ|ⲡⲁⲩⲗⲟⲥ not bound
1:12 ⲁⲛⲅ|ⲡⲁ|ⲁⲡⲟⲗⲗⲱ not bound
1:12 ⲁⲛⲅ|ⲡⲁ|ⲕⲏⲫⲁ not bound
1:12 ⲁⲛⲅ|ⲡⲁ|ⲡⲉ|ⲭⲣⲓⲥⲧⲟⲥ not bound
1:13 ⲏⲛⲧⲁⲧⲉⲧⲛϫⲓ ⲃⲁⲡⲧⲓⲥⲙⲁ ⲏ should not be bound; ⲏⲛⲧⲁⲧⲉⲧⲛϫⲓ not tokenized; ⲃⲁⲡⲧⲓⲥⲙⲁ should be bound
1:15 ⲉⲛⲛⲉ|ⲟⲩⲁ not bound
1:16 ⲙⲡⲕⲉⲏⲉⲓ not tokenized
1:16 ⲛⲥⲧⲉⲫⲁⲛⲁ not tokenized
1:16 ⲛϭⲉ not tokenized
1:17 ⲛⲧ|ⲁ|ⲡⲉ|ⲭⲣⲓⲥⲧⲟⲥ ⲛⲧⲁ not bound or tokenized
1:17 ⲧⲛⲛⲟⲟⲩⲧ not tokenized
1:17 ⲉⲉⲩⲁⲅⲅⲉⲗⲓⲍⲉ not tokenized
1:17 ⲉⲛⲛⲉϥϣⲱⲡⲉ not tokenized
1:18 ⲛⲛⲉ|ⲧ|ⲛ|ⲁϩⲉ not tokenized correctly
1:19 ⲧⲁⲁⲑⲉⲧⲉⲓ not tokenized
1:19 ⲛ|ⲧ|ⲙⲛⲧⲥⲁⲃⲉ not tokenized
1:20 ⲙⲡⲉ|ⲡ|ⲛⲟⲩⲧⲉ not bound
1:20 ⲉ|ⲓ|ⲣⲉ not tokenized correctly
1:21 ⲙⲡⲉ|ⲡ|ⲕⲟⲥⲙⲟⲥ not bound
1:21 ⲉⲧ|ⲟⲩ|ϫⲉ not tokenized correctly; ⲛ|ⲉⲧ|ⲡⲓⲥⲧⲉⲩⲉ should be bound to it.
1:21 ⲙⲡⲧⲁϣⲉⲟⲉⲓϣ not tokenized
1:22 ⲛⲉⲧⲟⲩⲁⲓⲧⲓ not tokenized
1:23 ⲉⲛⲧⲁϣⲉⲟⲓϣ not tokenized
1:23 ⲛⲁⲩ not tokenized
1:24 ⲛⲁⲩ not tokenized
1:26 ϫⲉ|ⲙⲛ|ϩⲁϩ not bound (ⲙⲛ as negative nominal with indefinite)
1:26 ⲙⲛ|ϩⲁϩ not bound (twice)
1:27 ⲛⲉⲛⲧⲁ|ⲡ|ⲛⲟⲩⲧⲉ not bound or tokenized (twice)
1:27 ⲉϥⲉϯϣⲓⲡⲉ not tokenized
1:28 ⲛⲉⲧⲥⲟϣϥ not tokenized
1:28 ⲛⲉⲛⲧⲁ|ⲡ|ⲛⲟⲩⲧⲉ not bound or tokenized
1:28 ⲛⲉⲧⲉⲛⲥⲉϣⲟⲟⲡ not tokenized
1:28 ⲛⲛⲉⲧϣⲟⲟⲡ not tokenized
1:29 ϫⲉ|ⲛⲛⲉ|ⲗⲁⲁⲩ not bound
1:29 ⲛ|ⲥ||ⲁⲣⲝ tokenized incorrectly
1:30 ⲛⲧⲉⲧⲛϩⲉⲛⲉⲃⲟⲗ not bound/tokenized
1:30 ⲉⲛⲧⲁϥϣⲱⲡⲉ not tokenized
1:30 ⲛⲥⲱⲧⲉ not tokenized
ⲧⲙⲛⲧⲥⲟϭ

Note: I’m pretty sure ⲉⲙ|ⲛ ⲛⲟⲃⲉ is a typo (1:8)
Note: I’m pretty sure ⲉ|ⲧⲉⲧⲛ|ⲉ|ϫⲱ is a typo for ⲉ|ⲧⲉⲧⲛ|ϫⲱ (1:10)
Note: ϩⲓ|ⲧⲟⲟⲩⲧⲟⲩ should be ϩⲓ|ⲧⲟⲟⲧⲟⲩ (1:11)

2:1 ⲟⲩϫⲓⲥⲉ not tokenized ⲏⲛⲥⲟⲫⲓⲁ ⲏ should not be bound; not tokenized ⲛ|ⲧ|ⲙⲛⲧⲙⲛⲧⲣⲉ not tokenized

2:2 ⲛϩⲏⲧ ⲧⲏⲩⲧⲛ not bound

2:3 ⲟⲩ|ⲙⲛⲧϭⲱⲃ not tokenized ⲟⲩ|ⲥ|ⲧⲱⲧ not tokenized correctly ⲉⲛⲁϣⲱϥ not tokenized

2:4 ⲡⲁⲧⲁϣⲉⲟⲉⲓϣ not tokenized ⲟⲩⲡⲓⲑⲉ not tokenized ⲟⲩⲟⲩⲱⲛϩ not tokenized ⲙⲡⲛⲉⲩⲙⲁ not tokenized

2:5 ⲉⲛⲛⲉⲧⲛⲡⲓⲥⲧⲓⲥ not tokenized ⲟⲩⲡⲓ ⲧ|ϩⲉ I’m pretty sure that this is meant to be ⲟⲩ|ⲡⲓⲑⲉ, as in the verse above. I’ve made it one word and fixed the pipes, but we’ll have to review this.

2:6 ⲉⲛⲧⲁⲡⲉⲓ|ⲁⲓⲱⲛ not bound or tokenized. I’m pretty sure this should be ⲉ|ⲛⲧⲁ|ⲡⲉⲓ|ⲁⲓⲱⲛ, a circumstantial (as relative with ⲟⲩⲥⲟⲫⲓⲁ, since it’s indefinite) with ⲛⲧⲁ as the genitive preposition. However, ⲛⲧⲁ is the prepersonal form, not the prenominal. Likewise with ⲛⲧⲁ|ⲛ|ⲁⲣⲭⲱⲛ below. I’ve tokenized it this way, but we might have to review this.

2:7 ⲧⲉⲛⲧⲁ ⲡ|ⲛⲟⲩⲧⲉ not bound or tokenized

2:8 ⲉⲧⲉⲙⲡⲉ ⲗⲁⲁⲩ not bound or tokenized ⲉⲛⲉⲛⲧⲁⲩⲥⲟⲩⲱⲛⲥ not tokenized

2:9 ϫⲉ ⲛⲉⲧⲉ ⲙⲡⲉ ⲃⲁⲗ not bound or tokenized ⲛⲉⲧⲉ ⲙⲡⲉ ⲙⲁⲁϫⲉ not bound or tokenized ⲛⲉⲧⲉ ⲙⲡ|ⲟⲩ|ⲁⲗⲉ not bound or tokenized ⲉⲛⲧⲁ ⲡ|ⲛⲟⲩⲧⲉ not bound or tokenized ⲛⲛⲉ|ⲧ|ⲙⲉ| not tokenized

2:10 ⲛ|ⲛ|ⲕⲁ tokenized incorrectly ⲛⲉⲑⲏⲡ not tokenized; should be ⲛ|ⲉⲧ|ϩⲏⲡ, but I can’t remember what we do with ⲑ

2:11 ⲛⲛⲁ ⲡ|ⲣⲱⲙⲉ not bound and tokenized ⲉⲧⲛϩⲏⲧϥ not tokenized ⲛⲛⲁ ⲡ|ⲛⲟⲩⲧⲉ not bound and tokenized ⲙⲡⲉ ⲗⲁⲁⲩ not bound and tokenized

2:12 ⲙⲡⲉⲉⲓⲕⲟⲥⲙⲟⲥ not tokenized; should be ⲙ|ⲡⲉⲓ|ⲕⲟⲥⲙⲟⲥ (not changed in text yet) ⲡⲉⲃⲟⲗ not tokenized ⲉⲛⲉⲛⲧⲁ ⲡ|ⲛⲟⲩⲧⲉ not bound and tokenized

2:13 ⲉⲧⲉ ⲛⲁⲓ not bound ⲛⲉⲧⲛ|ϣⲁϫⲉ not tokenized ⲛϯⲥⲃⲱ not tokenized ϩⲉⲛϯⲥⲃⲱ not tokenized ⲙⲡⲛⲉⲩⲙⲁ not tokenized ⲉⲛϣⲱⲛⲃ not tokenized

2:14 ⲟⲩⲯⲩⲭⲓⲕⲟⲥ not tokenized ⲛⲛⲁ ⲡⲉ|ⲡⲛⲉⲩⲙⲁ not bound and tokenized ⲥⲉⲁⲛⲁⲕⲣⲓⲛⲉ not tokenized

2:15 ϣⲁϥⲁⲛⲁⲕⲣⲓⲛⲉ not tokenized ⲙⲉⲣⲉ ⲗⲁⲁⲩ not bound

2:16 ⲉⲧⲛⲁⲧⲥⲉⲃⲉⲉⲓⲁⲧϥ not tokenized

3:1 ⲙⲡ|ⲓ|ϣϭⲙϭⲟⲙ not fully tokenized

3:2 ⲁ|ⲓ|ⲧⲥⲉ|ⲧⲛ ⲉⲣⲱⲧⲉ should be bound (I’m pretty sure, based on Layton 172) ⲛⲉ ⲙⲡⲁⲧⲉⲧⲛⲉϣϭ ⲙϭⲟⲙ not bound; not tokenized; separated incorrectly ⲉⲙⲡⲁⲧⲉⲧⲛⲉϣϭⲙϭⲟⲙ not tokenized

3:3 ⲉⲧ|ⲉⲓ tokenized incorrectly (should be the Greek ετι) ⲛⲧⲉⲧⲛ ϩⲉⲛ|ⲥⲁⲣⲕⲓⲕⲟⲥ should be bound (twice in verse) ϩⲟⲡ|ⲟⲩ tokenized incorrectly (should be Greek ὅπου) ⲟⲩ|ⲛ|ⲕⲱϩ tokenized incorrectly ϯⲧⲱⲛ not tokenized ⲉⲧⲉⲧⲛⲙⲟⲟϣⲉ not tokenized

3:4 ⲉⲣϣⲁⲛ ⲟⲩⲁ should be bound ⲁⲛⲅ ⲡⲁ ⲡⲁⲩⲗⲟⲥ should be bound ⲁⲛⲅ ⲡⲁ ⲁⲡⲟⲗⲗⲱ should be bound ⲛⲧⲉⲧⲛ ϩⲉⲛ|ⲣⲱⲙⲉ should be bound

3:5 ⲉⲛⲧⲁ ⲡ|ϫⲟⲉⲓⲥ should be bound ϯ|ⲛⲁ||ϥ separated and tokenized incorrectly

3:7 ⲙⲡⲁ ⲡ|ⲉⲧ|ⲧⲱϭⲉ Should be bound (I think. I’m not entirely sure of the grammar here) ⲙⲡⲁ ⲡ|ⲉⲧ|ⲧⲥⲟ Same as above

3:8 ⲡ|ⲟⲩⲁ ⲛⲁ|ϫⲓ should be bound

3:9 ⲁⲛⲟⲛ ϩⲉⲛ|ϣⲃⲣⲣϩⲱⲃ should be bound and tokenized ⲛⲧⲉⲧⲛ ⲟⲩⲕⲱⲧ should be bound and tokenized

3:10 ⲉⲛⲧⲁⲩⲧⲁⲁϥ not tokenized ⲛⲁⲓ not tokenized ⲛⲁⲣⲭⲓⲧⲉⲕⲧⲱⲛ not tokenized ⲟⲩⲛϭⲉ not tokenized ⲛⲁ|ϣ tokenized incorrectly

3:11 ⲉⲕⲁⲕⲉⲥⲛⲧⲉ not tokenized ⲡⲁⲣⲁⲧⲉⲧⲕⲏ not tokenized ⲉⲧⲉ ⲡⲁⲓ should be bound

3:12 ⲉϣϫⲉ ⲟⲩⲛ ⲟⲩⲁ should be bound ϩⲉⲛⲉⲛⲉⲙⲙⲉ not tokenized (I broke up ⲉⲛⲉⲙⲙⲉ, but I’m not sure) ⲟⲩ|ⲣⲟ|ⲟⲩⲉ tokenized incorrectly

3:13 ⲛⲁ|ⲟⲩⲱⲛϩ tokenized incorrectly ⲛⲁ|ⲟⲩⲟⲛϩ|ϥ tokenized incorrectly ⲉⲧ|ϥⲟ not tokenized

3:14 ⲡ|ⲉⲧⲉⲣⲉ ⲡⲉϥ|ϩⲱⲃ should be bound ⲉⲛⲧⲁϥⲕⲟⲧϥ not tokenized

3:15 ⲡ|ⲉⲧⲉⲣⲉ ⲡⲉϥ|ϩⲱⲃ should be bound ϥⲛⲁϯⲟⲥⲉ not tokenized ⲛⲧⲉⲓϩⲉ not tokenized

3:16 ⲛⲧⲉⲧⲛ ⲥⲟⲟⲩⲛ should be bound ⲛⲧⲉⲧⲛ ⲡ|ⲣⲡⲉ should be bound

3:17 ⲛⲧⲉⲧⲛ ⲡ|ⲣⲡⲉ should be bound ⲉ|ⲧⲉⲛ|ⲧⲱⲧ|ⲛ tokenized incorrectly

3:18 ⲙⲡⲣⲧⲣⲉ ⲗⲁⲁⲩ should be bound and tokenized ⲁⲛⲅ ⲟⲩ|ⲥⲟⲫⲟⲥ should be bound

3:19 ⲛⲉⲩⲕⲟⲧⲥ not tokenized

3:20 ⲛⲙⲙⲟⲕⲙⲉⲕ not tokenized

3:21 ⲙⲡⲣⲧⲣⲉ ⲗⲁⲁⲩ should be bound

3:22 ⲡ|ⲧⲏⲣϥ tokenized incorrectly (unless you want the noun tokenized differently) (twice in verse) ⲡⲱⲧ|ⲛ tokenized incorrectly

3:23 ⲛⲧⲉⲧⲛ ⲛⲁ ⲡⲉ|ⲭⲣⲓⲥⲧⲟⲥ should be bound ⲡⲁ ⲡ|ⲛⲟⲩⲧⲉ should be bound

4:1 ⲙⲁⲣⲉ ⲛ|ⲣⲱⲙⲉ should be bound ⲟⲡⲛ not tokenized

4:2 ⲙⲡⲉⲓⲙⲁ not tokenized ⲉⲩⲡⲓⲥⲧⲟⲥ not tokenized

4:3 ⲛⲁⲓ not tokenized ⲉⲧⲣⲉⲩⲁⲛⲁⲕⲣⲓⲛⲉ not tokenized ⲏϩⲓⲧⲛ should not be bound ⲛϯⲁⲛⲁⲕⲣⲓⲛⲉ not tokenized

4:4 ⲉ|ⲁ|ⲓ|ⲁⲁϥ not tokenized correctly ⲛⲉ|ⲓ|ⲧⲙⲁⲓⲏⲩ not tokenized correctly? (seems to make more sense as negative ⲛ- plus focalizing converter than preterit converter) ⲡⲉⲧⲁⲛⲁⲕⲣⲓⲛⲉ not tokenized

4:5 ϣⲁⲛⲧⲉ ⲡ|ϫⲟⲉⲓⲥ should be bound ⲉ|ⲓ| not tokenized correctly ⲉⲧ|ⲛⲁ|ⲣⲟⲩⲟⲉⲓⲛ not tokenized correctly ⲉⲛⲉⲑⲏⲡ not tokenized ⲡⲧⲁⲉⲓⲟ not tokenized

4:6 ⲉⲧⲉⲧⲛⲉⲥⲃⲟ not tokenized ⲉ|ⲧⲙ|ⲣϩⲟⲩⲟ not tokenized ⲉⲛⲉⲧⲥⲏϩ not tokenized ⲛⲛⲉ ⲟⲩⲁ should be bound

4:7 ⲉⲙⲡⲕϫⲓⲧϥ not tokenized ⲉϣϫⲉ ⲁⲕⲣⲡⲕⲉϫⲓ should be bound and tokenized ⲉⲙⲡⲕϫⲓ not tokenized

4:8 ⲁ|ⲧⲉⲧⲛ|ⲣⲣⲙⲙⲁⲟ not fully tokenized ⲁϫⲛⲧ|ⲛ not tokenized correctly ⲛⲁ|ⲛⲟⲩⲥ not tokenized correctly ⲉϣϫⲉ ⲁ|ⲧⲉⲧⲛ|ⲣ|ⲣⲣⲟ should be bound ⲉⲛⲉⲣⲣⲣⲟ not tokenized ϩⲱⲱⲛ not tokenized

4:9 ⲛⲧⲁ ⲡ|ⲛⲟⲩⲧⲉ should be bound ⲛϩⲁⲉ not tokenized ⲛⲛⲓⲉⲡⲓⲑⲁⲛⲁⲧⲏⲥ not tokenized

4:10 ⲁ|ⲛ|ⲣⲥⲟϭ not fully tokenized ⲛⲧⲉⲧⲛ ϩⲉⲛ|ⲥⲁⲃⲉ should be bound ⲧⲛϭⲟⲟⲃ not tokenized

4:11 ⲉⲧⲉⲉⲓⲟⲩⲛⲟⲩ not tokenized ⲧ|ⲛⲟⲃⲉ not tokenized correctly ⲧⲛ|ⲕⲏⲕⲁϩⲏⲩ not fully tokenized ⲥⲉϯⲕⲗⲯ not tokenized

4:12 ⲉ|ⲛ|ⲣϩⲱⲃ not fully tokenized ⲙⲙⲟⲛ not tokenized

4:13 ⲛⲛⲓⲡⲉⲣⲓⲕⲁⲑⲁⲣⲙⲁ not tokenized ϣⲁ ϩⲣⲁⲓ not bound

4:14 ⲛⲛⲉⲉⲓϯϣⲓⲡⲉ not tokenized ⲛⲏⲧⲛ not tokenized (x 3) ⲛⲛⲁⲓ not tokenized

4:15 ⲟⲩⲛⲧⲏⲧⲛ not tokenized ⲙⲡⲁⲓⲇⲁⲅⲱⲅⲟⲥ not tokenized ⲁ|ⲓ|ϫⲡⲉ ⲧⲏⲩⲧⲛ should be bound

4:16 ⲧⲛⲧⲛⲧⲏ ⲩⲧⲛ shouldn’t be divided there; not tokenized

4:17 ⲁⲓⲧⲛⲛⲉⲩⲧⲓⲙⲟⲑⲉⲟⲥ not tokenized ⲛⲏⲧⲛ not tokenized ⲉⲧⲉ ⲡⲁⲓ should be bound ⲙⲡ|ⲓ|ⲥⲧⲟ|ⲥ not tokenized correctly ⲉⲧⲛⲁⲧⲣⲉⲧⲛⲣⲡⲙⲉⲉⲩⲉ not tokenized ⲉⲧϩⲙ not tokenized ⲉϯϯⲥⲃⲱ not tokenized

4:18 ⲁϩⲟⲓⲛⲉ not tokenized

4:19 ⲟⲩϭⲉⲡⲏ not tokenized ⲉⲣϣⲁⲛ ⲡ|ϫⲟⲉⲓⲥ should be bound ⲣϩⲛⲁ|ϥ not fully tokenized ⲛⲧⲁⲉⲓⲙⲉ not tokenized ⲛⲛⲉⲧϫⲟⲥⲉ not tokenized

4:20 ⲧ|ⲙⲛⲧⲣⲣⲟ not tokenized ⲛⲉⲥϩⲛϣⲁϫⲉ not tokenized

4:21 ⲡⲉⲧⲉⲧⲛⲟⲩⲁϣϥ not tokenized ⲧⲁⲉⲓ not tokenized ⲙ|ⲙⲛⲧⲣⲙⲣⲁϣ not tokenized

5:1 ⲉⲩⲡⲟⲣⲛⲉⲓⲁ not tokenized ⲛⲧⲉⲉⲓⲙⲓⲛⲉ not tokenized ⲉⲛⲥϩⲛ ⲛ|ⲕⲉ|ϩⲉⲑⲛⲟⲥ should be bound; not tokenized ⲉⲧ|ⲣⲉ ⲟⲩⲁ should be bound; tokenized incorrectly ϫⲓⲑⲓⲙⲉ not tokenized

5:2 ⲧⲉⲧⲛ|ⲙⲏⲧⲉ not tokenized correctly ⲙⲡⲉⲛⲧⲁϥⲉⲓⲣⲉ not tokenized ⲙⲡⲉⲉⲓϩⲱⲃ not tokenized

5:3 ⲉⲛϯϩⲁⲧⲉⲧⲏⲩⲧⲛ not tokenized ⲉⲓϩⲁⲧⲉⲧⲏⲩⲧⲛ not tokenized ⲉⲓϩⲁⲧⲛⲧⲏⲩⲧⲛ not tokenized ⲙⲡⲉⲛⲧⲁϥⲉⲓⲣⲉ not tokenized ⲙⲡⲉⲉⲓϩⲱⲃ not tokenized ⲛⲧⲉⲓϩⲉ not tokenized

5:4 ⲉ|ⲛⲉⲧⲛ|ⲉⲣⲏⲩ not completely tokenized

5:5 ⲙⲡⲁⲓ not tokenized ⲛⲧⲉⲉⲓⲙⲓⲛⲉ not tokenized ⲙⲡⲥⲁⲧⲁⲛⲁⲥ not tokenized ⲉⲣⲉ ⲡⲉ|ⲡⲛⲉⲩⲙⲁ should be bound

5:6 ⲛⲛⲁⲛⲟⲩ ⲡⲉⲧⲛ|ϣⲟⲩϣⲟⲩ should be bound; not fully tokenized ⲛⲧⲉⲧⲛ ⲥⲟⲟⲩⲛ should be bound; not tokenized ϣⲁⲣⲉ ⲟⲩ|ⲕⲟⲩⲓ should be bound ⲧⲣⲉ ⲡⲟⲩⲱϣⲙ should be bound; not tokenized ⲧⲏⲣϥ not tokenized ϥ|ⲓ not tokenized correctly

5:7 ϥ|ⲓ tokenized incorrectly ⲙⲡ|ⲓ||ⲑⲁⲃ tokenized incorrectly ⲛⲁ|ⲥ tokenized incorrectly ⲛⲟⲩⲟⲩⲱϣⲙ not tokenized ⲉⲛⲧⲉⲧⲛϩⲉⲛⲁⲑⲁⲃ not tokenized

5:8 ⲙⲁⲣⲛⲣϣⲁ not tokenized ⲛⲁ|ⲥ tokenized incorrectly ϩⲉⲛⲁⲑⲁⲃ not tokenized ϩⲓⲙⲉ not tokenized

5:9 ⲛⲏⲧⲛ not tokenized

5:10 ⲙⲡⲟⲣⲛⲟⲥ not tokenized ⲏⲙⲙⲁⲓⲧⲟ not tokenized ⲏⲛⲣⲉϥⲧⲱⲣⲡ should be divided; not tokenized ⲏⲛⲣⲉϥϣⲙϣⲉ ⲉⲓⲇⲱⲗⲟⲛ should be divided and bound; not tokenized

5:11 ⲛⲏⲧⲛ not tokenized ⲉϣⲱⲡⲉ not tokenized ⲟⲩⲡⲟⲣⲛⲟⲥ not tokenized ⲏⲛⲣⲉϥϣⲙϣⲉ ⲉⲓⲇⲱⲗⲟⲛ should be divided and bound; not tokenized ⲏⲙⲙⲁⲓⲧⲟ should be divided; not tokenized ⲏⲛⲣⲉϥⲥⲁϩⲟⲩ should be divided; not tokenized ⲏⲛⲣⲉϥϯϩⲉ should be divided; not tokenized ⲏⲛⲣⲉϥⲧⲱⲣⲡ should be divided; not tokenized ⲛⲧⲉⲉⲓⲙⲓ ⲛⲉ divided incorrectly; not tokenized

5:12 ⲛⲛⲉⲧϩⲓⲃⲟⲗ not tokenized ⲛⲛⲉⲧϩⲓϩⲟⲩⲛ not tokenized ⲛⲛⲉⲧϩⲓ ϩⲟⲩⲛ should be bound; not tokenized

5:13 ⲛⲛⲉⲧϩⲓⲃⲟⲗ not tokenized ϥ|ⲓ tokenized incorrectly

More 1 Cor notes

The verb ϥⲓ was always divided into two tokens (ϥ and ⲓ), despite the fact that I didn't tokenize it that way in the .txt file. Also, the ⲑs were not broken into ⲧs and ϩs. I found one instance in each of 1 Cor 4 and 1 Cor 5; I've highlighted the cells which are a problem. In 1 Cor 4, it should be ⲉⲧϩⲏⲡ instead of ⲉ and ⲑⲏⲡ. In 1 Cor 5, ⲑⲓⲙⲉ should be ⲧϩⲓⲙⲉ. There are also two instances in 1 Cor 5 where a ⲑ "swallowed" a ⲧ; in both cases, it's ⲁⲑⲁⲃ instead of ⲁⲧⲑⲁⲃ. Because the ⲁⲧ and the ⲑⲁⲃ are different morphs, the cells with the ⲁⲧ are missing the ⲧ, if that makes sense. I've highlighted these too.
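
For illustration only, here is a minimal Python sketch of the two fixes described above. It is not how the tokenizer actually works. The names `THETA_FIXES`, `expand_theta`, and `merge_fi` are hypothetical, the lookup table holds only the three forms named in this issue, and the exact placement of the morph boundaries (for example ⲧ|ϩⲓⲙⲉ) is an assumption.

```python
# Hypothetical lookup table: surface form -> intended morphs.
# Only the three forms mentioned in this issue are listed; the
# boundary positions are an assumption, not taken from the tokenizer.
THETA_FIXES = {
    "ⲉⲑⲏⲡ": ["ⲉⲧ", "ϩⲏⲡ"],  # ⲑ stands for ⲧ + ϩ across a morph boundary
    "ⲑⲓⲙⲉ": ["ⲧ", "ϩⲓⲙⲉ"],   # same pattern
    "ⲁⲑⲁⲃ": ["ⲁⲧ", "ⲑⲁⲃ"],   # ⲑ "swallowed" the ⲧ of ⲁⲧ
}


def expand_theta(form):
    """Return the listed morphs for a known ⲑ form, else the form unchanged."""
    return list(THETA_FIXES.get(form, [form]))


def merge_fi(morphs):
    """Rejoin an adjacent ϥ + ⲓ pair into the single verb token ϥⲓ."""
    merged = []
    i = 0
    while i < len(morphs):
        if morphs[i] == "ϥ" and i + 1 < len(morphs) and morphs[i + 1] == "ⲓ":
            merged.append("ϥⲓ")
            i += 2  # skip both halves of the wrongly split verb
        else:
            merged.append(morphs[i])
            i += 1
    return merged


if __name__ == "__main__":
    print("|".join(merge_fi(["ϥ", "ⲓ"])))
    for surface in THETA_FIXES:
        print(surface, "->", "|".join(expand_theta(surface)))
```

Merging every adjacent ϥ + ⲓ pair could over-merge wherever a genuine ϥ morph happens to precede ⲓ, so a real fix would presumably need context or the gold tokenization from the .txt file.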

ctschroeder commented 10 years ago

This is really long, I know. I put this issue here in the miscellaneous repository because it's private. I wasn't sure if it would be visible in the Tokenizer repository.

If you need me to look into it instead, let me know!

ctschroeder commented 9 years ago

Big chunk of these fixed in September. Will revisit later if necessary.