HughP / tcf-james-bigramed

Bigram Counting for tcf james

Spanish-style punctuation #12

Open iandoug opened 6 years ago

iandoug commented 6 years ago

Does tcf also use Spanish-style paired punctuation, i.e. ¡ ... ! and ¿ ... ?

HughP commented 6 years ago

No, not as far as I can tell. I think they have a question particle, like 'ma' in Mandarin Chinese. They do use << and >> for quotes, as in French.

iandoug commented 6 years ago

The corpus has two occurrences of the inverted exclamation mark:

¡ U+00A1 2 Inverted Exclamation Mark

But I didn't see any inverted question marks, hence the question.
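
For reference, a count like the one above could be reproduced with a short script. This is only a minimal sketch, not the tooling actually used for this repository: the function names `count_punctuation` and `report` are made up for illustration, and it assumes the corpus text has already been read into a Python string.

```python
import unicodedata
from collections import Counter


def count_punctuation(text):
    """Count every character whose Unicode general category is punctuation (P*)."""
    return Counter(
        ch for ch in text if unicodedata.category(ch).startswith("P")
    )


def report(text):
    """Return rows of the form: character, code point, count, Unicode name."""
    rows = []
    for ch, n in count_punctuation(text).most_common():
        name = unicodedata.name(ch, "UNKNOWN").title()
        rows.append(f"{ch} U+{ord(ch):04X} {n} {name}")
    return rows


if __name__ == "__main__":
    # Hypothetical sample text; a real run would read the corpus files instead.
    for row in report("¡Hola! ¡Adiós!"):
        print(row)
```

If ¿ (U+00BF) never shows up in the rows for the whole corpus, that matches the observation that there are no inverted question marks.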

HughP commented 6 years ago

Which file in the corpus?

iandoug commented 6 years ago

mephaa3-unicode.txt

Line 89: 5 Xúꞌkhue̱n máꞌ jaꞌnii rí ra̱ju̱u̱n xa̱bo̱ mangaa, mbá xuwi lájwíin ñajuun ne̱, jamí phú mba̱a̱ rí na̱ꞌngo̱o̱ ne̱ naꞌne. ¡Ra̱ꞌkhá tháán mba̱a̱ júba̱ ikha eꞌne mbá lájwíin ri̱ꞌyu̱u̱ agu rí nakhati̱yo̱o̱ꞌ!

Line 115: 4 ¡Ra̱ꞌkhá xa̱bo̱ tsí nandúún juyáá i̱ndo̱ó Ana̱ꞌlóꞌ ñajwanlaꞌ! Á tsíya̱álaꞌ rí xa̱bo̱ tsí nandoo guéño jaꞌyoo rí ríga̱ ná numbaa, tsíyoo rí ma̱mbáxu̱u̱ꞌ ga̱jmáa̱ Ana̱ꞌlóꞌ rá dxe̱ꞌ. Ikhaa jngó, asndo tsáa máꞌ tsí nandoo guéño jaꞌyoo rí ríga̱ ná numbaa, tsímbáxu̱u̱ꞌ ga̱jmáa̱ Ana̱ꞌlóꞌ.
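
Lines like the two above could be located with a small script. The following is only a sketch under assumptions: `find_inverted` and `PAIRS` are hypothetical names, and the check is deliberately naive (it compares per-line counts of opening and closing marks, so it would miss a pair that spans a line break).

```python
# Opening (inverted) mark mapped to its expected closing mark.
PAIRS = {"\u00a1": "!", "\u00bf": "?"}


def find_inverted(lines):
    """Return (line number, opener, opener count, closer count) for each hit."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for opener, closer in PAIRS.items():
            n_open = line.count(opener)
            if n_open:
                hits.append((lineno, opener, n_open, line.count(closer)))
    return hits


if __name__ == "__main__":
    # Hypothetical two-line sample; a real run would iterate over a corpus file.
    sample = ["no marks here", "¡Ra̱ꞌkhá xa̱bo̱! Á tsíya̱álaꞌ rá dxe̱ꞌ."]
    for lineno, opener, n_open, n_close in find_inverted(sample):
        print(f"Line {lineno}: {n_open} x {opener}, {n_close} x {PAIRS[opener]}")
```

To run it against mephaa3-unicode.txt, one could pass the open file object (opened with `encoding="utf-8"`) as `lines`. A hit where the opener and closer counts differ would be worth checking by hand.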