PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

Not recognising [g]? #605

Closed dvdmrn closed 7 years ago

dvdmrn commented 7 years ago

Whenever I import a .csv with a [g] in the transcription I get KeyError: 'g'. I'm using the ipa2hayes feature matrix. Any guesses as to why that is? As long as there are no g's, everything else seems to run fine.

Here's the code I'm running, in case it helps:

from corpustools.corpus import io
from corpustools.symbolsim import phono_edit_distance
import numpy

#calculates the phonological edit distance between all words in a set

phonoSimList = []

myCorpus = io.csv.load_corpus_csv(
    "exampleCorpus", 
    "P1.csv", 
    ",", 
    ".", 
    annotation_types=None, 
    feature_system_path=None, 
    stop_check=None, 
    call_back=None)
# print only after the corpus has been loaded, otherwise myCorpus is undefined
print ("loaded? Loaded.",myCorpus)

io.binary.download_binary("ipa2hayes", "/matrix", call_back=None)

ipa2hayes = io.binary.load_binary("/matrix")
io.binary.save_binary(ipa2hayes, "/matrix")
print("did it load?",ipa2hayes.features)

for word in myCorpus.wordlist:
    for compareWord in myCorpus.wordlist:
        if (word == compareWord):
            print("same word")
        else:
            phonoEditDistance = phono_edit_distance.phono_edit_distance(
                myCorpus.wordlist.get(word),
                myCorpus.wordlist.get(compareWord),
                "transcription",
                io.binary.load_binary("/matrix")
                )
            print("comparing: ",
                myCorpus.wordlist.get(word).transcription,
                " to: ",
                myCorpus.wordlist.get(compareWord).transcription, 
                ": ", 
                phonoEditDistance
                )
            phonoSimList.append(phonoEditDistance)
    print("final phonoSimList for set: ", phonoSimList)

print("mean result: ", numpy.mean(phonoSimList))

kchall commented 7 years ago

Can you try this on the develop branch? I think corpora with [g] do now load, but we may have a problem where [g] isn't showing up in the inventory window (e.g., if you go to Corpus / Summary).

kchall commented 7 years ago

Basically [g]s are working, but they don't show up in the summary window (they are in the manage inventory window). Other unrecognized segments are listed as uncategorized in both. g-text.txt

dvdmrn commented 7 years ago

Update: this appears to be a character encoding issue. When importing corpora with mixed types of [g] in the GUI I get the error:

File "/usr/local/lib/python3.5/dist-packages/corpustools-1.2.0-py3.5.egg/corpustools/gui/main.py", line 456, in manageInventoryChart
    if not self.inventoryModel._data:
AttributeError: 'NoneType' object has no attribute '_data'
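
For anyone wanting to check which kind of g a file actually contains, here is a minimal sketch using only the Python standard library. It is not part of CorpusTools; the helper name `count_g_variants` and the sample string are made up for illustration. It relies on the fact that the keyboard g is U+0067 (LATIN SMALL LETTER G) while the IPA symbol ɡ is U+0261 (LATIN SMALL LETTER SCRIPT G).

```python
import unicodedata
from collections import Counter

ASCII_G = "\u0067"  # LATIN SMALL LETTER G (plain keyboard g)
IPA_G = "\u0261"    # LATIN SMALL LETTER SCRIPT G (IPA voiced velar stop)


def count_g_variants(text):
    """Return a dict mapping the Unicode name of each g variant to its count.

    Hypothetical helper for illustration; only the two code points above
    are considered.
    """
    counts = Counter(ch for ch in text if ch in (ASCII_G, IPA_G))
    return {unicodedata.name(ch): n for ch, n in counts.items()}


# Made-up sample: one transcription with a keyboard g, one with an IPA g.
sample = "d.\u0251.g\nb.\u00e6.\u0261"
print(count_g_variants(sample))
```

If both names show up in the result for a single corpus file, the file mixes the two characters, which looks like the situation that triggers the error above.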

kchall commented 7 years ago

Hmm, just tried the g-text corpus again, and [g] did show up correctly as uncategorized, which is great. But then I tried a corpus containing both a text g and an IPA ɡ; the IPA one shows up correctly (and is classified correctly), but the text one is again not present anywhere in the table. g-text_with_two_kinds_of_g.txt
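
As a possible workaround until this is sorted out, the text g could be converted to the IPA ɡ before the corpus is imported. The sketch below is only an assumption about what would help: `normalize_g` is a hypothetical helper, not a CorpusTools function, and it presumes that the feature system only knows the IPA character (U+0261), as the behaviour described here suggests. It should be applied to the transcription column only, since the spelling column legitimately uses the keyboard g.

```python
ASCII_G = "g"       # U+0067, plain keyboard g
IPA_G = "\u0261"    # U+0261, IPA voiced velar stop


def normalize_g(transcription):
    """Replace every keyboard g in a transcription string with IPA g.

    Hypothetical helper; leaves all other characters untouched.
    """
    return transcription.replace(ASCII_G, IPA_G)


# Made-up transcription using "." as the segment delimiter.
before = "g.\u0251.g"
after = normalize_g(before)
print(before == after)   # the strings differ once g is replaced
print(ASCII_G in after)  # no keyboard g should remain
```

Running a corpus through something like this first means only one kind of g reaches the importer, so the two-kinds-of-g case does not arise.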

kchall commented 7 years ago

working for the moment??