PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

Error in reading directory of TextGrids #664

Closed kchall closed 5 years ago

kchall commented 6 years ago

The sample directory of TextGrids is no longer working:

When the file path to Dropbox/Phonological_CorpusTools_Public/example_files/CSJ_sample is selected in the "Import corpus" dialogue box, the following error occurs:

Traceback (most recent call last):
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/decorators.py", line 12, in do_check
    function(*args,**kwargs)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/iogui.py", line 734, in inspect
    anno_types = inspect_discourse_textgrid(self.pathWidget.value())
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/corpus/io/textgrid.py", line 106, in inspect_discourse_textgrid
    tg = load_textgrid(t)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/corpus/io/textgrid.py", line 148, in load_textgrid
    tg.read(path)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/corpus/io/textgrid.py", line 36, in read
    self.minTime = round(float(source.readline().split()[2]), 5)
AttributeError: 'str' object has no attribute 'readline'
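
The last frame shows read() calling source.readline() on a value that is a str, which suggests a path was passed where an open file was expected. A minimal sketch of a reader that tolerates both, assuming that is indeed the cause; open_source and read_first_line are hypothetical names for illustration, not PCT or textgrid functions:

```python
from contextlib import contextmanager


@contextmanager
def open_source(source, encoding="utf-8"):
    """Yield a readable text stream whether `source` is a path or an open file."""
    if hasattr(source, "readline"):
        # Already file-like: pass it through and leave closing to the caller.
        yield source
    else:
        # Anything else is treated as a filesystem path (an assumption).
        with open(source, "r", encoding=encoding) as handle:
            yield handle


def read_first_line(source):
    """Hypothetical stand-in for the first step of a TextGrid read() method."""
    with open_source(source) as stream:
        return stream.readline().rstrip("\n")
```

With a helper like this, a caller can hand over either a path string or an already opened file without triggering the AttributeError above.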

kchall commented 6 years ago

Note: the csj2hayes.txt feature file and the list of CSJ digraphs are in the /Users/kathleenhall/Dropbox/Phonological_CorpusTools_Public/TRANS folder.

kchall commented 6 years ago

Hmm, I just synced to the latest version of PCT on the Master branch, but still have the same error:

Traceback (most recent call last):
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/decorators.py", line 12, in do_check
    function(*args,**kwargs)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/iogui.py", line 732, in inspect
    anno_types = inspect_discourse_textgrid(self.pathWidget.value())
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 106, in inspect_discourse_textgrid
    tg = load_textgrid(t)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 148, in load_textgrid
    tg.read(path)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 36, in read
    self.minTime = round(float(source.readline().split()[2]), 5)
AttributeError: 'str' object has no attribute 'readline'

jsmackie commented 6 years ago

Does this only happen with an existing corpus? What happens if you try to make a corpus from scratch?

kchall commented 6 years ago

I've never tried with an existing corpus; this is what happened when trying to create the corpus from scratch.

kchall commented 6 years ago

Traceback (most recent call last):
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/iogui.py", line 92, in run
    corpus = load_directory_textgrid(**self.kwargs)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 397, in load_directory_textgrid
    corpus.lexicon.specifier = modernize.modernize_specifier(corpus.lexicon.specifier)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/modernize.py", line 110, in modernize_specifier
    features = sorted(list(specifier.matrix[seg].keys()))
AttributeError: 'Segment' object has no attribute 'keys'

kchall commented 6 years ago

Just to be clear -- I did re-start and re-test this, and am still getting exactly the same error message as above. @jsmackie

kchall commented 6 years ago

OK, now it says:

Traceback (most recent call last):
  File "/Users/KCH/Desktop/CorpusTools/corpustools/decorators.py", line 12, in do_check
    function(*args,**kwargs)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/iogui.py", line 732, in inspect
    anno_types = inspect_discourse_textgrid(self.pathWidget.value())
  File "/Users/KCH/Desktop/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 50, in inspect_discourse_textgrid
    tg = load_textgrid(t)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 91, in load_textgrid
    tg = TextGrid.fromFile(path)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/textgrid/textgrid.py", line 772, in fromFile
    tg.read(f)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/textgrid/textgrid.py", line 672, in read
    file_type, short = parse_header(source)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/textgrid/textgrid.py", line 570, in parse_header
    file_type = parse_line(source.readline(), short, '') # header junk
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/textgrid/textgrid.py", line 558, in parse_line
    return m.groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'

kchall commented 6 years ago

Current error:

Traceback (most recent call last):
  File "/Users/KCH/Desktop/CorpusTools/corpustools/decorators.py", line 12, in do_check
    function(*args,**kwargs)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/gui/iogui.py", line 732, in inspect
    anno_types = inspect_discourse_textgrid(self.pathWidget.value())
  File "/Users/KCH/Desktop/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 56, in inspect_discourse_textgrid
    tg = load_textgrid(t)
  File "/Users/KCH/Desktop/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 97, in load_textgrid
    tg = TextGrid.fromFile(path)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/textgrid/textgrid.py", line 709, in fromFile
    tg.read(f)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/textgrid/textgrid.py", line 649, in read
    itie.addPoint(Point(jtim, jmrk))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/textgrid/textgrid.py", line 321, in addPoint
    raise ValueError(point)# we already got one right there
ValueError: Point(244.584055, FR)

YuHsiangLo commented 6 years ago

I think I figured out the problem. There are two actually:

(1) The TextGrid file A01M0007.TextGrid in the CSJ sample folder contains one mistake. If you open it with TextEdit and go to line 123225, you'll find that both point[11] and point[12] have the same time stamp (which I think is a mistake they made when they were annotating the sound file). This is what caused the problem.

(2) The textgrid module has a small feature that is not compatible with the naming convention of PCT. Specifically, the textgrid module stores each tier under the original name specified in the .TextGrid file (so an initial lowercase letter stays lowercase), but PCT reads in the data using capitalized names. A quick way to solve this problem without modifying the code is to open each .TextGrid file with Praat and manually rename each tier so that it starts with a capital letter.
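
As an alternative to renaming tiers by hand, the mismatch in (2) could in principle be absorbed in code by comparing tier names without regard to case. A minimal sketch, assuming tiers are represented as plain dicts with a "name" key; that representation and the find_tier helper are hypothetical and are not how PCT or the textgrid module store tiers:

```python
def find_tier(tiers, wanted_name):
    """Return the first tier whose name matches `wanted_name`, ignoring case."""
    wanted = wanted_name.casefold()
    for tier in tiers:
        # casefold() gives a case-insensitive comparison key for both sides.
        if tier["name"].casefold() == wanted:
            return tier
    raise KeyError(wanted_name)
```

A lookup for "Word" would then also find a tier that the file names "word", at the cost of treating names that differ only in case as the same tier.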

I already uploaded the corrected files onto Dropbox (example_files > CSJ_sample_corrected). The problem should be solved using this folder.
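
For checking other files for the duplicate time stamp described in (1), something like the following could be used. It is only a sketch: it scans the text of a long-format TextGrid line by line, assumes each tier starts with an "item [n]:" line, and assumes point times appear on lines of the form "number = ..." or "time = ...". The function name duplicate_point_times is made up for this example.

```python
import re

# Start of a tier in the long text format, e.g. "item [2]:" (assumed layout).
TIER_START = re.compile(r"^\s*item\s*\[(\d+)\]:\s*$")
# Time line of a point, e.g. "number = 244.584055" (or "time = ...").
POINT_TIME = re.compile(r"^\s*(?:number|time)\s*=\s*([0-9.]+)\s*$")


def duplicate_point_times(textgrid_text):
    """Return (tier_index, time, line_number) for every repeated point time."""
    duplicates = []
    tier = None
    seen = set()
    for line_no, line in enumerate(textgrid_text.splitlines(), start=1):
        tier_match = TIER_START.match(line)
        if tier_match:
            # New tier: the same time may legitimately occur in another tier.
            tier = int(tier_match.group(1))
            seen = set()
            continue
        time_match = POINT_TIME.match(line)
        if time_match:
            time = float(time_match.group(1))
            if time in seen:
                duplicates.append((tier, time, line_no))
            seen.add(time)
    return duplicates
```

The reported line numbers can be used to jump straight to the offending point in a text editor.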

kchall commented 5 years ago

Huh, I am now getting exactly the original error from February again! I just tried with both the WebMaus corpus and the "corrected" CSJ corpus sample, and get the following:

Traceback (most recent call last):
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/decorators.py", line 12, in do_check
    function(*args,**kwargs)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/iogui.py", line 732, in inspect
    anno_types = inspect_discourse_textgrid(self.pathWidget.value())
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 116, in inspect_discourse_textgrid
    tg = load_textgrid(t)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 158, in load_textgrid
    tg.read(path)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/corpus/io/pct_textgrid.py", line 38, in read
    self.minTime = round(float(source.readline().split()[2]), round_digits)
AttributeError: 'str' object has no attribute 'readline'

@YuHsiangLo any ideas?

(This happens both when I try to read in the whole directory and when I try to read in a single .TextGrid file.)

kchall commented 5 years ago

(Note: I am running TextGrid 1.4 and getting the error; Roger is running TextGrid 1.1 and not getting the error.)

kchall commented 5 years ago

This largely seems to be working, but once the corpus is created, and is first being loaded, PCT crashes with the following error:

Traceback (most recent call last):
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/main.py", line 293, in do_check
    function(self)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/main.py", line 453, in loadCorpus
    self.inventoryModel = None if specifier_check is None else self.generateInventoryModel()
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/main.py", line 468, in generateInventoryModel
    inventoryModel = InventoryModel(self.corpusModel.corpus.inventory, copy_mode=False)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 939, in __init__
    self.sortData()
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 1407, in sortData
    sorted_cons_col_headers = sorted(list(self.consColumns), key=lambda x: self.cons_column_data[x][0])
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 1407, in <lambda>
    sorted_cons_col_headers = sorted(list(self.consColumns), key=lambda x: self.cons_column_data[x][0])
KeyError: 'Column 1'
Abort trap: 6

kchall commented 5 years ago

^ I just tried this with a non-TextGrid corpus (the running text sample), and the same thing happens: the corpus is created, but crashes PCT on first loading, and then is fine if PCT is re-opened and the corpus re-loaded.

mdfry commented 5 years ago

Make sure the documentation matches the new implementation: textgrid 1.1 has now been added directly into the corpus/io/ folder.

mdfry commented 5 years ago

Updated the docs, now onto that strange crashing

kchall commented 5 years ago

The WebMaus directory seems to be working, but the CSJ directory still crashed on trying to load:

Traceback (most recent call last):
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 1408, in sortData
    sorted_cons_col_headers = sorted(list(self.consColumns), key=lambda x: self.cons_column_data[x][0])
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 1408, in <lambda>
    sorted_cons_col_headers = sorted(list(self.consColumns), key=lambda x: self.cons_column_data[x][0])
KeyError: 'Dental'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/main.py", line 293, in do_check
    function(self)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/main.py", line 453, in loadCorpus
    self.inventoryModel = None if specifier_check is None else self.generateInventoryModel()
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/main.py", line 468, in generateInventoryModel
    inventoryModel = InventoryModel(self.corpusModel.corpus.inventory, copy_mode=False)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 939, in __init__
    self.sortData()
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 1412, in sortData
    except KeyError():
TypeError: catching classes that do not inherit from BaseException is not allowed
Abort trap: 6
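
The second traceback comes from the handler itself: "except KeyError():" builds an exception instance, and Python only accepts an exception class (or a tuple of classes) in an except clause. A small self-contained sketch of the corrected pattern; column_order and sort_columns are illustrative names, and sorting unknown columns last is an assumption made for the example, not PCT's behaviour:

```python
def column_order(column, column_data):
    """Return the stored sort position of `column`, or infinity if it is unknown."""
    try:
        return column_data[column][0]
    except KeyError:  # the class itself, not KeyError()
        return float("inf")


def sort_columns(columns, column_data):
    """Sort column names by stored position; unknown names go last, alphabetically."""
    return sorted(columns, key=lambda name: (column_order(name, column_data), name))
```

Whether a column with no entry in the data should be sorted last, dropped, or treated as an error is a separate decision; the sketch only shows a handler that Python will accept.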

kchall commented 5 years ago

OK, it seems to be related to doing it twice in a row. If I do the CSJ corpus first, it's fine, but then if I try to do WebMaus immediately after, it crashes again:

{'Column 1'}
{'Column 1': [0, {}, None]}
{'Column 1'}
{'Column 1': [0, {}, None]}
{'Column 1', 'Dental', 'Labiodental', 'Velar', 'Labial', 'Alveopalatal'}
{'Labial': [0, {'consonantal': '+', 'labial': '+', 'coronal': '-', 'labiodental': '-'}, None], 'Labiodental': [1, {'consonantal': '+', 'labiodental': '+'}, None], 'Dental': [2, {'consonantal': '+', 'anterior': '+', 'coronal': '+', 'labial': '-', 'labiodental': '-'}, None], 'Alveopalatal': [3, {'consonantal': '+', 'anterior': '-', 'coronal': '+', 'labial': '-'}, None], 'Palatal': [4, {'consonantal': '+', 'dorsal': '+', 'coronal': '+', 'labial': '-'}, None], 'Velar': [5, {'consonantal': '+', 'dorsal': '+', 'labial': '-'}, None], 'Uvular': [6, {'consonantal': '+', 'dorsal': '+', 'back': '+', 'labial': '-'}, None], 'Glottal': [7, {'consonantal': '+', 'dorsal': '-', 'coronal': '-', 'labial': '-', 'nasal': '-'}, None]}
Traceback (most recent call last):
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/main.py", line 293, in do_check
    function(self)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/main.py", line 453, in loadCorpus
    self.inventoryModel = None if specifier_check is None else self.generateInventoryModel()
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/main.py", line 468, in generateInventoryModel
    inventoryModel = InventoryModel(self.corpusModel.corpus.inventory, copy_mode=False)
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 939, in __init__
    self.sortData()
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 1409, in sortData
    sorted_cons_col_headers = sorted(list(self.consColumns), key=lambda x: self.cons_column_data[x][0])
  File "/Users/kathleenhall/Desktop/GitHub/CorpusTools/corpustools/gui/models.py", line 1409, in <lambda>
    sorted_cons_col_headers = sorted(list(self.consColumns), key=lambda x: self.cons_column_data[x][0])
KeyError: 'Column 1'
Abort trap: 6

mdfry commented 5 years ago

The issue turned out to be that, when two corpora were created sequentially, the consColumns of the first corpus were added to rather than replaced.
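
To illustrate the kind of fix this implies: the thread does not say where the stale columns were kept, so the following is only a sketch with a hypothetical InventorySketch class (not PCT's InventoryModel) in which the column set is rebuilt from the current corpus data on every construction, so nothing carries over from an earlier corpus.

```python
class InventorySketch:
    """Hypothetical stand-in for an inventory model; not PCT's InventoryModel."""

    def __init__(self, column_data):
        # Fresh per-instance containers built only from the data passed in,
        # so columns from a previously created corpus cannot leak through.
        self.cons_column_data = dict(column_data)
        self.cons_columns = set(self.cons_column_data)

    def sorted_columns(self):
        """Sort column names by the position stored as the first list element."""
        return sorted(self.cons_columns, key=lambda name: self.cons_column_data[name][0])
```

Because the set of column names is always derived from the same dict that is used for sorting, a name such as 'Column 1' from one corpus cannot show up without a matching entry in the next one.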