Closed by gkunter 8 years ago
Original comment by gkunter (Bitbucket: gkunter, GitHub: gkunter):
corpusbuilder.py, line 2086: build
corpusbuilder.py, line 1441: build_load_files
corpusbuilder.py, line 1421: process_file
corpusbuilder.py, line 1162: process_text_file
corpusbuilder.py, line 1196: add_token
corpusbuilder.py, line 446: get_or_insert
corpusbuilder.py, line 421: add
> self._add_cache[tuple([row[x] for x in self._row_order])] = (self._current_id, row)
Originally reported by: gkunter (Bitbucket: gkunter, GitHub: gkunter)
The corpus builder is broken. If POS tagging is turned on, the build is slow, indicating that the NLTK methods are indeed being used. However, the resulting corpus module doesn't seem to have POS activated.
If POS tagging is turned off, a KeyError(?) exception is raised.
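For reference, here is a minimal sketch of one way the cache line from the traceback could raise a KeyError. It is only a guess at the mechanism, not a confirmed diagnosis: it assumes that `row` is a dict and that `_row_order` still lists a POS column while the row built without tagging has no such key. The class name `Table` and the column names `"Word"` and `"POS"` are hypothetical; only the assignment to `_add_cache` is copied from the traceback.

```python
# Hypothetical stand-in for the table class in corpusbuilder.py.
# Only the _add_cache assignment is taken from the traceback.
class Table:
    def __init__(self, row_order):
        self._row_order = row_order   # column names expected in every row
        self._add_cache = {}          # maps row values to (id, row)
        self._current_id = 0

    def add(self, row):
        self._current_id += 1
        # Same expression as in the traceback: every column named in
        # _row_order is looked up in row, so a missing column raises KeyError.
        self._add_cache[tuple([row[x] for x in self._row_order])] = (self._current_id, row)
        return self._current_id


# Assumed column names, for illustration only.
table = Table(["Word", "POS"])

# A row that carries a POS value is cached without problems.
table.add({"Word": "the", "POS": "DT"})

# A row built without POS tagging lacks the "POS" key.
try:
    table.add({"Word": "cat"})
except KeyError as exc:
    missing = exc.args[0]
    print("KeyError for column:", missing)
```

If this is what happens in the real builder, the column order and the rows passed to `add` would need to agree on whether a POS column exists.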