Closed keien closed 10 years ago
Hmm, no, I don't expect issues with that. We might as well do it that way since we're handling dependencies like that in the same file; it's less of a headache.
Is this still an issue?
Fixed in 1478aff214c83287b74925dc1424cf2400fe2d29
On my current version of `handling-duplicates` (604c4fef5c18723aecd24a3a953be37949a83cf9), there's a problem with the way `stringprocessor.py` creates sentences and words, and the way `readerwriter` handles duplicates. In the original `stringprocessor`, `tokenize` returned a `Sentence` which already had a list of words associated with it, but this was not ideal because the `word_in_sentence` association objects didn't have their extra fields (position, tag, etc.) set, and updating them on the fly was a hassle. The other option would be to remove the original words from the sentence and create new words, but that doesn't make much sense.

I tried changing the end of
`tokenize_from_raw` to use `sentence.add_word`, but this meant that I would have to move the duplicate handling into `stringprocessor`, which I don't want to do. So now I just have it pass the list of words in through `sentence.tagged_words`, which is not ideal, but not terrible either. However, I now get this error, which I have no idea what to do about:

I think it'd be easier if we change the
`tokenize_from_raw` implementation to just return a dictionary, like the way we handled dependencies. Do you think there'd be issues with that?
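For concreteness, a minimal sketch of what the dictionary-returning version could look like. Everything here except the name `tokenize_from_raw` is an assumption for illustration: the key names, the placeholder `"UNK"` tag, and the whitespace split standing in for the project's real tokenizer/tagger.

```python
# Hypothetical sketch: tokenize_from_raw returns a plain dictionary
# instead of a Sentence model object, so the caller (readerwriter) can
# decide how to build Sentence/Word rows and handle duplicates itself.
def tokenize_from_raw(raw_sentence):
    """Return tokenization results as plain data, not model objects."""
    words = raw_sentence.split()  # stand-in for the real tokenizer
    return {
        "text": raw_sentence,
        # One dict per word, carrying the extra word_in_sentence fields
        # (position, tag, etc.) that were awkward to set on the fly.
        "tagged_words": [
            {"word": w, "position": i, "tag": "UNK"}  # "UNK" = placeholder tag
            for i, w in enumerate(words)
        ],
    }

result = tokenize_from_raw("the cat sat")
print(result["tagged_words"][1])  # -> {'word': 'cat', 'position': 1, 'tag': 'UNK'}
```

Since the return value is plain data, `readerwriter` can look words up (or dedupe them) before creating any association objects, instead of patching fields on a `Sentence` that was built prematurely.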