Wordseer / wordseer

The WordSeer text analysis tool, written in Flask.
http://wordseer.berkeley.edu/

Weird bug on much_ado #110

Closed: keien closed this issue 10 years ago

keien commented 10 years ago

There's a weird bug that occurs when running on much_ado.xml.

Traceback (most recent call last):
  File "run_pipeline.py", line 18, in <module>
    collection_processor.process(collection_dir, structure_file, extension, False)
  File "/home/keien/dev/wordseer_flask/lib/wordseerbackend/wordseerbackend/collectionprocessor.py", line 61, in process
    self.parse_documents()
  File "/home/keien/dev/wordseer_flask/lib/wordseerbackend/wordseerbackend/collectionprocessor.py", line 175, in parse_documents
    document_parser.parse_document(doc)
  File "/home/keien/dev/wordseer_flask/lib/wordseerbackend/wordseerbackend/parser/documentparser.py", line 52, in parse_document
    parsed = self.parser.parse(sentence, relationships, dependencies)
  File "/home/keien/dev/wordseer_flask/lib/wordseerbackend/wordseerbackend/stringprocessor.py", line 77, in parse
    if int(dependency[2]) > 0 and int(dependency[4]) > 0:
ValueError: invalid literal for int() with base 10: "3'"

abendebury commented 10 years ago

Interesting... looks like one of those dependency values is 3' instead of 3. Does this only happen on much_ado.xml?

abendebury commented 10 years ago

Is this much_ado.xml in shakespeare or shakespeare_mini?

abendebury commented 10 years ago

err... those files are identical, and so are the structure files. Why do we have two directories?

keien commented 10 years ago

which files are you talking about?

abendebury commented 10 years ago

shakespeare/much_ado.xml and shakespeare_mini/much_ado.xml.

keien commented 10 years ago

shakespeare_mini is just so I can run a couple of documents, but not have to wait for everything to run.

abendebury commented 10 years ago

Oh, I see.

keien commented 10 years ago

Any leads on this bug?

abendebury commented 10 years ago

Yes, it happens on this line:

https://github.com/Wordseer/wordseer_flask/blob/master/tests/data/shakespeare/much_ado.xml#L1805

Still working on why.

abendebury commented 10 years ago

Looks like the Stanford library's output puts a quote where it shouldn't be. We can just strip all non-numeric characters from the indices; that should fix it.
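A minimal sketch of that stripping idea, assuming each dependency is a sequence whose items at positions 2 and 4 are the index strings (as the traceback suggests). `clean_index` and the sample tuple are hypothetical names for illustration, not the actual code in the CoreNLP interface:

```python
import re


def clean_index(raw):
    """Drop every non-digit character and return the index as an int."""
    digits = re.sub(r"\D", "", str(raw))
    if not digits:
        # Nothing numeric left, so fail loudly instead of guessing a value.
        raise ValueError("no digits in index: %r" % (raw,))
    return int(digits)


# Hypothetical dependency tuple; only the positions of the indices
# (2 and 4) are taken from the traceback.
dependency = ("nsubj", "word_a", "3'", "word_b", "1")

if clean_index(dependency[2]) > 0 and clean_index(dependency[4]) > 0:
    print(clean_index(dependency[2]), clean_index(dependency[4]))
```

Note that this also drops a leading minus sign, which should be fine only if the indices are never negative.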

abendebury commented 10 years ago

Pushed a new version to the repository, go ahead and try.

abendebury commented 10 years ago

Did the new version fix this?

keien commented 10 years ago

I still get this bug with the new version. Which branch did you push it to?

abendebury commented 10 years ago

I pushed a new version of the interface to Stanford CoreNLP to master.

keien commented 10 years ago

so rerunning install.sh should update it, right?

keien commented 10 years ago

I reran install.sh and I still get the error. Do you get it on your side?

abendebury commented 10 years ago

install.sh doesn't update it. Run this instead: pip install --upgrade stanford-corenlp-python

keien commented 10 years ago

The run was successful, but it does raise this error:

End Of File (EOF). Exception style platform.
app.preprocessor.stringprocessor:ERROR:  Unknown error

which means that it was none of the errors defined in the corenlp interface.

abendebury commented 10 years ago

Try again. Looks like there was no newline at the end of the file.
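For anyone hitting the same thing, a quick way to check whether a file ends with a newline might look like the sketch below. `ends_with_newline` is a hypothetical helper, and the temporary files only stand in for whichever file was affected:

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file is non-empty and its last byte is a newline."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"


def write_temp(content):
    """Write bytes to a temporary file and return its path (demo only)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(content)
    return path


good = write_temp(b"last line\n")
bad = write_temp(b"last line")
print(ends_with_newline(good), ends_with_newline(bad))
os.remove(good)
os.remove(bad)
```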

keien commented 10 years ago

Finished run, looks good.