Wordseer / wordseer

The WordSeer text analysis tool, written in Flask.
http://wordseer.berkeley.edu/

Parenthesis #152

Closed by keien 10 years ago

keien commented 10 years ago

The parser turns parentheses into -LRB- and -RRB-. I assume the same happens for other brackets, which could be a problem when reconstructing the original text.

abendebury commented 10 years ago

As you can see from the live demo (try something like "This is a sentence (this is one too)."), -LRB- and -RRB- come from the Java parser itself.

One option is a string replacement that maps -LRB- back to ( and -RRB- back to ).
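A minimal sketch of that replacement, assuming the parser output is available as a list of token strings. The names `PAREN_TOKENS` and `restore_parens` are hypothetical and are not part of WordSeer or stanford-corenlp-python.

```python
# Hypothetical mapping from the parser's escaped tokens back to the
# original characters. Only the two tokens named in this thread are covered.
PAREN_TOKENS = {"-LRB-": "(", "-RRB-": ")"}


def restore_parens(tokens):
    """Return a copy of tokens with -LRB-/-RRB- mapped back to ( and )."""
    # Whole-token lookup: a token is replaced only if it matches exactly.
    return [PAREN_TOKENS.get(tok, tok) for tok in tokens]


tokens = ["This", "is", "a", "sentence", "-LRB-", "this", "is", "one", "too", "-RRB-", "."]
restored = restore_parens(tokens)
```

One caveat with this approach: if the source text itself contains the literal string -LRB- or -RRB-, it would be converted too, since the replacement cannot tell the two cases apart.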

keien commented 10 years ago

Inputting ()<>[]{} shows that everything except the angle brackets gets remapped. Is there a way to figure out what else might get remapped?

abendebury commented 10 years ago

Yes, here are the remap rules: http://www.cis.upenn.edu/~treebank/tokenization.html
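As a rough illustration, the bracket escapes from the Penn Treebank conventions can be written as a lookup table and used to check which characters in an input would be remapped. This is a sketch, not an exhaustive list: the linked rules may cover other substitutions, such as quotation marks, that are not included here. The names `PTB_BRACKET_ESCAPES` and `remapped_chars` are hypothetical.

```python
# Penn Treebank escape tokens for brackets. Angle brackets have no entry,
# which matches the observation above that they pass through unchanged.
PTB_BRACKET_ESCAPES = {
    "(": "-LRB-",
    ")": "-RRB-",
    "[": "-LSB-",
    "]": "-RSB-",
    "{": "-LCB-",
    "}": "-RCB-",
}


def remapped_chars(text):
    """Map each character in text that has a bracket escape to its token."""
    return {ch: PTB_BRACKET_ESCAPES[ch] for ch in text if ch in PTB_BRACKET_ESCAPES}


found = remapped_chars("()<>[]{}")
```

Here `found` has entries for the round, square, and curly brackets and none for `<` or `>`.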

abendebury commented 10 years ago

It looks like there's a way to disable it. I've pushed a new version of stanford-corenlp-python to the repository; go ahead and install it and try it out.

keien commented 10 years ago

Looks good.

abendebury commented 10 years ago

I'll push it to PyPI.