dmcc / PyStanfordDependencies

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Dependencies
https://pypi.python.org/pypi/PyStanfordDependencies

Handle PTB trees with Unicode words in them #8

Open: dmcc opened this issue 9 years ago

dmcc commented 9 years ago

This could be handled either by fixing the encoding issues or by temporarily replacing the Unicode words with dummy ASCII words.

Thanks to Karin M. Sim Smith for the report.

Temporary workaround: if possible, don't pass trees that contain Unicode words. This should be safe because the Stanford Dependencies conversion generally doesn't depend on the words in the trees, and the few words it does depend on are ASCII.
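One way to apply the "dummy ASCII words" idea on the caller's side is to mask non-ASCII leaves before conversion and restore them afterwards. The sketch below is illustrative only: the function names `mask_non_ascii_words` and `unmask_words` and the `UNICODEWORDn` placeholder scheme are hypothetical, not part of PyStanfordDependencies. It assumes trees are in standard PTB bracketed form, with words that contain no whitespace or parentheses, and that the placeholder strings never already occur in the input.

```python
import re
from typing import Dict, List, Tuple

# Matches a preterminal such as "(NN café)": group 1 is the POS tag, group 2 the word.
# Assumes words contain no whitespace or parentheses (standard PTB bracketing).
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def mask_non_ascii_words(tree: str) -> Tuple[str, Dict[str, str]]:
    """Replace each non-ASCII word with a unique ASCII placeholder.

    Returns the masked tree string and a mapping from placeholder to original word.
    """
    mapping: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        tag, word = match.group(1), match.group(2)
        if word.isascii():
            return match.group(0)  # ASCII words pass through unchanged
        # Hypothetical placeholder scheme; assumes these strings never occur in the input.
        placeholder = f"UNICODEWORD{len(mapping)}"
        mapping[placeholder] = word
        return f"({tag} {placeholder})"

    return PRETERMINAL.sub(replace, tree), mapping


def unmask_words(words: List[str], mapping: Dict[str, str]) -> List[str]:
    """Swap placeholders in converted token forms back to the original words."""
    return [mapping.get(word, word) for word in words]
```

The masked string is what would be handed to the converter (the conversion call itself is not shown here); the resulting token forms can then be passed through `unmask_words`. Since the conversion generally ignores word identity, swapping in placeholders should not change the dependency structure in most cases, though this is an assumption rather than a guarantee.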