Lynten / stanford-corenlp

Python wrapper for Stanford CoreNLP.
MIT License

Setting properties to the tokenizer #91

Open CatarinaPC opened 4 years ago

CatarinaPC commented 4 years ago

Hello

I am using this wrapper to tokenize French sentences.

I set the properties following the CoreNLP page, the Stanford Tokenizer page, and the README in this repository. However, the options passed in 'tokenize.options' have no effect on the output. Is this the correct way to pass options to the tokenizer?

The code:

from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'../libraries/stanford-corenlp-full-2018-10-05', lang='fr')

props = {'annotators': 'tokenize',
         'pipelineLanguage': 'fr',
         'outputFormat': 'text',
         # Comma-separated key=value pairs; each option is listed once.
         'tokenize.options':
           'strictTreebank3=false,'
           'untokenizable=allKeep,'  # documented spelling is allKeep
           'escapeForwardSlashAsterisk=false,'
           'normalizeFractions=false,'
           'normalizeAmpersandEntity=false,'
           'invertible=true,'
           'asciiQuotes=false,'
           'latexQuotes=false,'
           'unicodeQuotes=false,'
           'normalizeOtherBrackets=false,'
           'ptb3Dashes=false,'
           'americanize=false,'
           'normalizeParentheses=false,'
           'ptb3Ellipsis=false,'
           'unicodeEllipsis=false'}

example_sentence = "Maria grandit au sein d'une famille de l'ancienne «bourgeoisie»."

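# Print the result so the tokenizer output is visible, then stop the backend server.
print(nlp.annotate(example_sentence, properties=props))
nlp.close()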
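
To rule out a malformed options string (duplicate keys, stray spaces around `=`), the value could also be built from a dict. This is only a minimal sketch: `build_tokenize_options` is a hypothetical helper, not part of this wrapper or of CoreNLP, and it assumes that 'tokenize.options' is read as a comma-separated list of key=value pairs. It does not by itself explain why the options are ignored.

```python
def build_tokenize_options(options):
    """Join a dict of tokenizer options into a 'key=value,key=value' string.

    Hypothetical helper for illustration only. Booleans are written as
    'true'/'false' to match the spelling used in the options above.
    A dict cannot hold the same key twice, so duplicates are ruled out.
    """
    parts = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = 'true' if value else 'false'
        parts.append(f'{key.strip()}={str(value).strip()}')
    return ','.join(parts)


# A shortened option set; the remaining options would be added the same way.
tokenizer_options = {
    'untokenizable': 'allKeep',
    'invertible': True,
    'ptb3Dashes': False,
}

props = {
    'annotators': 'tokenize',
    'pipelineLanguage': 'fr',
    'outputFormat': 'text',
    'tokenize.options': build_tokenize_options(tokenizer_options),
}
```

The resulting `props` dict can be passed to `nlp.annotate(example_sentence, properties=props)` exactly as in the code above.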