Open sxjscience opened 3 years ago
However, accents carry meaning in many languages, e.g., German mochte vs. möchte. Thus, we may try to turn it off in nlp_process.
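To make the mochte vs. möchte point concrete, here is a minimal sketch of BERT-style accent stripping using only the Python standard library. The helper name strip_accents_demo is hypothetical and is not taken from gluon-nlp or nlp_process; it only illustrates the general technique of decomposing characters and dropping the combining marks.

```python
import unicodedata


def strip_accents_demo(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFD decomposition."""
    # NFD splits "ö" into "o" followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# After stripping, the two distinct German words become the same string.
print(strip_accents_demo("möchte") == strip_accents_demo("mochte"))  # True
```

Once accents are stripped, the tokenizer can no longer distinguish the two words, which is the concern raised above.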
Thus, we may try to turn it off in nlp_process.
Do you mean exposing an option in nlp_process or changing the defaults in nlp_process? As English is a special case that doesn't care much about accents, I suggest we must keep the option to keep accents in nlp_process.
Yes, I mean always explicitly setting strip_accents to False in nlp_process.
(Sent by email in reply to Leonard Lausen's message of Monday, February 22, 2021, on dmlc/gluon-nlp issue #1528, "strip_accents should be None by default in WordPiece".)
Description
@leezu @szha @xinyual I noticed that we may need to set strip_accents to None in https://github.com/dmlc/gluon-nlp/blob/223f1f6f8e267d258abd2f299ec6fc4a9b2f1cf8/src/gluonnlp/data/tokenizers/huggingface.py#L564 so that it will be turned on when lowercase is True. This may impact the performance.
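The rule described here (strip_accents left as None means "follow lowercase") can be sketched as below. This is an illustration under stated assumptions, not the code in huggingface.py or in any tokenizer library: the function names resolve_strip_accents and normalize_demo are hypothetical, and the normalization is reduced to lowercasing plus accent removal.

```python
import unicodedata
from typing import Optional


def resolve_strip_accents(strip_accents: Optional[bool], lowercase: bool) -> bool:
    """Hypothetical rule: None defers to the lowercase setting."""
    return lowercase if strip_accents is None else strip_accents


def normalize_demo(text: str, lowercase: bool = True,
                   strip_accents: Optional[bool] = None) -> str:
    """Simplified normalizer used only to illustrate the default."""
    if lowercase:
        text = text.lower()
    if resolve_strip_accents(strip_accents, lowercase):
        # Decompose, then drop nonspacing combining marks (category "Mn").
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    return text


# Uncased with the None default: accents are stripped along with case.
print(normalize_demo("Möchte"))
# Uncased with stripping explicitly disabled: accents survive.
print(normalize_demo("Möchte", strip_accents=False))
# Cased with the None default: the text is left untouched.
print(normalize_demo("Möchte", lowercase=False))
```

Under this rule, an explicit False is the only way to keep accents in an uncased model, which is why the choice of default matters for languages such as German.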