gryBox closed this issue 5 years ago
Oof, thanks for the heads-up. When I scrapped fix_bad_unicode(), I accidentally wrote return NotImplementedError rather than raise NotImplementedError. Will fix.
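The difference between the two statements can be shown with a minimal sketch. The function names here are hypothetical stand-ins, not textacy's actual code. Returning the exception class hands the caller an ordinary object, so nothing fails at the call site:

```python
def removed_func_buggy(text):
    # Bug: returns the exception class itself instead of raising it,
    # so the caller silently receives a class object, not a string.
    return NotImplementedError


def removed_func_fixed(text):
    # Fix: raise, so callers fail loudly and see the explanatory message.
    raise NotImplementedError(
        "this function has been removed; see the message for a replacement"
    )


result = removed_func_buggy("some text")
print(result is NotImplementedError)  # True: no error was raised

try:
    removed_func_fixed("some text")
except NotImplementedError as exc:
    print("raised:", exc)
```

With the buggy version, the problem only surfaces later, when downstream code tries to treat the returned class as text.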
In this function, whitespace is always normalized, even if everything else is False. Is that surprising/undesirable behavior?
IMO it should be explicit, i.e. fix_unicode=False... If it's worth the discussion for you, my question is: what is the rationale for doing it the other way?
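To make the "explicit" option concrete, here is a rough sketch of a preprocessing helper in which whitespace normalization is opt-in like every other step. The name preprocess_sketch and its parameters are assumptions made for illustration; this is not textacy's implementation, and the whitespace handling is deliberately simplified:

```python
import re


def preprocess_sketch(text, lowercase=False, normalize_whitespace=False):
    """Hypothetical helper: every transformation must be requested."""
    if normalize_whitespace:
        # Collapse any run of whitespace to one space and trim the ends.
        text = re.sub(r"\s+", " ", text).strip()
    if lowercase:
        text = text.lower()
    return text


raw = "  A  chemical\tcombination \n"
# With all flags False, the input comes back unchanged.
print(repr(preprocess_sketch(raw)))
# Whitespace is only normalized when asked for.
print(repr(preprocess_sketch(raw, normalize_whitespace=True)))
```

Under this design, calling the function with everything set to False is a no-op, which is the behavior argued for above.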
I'm not sure I follow... Normally, I'd mark a function as deprecated then leave it as-is for a bit, but since I yanked this function out suddenly, I left it in (albeit with a bug) so that folks would see the NotImplementedError and message explaining why the functionality is gone and how to reproduce it on their own.
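For comparison, the usual "deprecate first, remove later" approach mentioned above might look like the sketch below. The names old_clean and new_clean are hypothetical and unrelated to textacy's API; the old function keeps working but emits a DeprecationWarning that points to its replacement:

```python
import warnings


def new_clean(text):
    # Hypothetical replacement function.
    return text.strip()


def old_clean(text):
    # Still works, but warns the caller that it is going away.
    warnings.warn(
        "old_clean() is deprecated; use new_clean() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_clean(text)


# Capture warnings so the behavior can be inspected directly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_clean("  hello  ")

print(result)
print(caught[0].category.__name__)
```

Raising NotImplementedError with an explanatory message, as described in the comment, is the harsher alternative for a function that has already been removed.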
Yeah, I misunderstood your question. Still do. Can you rip it out completely? fix_unicode is just not part of textacy anymore. I would leave the Warning:
steps to reproduce
some_string = "'A chemical combination brought about by the action of light, as in the formation of carbohydrates in living plants from the carbon di-oxid and water of the air under the influence of sunlight."
Scenario 1
Result:
Should fix_unicode be removed since it is no longer supported by textacy directly?
Scenario 2 (all false)
Result:
'A chemical combination brought about by the action of light, as in the formation of carbohydrates in living plants from the carbon di-oxid and water of the air under the influence of sunlight.'
expected vs. actual behavior
"'a chemical combination brought about by the action of light as in the formation of carbohydrates in living plants from the carbon di oxid and water of the air under the influence of sunlight"
I know preprocess worked in 0.6.x
environment
spacy version: 2.1.4
spacy models: en_core_web_sm
textacy version: 0.7.0