asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
https://asyml.io
Apache License 2.0

Fix #347: only create object UnicodeRegex when used. #349

Closed qinzzz closed 2 years ago

qinzzz commented 2 years ago

Only create a UnicodeRegex object when the function bleu_transformer_tokenize is called.

hunterhector commented 2 years ago

Let's:

  1. Include the issue number in the message.
  2. Ignore pylint for the specific line.

huzecong commented 2 years ago

> Ignore pylint for the specific line.

Alternatively, you can use an lru_cache with maxsize=1 to lazily construct the regex objects:

import functools

@functools.lru_cache(maxsize=1)
def _get_unicode_regex() -> UnicodeRegex:
    return UnicodeRegex()

def bleu_transformer_tokenize(...):
    uregex = _get_unicode_regex()
    ...
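
To make the lazy-construction idea concrete, here is a minimal self-contained sketch using only the standard library. `PunctuationRegex`, `get_punctuation_regex`, and `split_punctuation` are hypothetical stand-ins, not names from texar-pytorch, and the assumption that the expensive step is a scan over all Unicode code points is mine, not something this thread confirms.

```python
import functools
import re
import sys
import unicodedata


class PunctuationRegex:
    """Hypothetical stand-in for a regex holder that is costly to build."""

    instances_built = 0  # counts constructor calls, for demonstration only

    def __init__(self) -> None:
        PunctuationRegex.instances_built += 1
        # Assumed expensive step: scan every code point once and keep
        # those whose Unicode category starts with "P" (punctuation).
        chars = "".join(
            re.escape(chr(cp))
            for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)).startswith("P")
        )
        self.punct_re = re.compile("([" + chars + "])")


@functools.lru_cache(maxsize=1)
def get_punctuation_regex() -> PunctuationRegex:
    # The body runs on the first call only; later calls return the cached object.
    return PunctuationRegex()


def split_punctuation(text: str) -> list:
    # Nothing is constructed until this function is first called.
    pattern = get_punctuation_regex().punct_re
    return pattern.sub(r" \1 ", text).split()
```

Importing a module laid out like this costs nothing up front; the scan happens on the first `split_punctuation` call and the resulting object is reused afterwards.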

Also, it seems that the regex package supports the \p{...} Unicode properties regex syntax, and it's already a dependency. I haven't profiled it but it seems to me that compiling a regex string like that should be much faster, and we might not need to use the lazy-construct trick at all.
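
A rough sketch of what the `\p{...}` approach could look like with the third-party `regex` package. The three patterns and the function name `tokenize_sketch` are illustrative assumptions, not code taken from texar-pytorch, and nothing here has been profiled.

```python
import regex  # third-party package (pip install regex), not the stdlib re

# \p{P} matches any punctuation code point, \p{S} any symbol code point.
# These patterns are assumptions made for illustration.
NONDIGIT_PUNCT_RE = regex.compile(r"([^\d])(\p{P})")
PUNCT_NONDIGIT_RE = regex.compile(r"(\p{P})([^\d])")
SYMBOL_RE = regex.compile(r"(\p{S})")


def tokenize_sketch(text: str) -> list:
    # Separate punctuation from a neighbouring non-digit character,
    # so punctuation between two digits (as in "1,000") stays attached.
    text = NONDIGIT_PUNCT_RE.sub(r"\1 \2 ", text)
    text = PUNCT_NONDIGIT_RE.sub(r" \1 \2", text)
    # Put spaces around every symbol character.
    text = SYMBOL_RE.sub(r" \1 ", text)
    return text.split()
```

Because the patterns are plain strings, no scan over code points is needed to build them, which is the reason the lazy-construction trick might become unnecessary.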

codecov[bot] commented 2 years ago

Codecov Report

Merging #349 (304398f) into master (d98c2c2) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #349   +/-   ##
=======================================
  Coverage   80.37%   80.37%           
=======================================
  Files         135      135           
  Lines       11243    11247    +4     
=======================================
+ Hits         9036     9040    +4     
  Misses       2207     2207           

Impacted Files                           Coverage Δ
texar/torch/evals/bleu_transformer.py    97.87% <100.00%> (+0.09%)
