asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
https://asyml.io
Apache License 2.0

Unused objects (UnicodeRegex) are created when importing texar pytorch. #347

Closed hunterhector closed 2 years ago

hunterhector commented 3 years ago

I found that code in the bleu_transformer module runs as soon as texar-pytorch is imported, which is likely due to the following line:

https://github.com/asyml/texar-pytorch/blob/507932c899ca3a8663479b31efc3a41bc7180693/texar/torch/evals/bleu_transformer.py#L159

The instance is created at the module level, and I didn't find any use of this instance elsewhere. I guess the reason for this statement is to avoid creating a uregex every time (its construction iterates over the whole Unicode space). But having this variable here slows down the texar import.
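One possible way to keep the "build it only once" benefit without paying for it at import time is to create the object lazily on first use. The following is only a minimal sketch: the `UnicodeRegex` body here is a simplified stand-in written for illustration (not the actual texar code), and `get_uregex` is a hypothetical helper name.

```python
import functools
import re
import sys
import unicodedata


class UnicodeRegex:
    """Simplified stand-in for the class in bleu_transformer.py (assumption)."""

    def __init__(self):
        # Scanning every Unicode code point is the expensive part.
        punctuation = "".join(
            chr(cp)
            for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)).startswith("P")
        )
        self.punct_re = re.compile("([" + re.escape(punctuation) + "])")


@functools.lru_cache(maxsize=None)
def get_uregex():
    """Hypothetical accessor: build the instance on first call, then reuse it."""
    return UnicodeRegex()
```

With this pattern, importing the module costs nothing extra; the scan runs only when some code actually calls `get_uregex()`, and later calls return the same cached instance.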

A simple timing test (start=timeit.default_timer(); import texar.torch; print(timeit.default_timer()-start)) gives, in seconds:

  1. with uregex: 6.359622538
  2. without: 3.2298435450000014
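Since a module is cached in `sys.modules` after its first import, repeating a measurement like this needs a fresh interpreter each time. A rough sketch of one way to do that is below; `time_import` is a hypothetical helper, and the numbers it returns include interpreter start-up, so they are only comparable with each other.

```python
import subprocess
import sys
import timeit


def time_import(module, repeats=3):
    """Return the best wall-clock time (seconds) to import `module`
    in a fresh Python process. Includes interpreter start-up time."""
    cmd = [sys.executable, "-c", f"import {module}"]
    timings = []
    for _ in range(repeats):
        start = timeit.default_timer()
        subprocess.run(cmd, check=True)  # raises if the import fails
        timings.append(timeit.default_timer() - start)
    return min(timings)
```

For example, `time_import("texar.torch")` (assuming texar-pytorch is installed) could be run before and after removing the module-level instance. Running `python -X importtime -c "import texar.torch"` gives a per-module breakdown as well.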

Thus this line roughly doubles the import time.