CLUEbenchmark / CLUE

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
http://www.CLUEbenchmarks.com

TensorArray Not Used on line 865 of tokenization_utils.py #170

Open CodeSmileBot opened 1 year ago

CodeSmileBot commented 1 year ago

Hello!

I found an AI-specific code smell in your project. The smell is called "TensorArray Not Used".

You can find more information about it in this paper: https://dl.acm.org/doi/abs/10.1145/3522664.3528620.

According to the paper, the smell is described as follows:

Problem: If the developer initializes an array using tf.constant() and tries to assign a new value to it in the loop to keep it growing, the code will run into an error. The developer can fix this error by the low-level tf.while_loop() API. However, it is inefficient coding in this way. A lot of intermediate tensors are built in this process.
Solution: Using tf.TensorArray() for growing array in the loop is a better solution for this kind of problem in TensorFlow 2.
Impact: Efficiency, Error-proneness

Example:

### TensorFlow
import tensorflow as tf
@tf.function
def fibonacci(n):
    a = tf.constant(1)
    b = tf.constant(1)
-    c = tf.constant([1, 1])
+    c = tf.TensorArray(tf.int32, n)
+    c = c.write(0, a)
+    c = c.write(1, b)

    for i in range(2, n):
        a, b = b, a + b
-        c = tf.concat([c, [b]], 0)
+        c = c.write(i, b)

-    return c
+    return c.stack()
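
For reference, the "after" side of the diff above can be written out as a standalone script. This is only a minimal sketch, not code from the CLUE repository: it assumes TensorFlow 2 is installed, that n is at least 2, and it swaps the Python range() for tf.range() (my assumption) so that AutoGraph builds a graph-level loop, which is the case where the array has to grow inside the loop.

```python
import tensorflow as tf


@tf.function
def fibonacci(n):
    # Assumes n >= 2, because slots 0 and 1 are written unconditionally.
    a = tf.constant(1)
    b = tf.constant(1)
    # Pre-sized TensorArray instead of a tf.constant grown with tf.concat.
    c = tf.TensorArray(tf.int32, size=n)
    c = c.write(0, a)
    c = c.write(1, b)

    # tf.range (an assumption here, the diff uses range) lets AutoGraph
    # turn this into a single graph loop rather than unrolling it.
    for i in tf.range(2, n):
        a, b = b, a + b
        c = c.write(i, b)

    # stack() converts the TensorArray into one dense tensor.
    return c.stack()


result = fibonacci(tf.constant(8))
print(result.numpy().tolist())
```

Each write() returns a new TensorArray handle, so the result has to be reassigned to c on every iteration, as the diff does.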

You can find the code related to this smell in this link: https://github.com/CLUEbenchmark/CLUE/blob/2ea90461e0a0321945f880330b629ce09e0e3fd2/baselines/models_pytorch/classifier_pytorch/transformers/tokenization_utils.py#L855-L875.

I also found instances of this smell in other files, such as:

File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/bert/optimization_test.py#L26-L36 Line: 31
File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/bert_wwm_ext/optimization_test.py#L26-L36 Line: 31
File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/ernie/optimization_test.py#L26-L36 Line: 31
File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/roberta_wwm_ext/optimization_test.py#L26-L36 Line: 31
File: https://github.com/CLUEbenchmark/CLUE/blob/master/baselines/models/roberta_wwm_large_ext/optimization_test.py#L26-L36 Line: 31

I hope this information is helpful!