aliyun / aliyun-openapi-python-sdk

Alibaba Cloud SDK for Python
Other
1k stars 588 forks source link

Concerns about new get_uuid() implementation in aliyun-python-sdk-core 2.14.0 #516

Closed taoky closed 8 months ago

taoky commented 9 months ago

In https://github.com/aliyun/aliyun-openapi-python-sdk/commit/7e03ac97ebbc421552ac47f2a8db85f2ecebb787, get_uuid() in aliyun-openapi-python-sdk/aliyun-python-sdk-core/aliyunsdkcore/utils/parameter_helper.py changes from:

def get_uuid():
    name = socket.gethostname() + str(uuid.uuid1())
    namespace = uuid.NAMESPACE_URL
    return str(uuid.uuid5(namespace, name))

to:

_process_start_time = int(time.time() * 1000)
_seqId = 0

def get_uuid():
    global _seqId
    thread_id = threading.current_thread().ident
    current_time = int(time.time() * 1000)
    seq = _seqId
    _seqId += 1
    randNum = random.getrandbits(64)
    msg = '%d-%d-%d-%d-%d' % (_process_start_time, thread_id, current_time, seq, randNum)
    md5 = hashlib.md5()
    md5.update(msg.encode('utf-8'))
    return md5.hexdigest()

However, to me this change sounds very weird as:

  1. The value new get_uuid() returns is no longer a UUID (8-4-4-4-12 hexadecimal digits).
  2. _seqId += 1 is not thread-safe if multi thread are calling get_uuid() (this increases the possibility of collisions).
  3. The possibility of "UUID" collisions also increases when the program is restarting frequently (as time.time() is used as a part of generation here).

It looks like a better idea to use Python's built-in uuid module instead of reinventing the wheel. It would be very appreciated if you (@h-hg as the author of this commit, I think) could explain this change. Thank you.

h-hg commented 9 months ago
  1. This function is used to generate a unique nonce value for the openapi call, and we don't limit the format to satisfy the uuid. For the historical reasons, the function name may be confusing to readers.
  2. This is a Python2 repository. Due to python GIL, only one thread can be run. If this GIL assumption does not hold, the randNum field can be used to reduce the collision rate.
  3. We have tested the this design in tea-util in the real environment, we found that the collisions has been significantly reduced.
taoky commented 9 months ago

Thank you for referring https://github.com/aliyun/tea-util/pull/335 and the test results in production environments, though there's a minor issue with Python implementation (compared with Java/Golang implementation of same algorithm):

Due to python GIL, only one thread can be run.

A += 1 in Python would be compiled to multiple bytecodes (load, += and store):

>>> n = 0
>>> def f():
...  global n
...  n += 1
... 
>>> import dis
>>> dis.dis(f)
  1           0 RESUME                   0

  3           2 LOAD_GLOBAL              0 (n)
             14 LOAD_CONST               1 (1)
             16 BINARY_OP               13 (+=)
             20 STORE_GLOBAL             0 (n)
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

And in Python 3, GIL would be taken by another thread if it has timed out (https://github.com/python/cpython/blob/65c285062ce2769249610348636d3d73153e0144/Python/ceval_gil.c#L20-L24C58. GIL is also preemptive in Python 2.x), thus n here is not an atomic variable. And unfortunately Python does not have built-in modules about atomics...

JacksonTian commented 8 months ago

感谢 @taoky 对这个改动的关注。作为生成 Nonce 的实现,目前看下来还是 OK 的。这个讨论就暂时先关掉了。

如果还有其余问题,请 re-open。