2captcha / 2captcha-python

Python 3 package for easy integration with the API of 2captcha captcha solving service to bypass recaptcha, hcaptcha, cloudflare turnstile, funcaptcha, geetest and solve any other captchas.
https://pypi.org/project/2captcha-python
MIT License

Optimised imports - api.py #91

Closed: arthurauffray closed this 4 months ago

arthurauffray commented 4 months ago
plutaniano commented 4 months ago

The impact on memory is zero; the entire module is always imported, regardless of whether you use from ... import ... or import ....

>>> import sys
>>> from requests import get
>>> print(dir(sys.modules["requests"]))
['ConnectTimeout', 'ConnectionError', 'DependencyWarning', 'FileModeWarning', 'HTTPError', 'JSONDecodeError', 'NullHandler', 'PreparedRequest', 'ReadTimeout', 'Request', 'RequestException', 'RequestsDependencyWarning', 'Response', 'Session', 'Timeout', 'TooManyRedirects', 'URLRequired', '__author__', '__author_email__', '__build__', '__builtins__', '__cached__', '__cake__', '__copyright__', '__description__', '__doc__', '__file__', '__license__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__title__', '__url__', '__version__', '_check_cryptography', '_internal_utils', 'adapters', 'api', 'auth', 'certs', 'chardet_version', 'charset_normalizer_version', 'check_compatibility', 'codes', 'compat', 'cookies', 'delete', 'exceptions', 'get', 'head', 'hooks', 'logging', 'models', 'options', 'packages', 'patch', 'post', 'put', 'request', 'session', 'sessions', 'ssl', 'status_codes', 'structures', 'urllib3', 'utils', 'warnings']
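To see the same thing more directly, a minimal sketch like the following checks that the from-style import still loads the whole requests package into sys.modules, and that the imported name is just a reference into that module object:

>>> import sys
>>> from requests import get
>>> "requests" in sys.modules           # the full package is loaded either way
True
>>> get is sys.modules["requests"].get  # the imported name refers into that same module
True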

The impact on speed is ~0; the only difference is an extra name lookup caused by the dotted access.

arthurauffray commented 4 months ago

Hey Plutaniano, thanks for your comment; it got me to look into this more. I did some testing with some quick Copilot-generated code:

Specific import:

from time import sleep
sleep(0.5)

Module import:

import time
time.sleep(0.5)

Runner code:

import time

def run():
    total_time = 0
    for _ in range(100):
        start_time = time.time()
        exec(open('./speed/1.py').read())
        end_time = time.time()
        execution_time = end_time - start_time
        total_time += execution_time
    average_time = total_time / 100
    print(f"Specific import: {average_time} seconds")

    total_time = 0
    for _ in range(100):
        start_time = time.time()
        exec(open('./speed/2.py').read())
        end_time = time.time()
        execution_time = end_time - start_time
        total_time += execution_time
    average_time = total_time / 100
    print(f"Module import: {average_time} seconds")

for _ in range(5):
    run()

Here are the results (each pair of results comes from running each script 100 times; across the five runs that is 1000 executions in total):

Specific import: 0.5038385605812072 seconds
Module import: 0.5042011451721191 seconds

Specific import: 0.503937861919403 seconds
Module import: 0.5037545895576477 seconds

Specific import: 0.5042110896110534 seconds
Module import: 0.5042633104324341 seconds

Specific import: 0.5043397688865662 seconds
Module import: 0.5043723654747009 seconds

Specific import: 0.504006495475769 seconds
Module import: 0.504084300994873 seconds

Average Specific import time: 0.50406675529 seconds
Average Module import time: 0.50422445344 seconds

As for the memory usage, that is a fair point, since it does still import the entire module; I think this could be changed 🤣.

plutaniano commented 4 months ago

Your methodology for measuring this is not ideal. Time fluctuations in the execution of time.time, time.sleep, open, and .read() (because of CPU scheduling and disk access) will overpower the nanosecond increase caused by an attribute lookup.

from x import y is faster; you can just count the instructions below. But the difference is so small it doesn't matter: this library will routinely make API calls that take multiple seconds, and a nanosecond increase is literally undetectable.

The hit to legibility is not worth it.

In [1]: def just_import():
   ...:     import requests
   ...:     requests.get
   ...:

In [2]: def from_import():
   ...:     from requests import get
   ...:     get
   ...:

In [3]: import dis

In [4]: dis.dis(from_import)
  1           0 RESUME                   0

  2           2 LOAD_CONST               1 (0)
              4 LOAD_CONST               2 (('get',))
              6 IMPORT_NAME              0 (requests)
              8 IMPORT_FROM              1 (get)
             10 STORE_FAST               0 (get)
             12 POP_TOP

  3          14 LOAD_FAST                0 (get)
             16 POP_TOP
             18 LOAD_CONST               0 (None)
             20 RETURN_VALUE

In [5]: dis.dis(just_import)
  1           0 RESUME                   0

  2           2 LOAD_CONST               1 (0)
              4 LOAD_CONST               0 (None)
              6 IMPORT_NAME              0 (requests)
              8 STORE_FAST               0 (requests)

  3          10 LOAD_FAST                0 (requests)
             12 LOAD_ATTR                1 (get)    # this is the extra instruction
             22 POP_TOP
             24 LOAD_CONST               0 (None)
             26 RETURN_VALUE
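
If you want to isolate just that extra lookup, a minimal timeit sketch along these lines times the two access styles directly, without the sleep, open and exec overhead that dominates the benchmark above (absolute numbers will vary by machine and Python version):

import timeit

# Direct name lookup, as with "from time import sleep"
direct = timeit.timeit("sleep", setup="from time import sleep", number=10_000_000)

# Dotted attribute lookup, as with "import time" followed by time.sleep
dotted = timeit.timeit("time.sleep", setup="import time", number=10_000_000)

print(f"direct name:      {direct:.3f} s for 10,000,000 lookups")
print(f"dotted attribute: {dotted:.3f} s for 10,000,000 lookups")

Whatever the machine reports, the gap between the two totals is the cost of the single extra LOAD_ATTR shown in the disassembly, i.e. on the order of nanoseconds per call.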

ooojustin commented 4 months ago

The idea of optimizing a Python program like this, regardless of whether or not it's in a web API wrapper, is inherently very silly.

poplers24 commented 4 months ago

@kratzky There are also edits to the imports and corrected method calls in the code. I ran the tests and it works correctly. Can you merge?