Aiohttp latency deteriorating when called from xloil

ShieldTrade commented 1 year ago

aiohttpVersion: 3.8.1 The following python code request first pokemon name and measure latency.

When called from python Average latency is usually 55 mls

when called from XLL using xloillatency deteriorates and becomes around 200 mls

I am not really interested in pokemons :-) but I noticed the problem sending orders to crypto exchanges using CCXT. CCXTunderlying networking lib is aiohttp.

Unfortunately I still have some problem as in issue 42 therefore I cannot use Xloil to write to excel. I just discovered github does not support xlsx and I do not know how I can send you the excel file

D4 =IFERROR(py_start_latency_test(C4),"Test is running")
D7 =IFERROR(py_get_latency_new(C7),"Test is running or waiting to start")
D7 select and copy and past to D8:D16
D18  =IFERROR( AVERAGE(D7:D16), "Test is running or waiting to start")

So far I managed to use xloil to to build a very nice and useful back end. I can now monitor my algos and I did it just using async function. I am looking forward to explore further the potential offered by your library. Latency is essential for my trading logic so I really need help on this.

Thank you


import asyncio
import xloil as xlo
import aiohttp
import time
results = []
elapsed = 0
num_orders = 10
print('aiohttp Version:', aiohttp.__version__)

""" request first pokemon name and measure latency. Can call it from XLL/xloil"""

# RuntimeError: Event loop is closed.  The error typically causes only a noisy shutdown
# https://github.com/Azure/azure-sdk-for-python/issues/9060
# https://github.com/aio-libs/aiohttp/issues/4324

async def main():
    global results
    global elapsed
    results = []
    elapsed = 0
    async with aiohttp.ClientSession() as session:
        pokemon_url = f'https://pokeapi.co/api/v2/pokemon/{1}'
        for i in range(0, num_orders):
            start = time.time_ns() // 1000000
            async with session.get(pokemon_url) as resp:
                pokemon = await resp.json()
                print(i, pokemon['name'])
                end = time.time_ns() // 1000000
                elapsed = int(end - start)
                results.append(elapsed)
                print(elapsed, 'ms')
                await asyncio.sleep(1)
        s = sum(results)
        average = s / len(results)
        print('Average:', average, 'ms')

@xlo.func
async def py_start_latency_test(on_off):
    if on_off == 1:
        await main()
        return 'back to 0 then 1 for another test'
    else:
        return '1 to start the test'

@xlo.func
async def py_get_latency_new(num):
    num = int(num)
    while True:
        if len(results) < num_orders:
            yield 'Test is running or waiting to start'
        else:
            yield results[num - 1]
        await asyncio.sleep(2)

# uncomment To run from python
# asyncio.run(main())

cunnane commented 1 year ago

I'll try to replicate my side. Just to confirm are you seeing the latency issue when running just 1 RTD function in a sheet containing nothing else? If you want to run highly responsive things in Excel you should be aware that RTD functions are throttled although you can change this interval it will put pressure on the GUI/main thread

ShieldTrade commented 1 year ago

Thank you a lot for helping me with this issue.

are you seeing the latency issue when running just 1 RTD in a sheet containing nothing else?

yes.

you should be aware that RTD functions are throttled

I changed the interval Application.RTD.ThrottleInterval = 0 before running my latency tests

While further investigating this issue I found.

For the above script. Latency if called from python around 50 Mls ( PY lat ) Latency if called from excel around 200 Mls ( XLL lat )

1) adding await asyncio.sleep(0.001) after print(i, pokemon['name']) adds another 200 mls latency XLL lat = 400. PY lat = 50

2) adding await asyncio.sleep(0) after print(i, pokemon['name']) ( same position as 1 ) does not add any further latency. XLL lat = 200. PY_lat = 50

3) adding await asyncio.sleep(0.001) after end = time.time_ns() // 1000000 does not add any further latency. XLL lat = 200. PY_lat = 50

I run a full microsoft defender scan and I turned off all firewalls and virus protection before running all my test

cunnane commented 1 year ago

Just so you don't spend any more time on this: the 200ms is baked into xlOil - it interrupts the asyncio thread every 200ms to stop it hogging the python GIL. I'll make this configurable, so assume it will get resolved soon! Thank you for investigating!

ShieldTrade commented 1 year ago

Thank you a lot.

please give me a quick fix as soon as you have it. I can try and debug it for you. Due to the complexity of xloil I understand it might take a while for you to implement it in and release a new official version.

cunnane commented 1 year ago

I've just released v0.17.9. The parameters you are looking for are: xlo.core_addin().async_throttle and xlo.core_addin().async_slice which specify in milliseconds the time between runs of asyncio and the time for which the event loop can run

ShieldTrade commented 1 year ago

Excellent.

ShieldTrade commented 1 year ago

I made a few tests I would like to share with you.

Goal.

Compare aiohttp latency (Lat) when called from py or XLoil using different xlo.core_addin().async_throttle

Methodology.

Every test is composed by 30 requests. Fore each request Lat in mls is measured.
For every run 10 tests are made.
15 runs are made using both python and Xloil changing xlo.core_addin().async_throttle to 0, 1, 5, 10. For each run, test average (Av) and Lat differences (py and Xloil) are measured ( tab 1).
15 runs are made using Xloil changing xlo.core_addin().async_throttle to 100, 200, 400, 1000. For each run, test Av and Thr time differences are measured ( tab 2).
8 runs are made with different Thr time using the previous post methodology in order to investigate Lat upon first request. ( tab 3).
a few test are made with Thr time = 200 adding await asyncio.sleep(0.001) in different part of the code in order to understand when the Thr time is triggered (tab 4).
1 Xloil run shortly follows 1 py run ( i.e. run1Xloil starts 14:50 ends 14:53 followed by run1Py starts 14:54 ends 14:57).

General consideration.

Network Lat and server congestion can change for Xloil and Py within the same run, even if they follow shortly. Given the amount of runs the Network Lat and server congestion should statistically compensate; therefore, in my opinion, cannot be considered the cause of differences ( but not so sure. 15 runs might not be enough).

Code.

For XLL you can changepy_get_latency_new(C4) with py_get_average (C4). See attached .py

My Conclusion.

TABLE 1, GRAPH 1. thr time < Lat

Best Xloil latency are obtained with throttle (Thr) time 0 with an average difference of 1 . This causes CPU usage to jump to 30 % on my machine.
With Thr time 1 I have a run Av difference of 11.39 . I would expect 1 mls latency added on av. ?
With Thr time 5 and 10 the Av difference is close to Thr time 1. this makes sense because Network Lat is > Thr time
Std is constant and pretty high for all Thr time. Network Lat can explain this.
Runs might not be enough to compensate Network Lat differences, especially when test are conducted during network peak time.

TABLE 2 GRAPH 2. thr time > Lat.

Std is constant and pretty low for all Thr time.
Av is respectively 15, 20, 20, 31, , a better look at Tab 2 shows that the first test for each run ( when the persistent connection is established upon the first request ) can explain the Av differences. Therefore a few test were conducted using the previous post methodology.

TABLE 3 As expected the first aiohttprequest is causing the problem. There is probably something in the library triggering the Thr time when the first connection is established.

TABLE 4 As already noted in the previous post only position 1,2,3,4 triggers Thr time. I do not understand why 5 and 6 do not. ? Red value are unexpected. ?

TBL1

Graph1

Tbl2 Graph2

Tbl3 Tbl4

import asyncio
import xloil as xlo
import aiohttp
import time
results = []
av = []
num_orders = 30
num_test = 1
print('aiohttp Version:', aiohttp.__version__)

""" request first pokemon name and measure latency. Can call it from XLL/xloil
1 test = 30 (num_orders) requests.
Test is run 10 (num_test) times .
Every time this code is run from py, a Py column  is generated.
Every time this code is run from Xloil, a Oil column is generated"""

# RuntimeError: Event loop is closed.  The error typically causes only a noisy shutdown
# https://github.com/Azure/azure-sdk-for-python/issues/9060
# https://github.com/aio-libs/aiohttp/issues/4324

async def main():
    global results
    global av
    av = []
    async with aiohttp.ClientSession() as session:
        pokemon_url = f'https://pokeapi.co/api/v2/pokemon/{1}'
        for t in range (0, num_test):
            results = []
            for i in range(0, num_orders):
                start = time.time_ns() // 1000000
                async with session.get(pokemon_url) as resp:
                    # await asyncio.sleep(0)
                    pokemon = await resp.json()  # thr slp.
                    await asyncio.sleep(0.001)  # 1 thr slp. on/off
                    await asyncio.sleep(0.001)  # 2 thr slp. on/off
                    await asyncio.sleep(0.001)  # 3 thr slp. on/off
                    await asyncio.sleep(0.001) # 4 thr slp. on/off
                    end = time.time_ns() // 1000000
                    await asyncio.sleep(0.001)  # 5 thr no slp. on/off
                    elapsed = int(end - start)
                    results.append(elapsed)
                    await asyncio.sleep(0.001)  # 6 no slp. on/off
                    print('test', t, 'request', i, 'name', pokemon['name'], 'latency', elapsed, 'ms')
                    await asyncio.sleep(0.1)
            s = sum(results)
            average = round(s / len(results))
            print('\n **********')
            print('test', t, 'average latency', average, 'ms')
            print(' **********\n')
            av.append(average)
            await asyncio.sleep(5)
        print('All test Average:', av)

@xlo.func
async def py_start_latency_test(on_off):
    if on_off == 1:
        await main()
        return 'back to 0 then 1 for another test'
    else:
        return '1 to start the test'

@xlo.func
async def py_get_latency_new(num):
    num = int(num)
    while True:
        if len(results) < num_orders::
            yield 'Test is running or waiting to start'
        else:
            yield results[num - 1]
        await asyncio.sleep(2)

@xlo.func
async def py_get_average(num):
    num = int(num)
    while True:
        if len(av) < num_test:
            yield 'Test is running or waiting to start'
        else:
            yield av[num - 1]
        await asyncio.sleep(2)

# uncomment To run from python change thr time here
xlo.core_addin().async_throttle = 200
# asyncio.run(main())

cunnane commented 1 year ago

Wow, this is very detailed analysis! I'm thinking that a fundamental limitation is the contention in xlOil: Excel's main thread is required to do most things, the asyncio loop which services aiohttp does run in the background, but it needs to be suspended so that Excel can run the python parts of the RTD mechanism.

A way to avoid this contention would be to run an RTD server in a separate process (see the example here), then query this with =RTD(...) in Excel. xlOil can't do much to help in this case, but if the approach give some useful improvement, I could enhance xlOil by exposing the code which takes away the RTD boilerplate and COM registration headaches.

Do you want the RTD server to return just single values or more complex python objects? If the latter, then it's possible to call RTD from within a python UDF to unpack (e.g. with pickle) some special string returned by the RTD server.

ShieldTrade commented 1 year ago

Hi @cunnane ,

Update.

This causes CPU usage to jump to 30 % on my machine.

increasing the Application.RTD.ThrottleInterval reduced the CPU usage.

My final goal is to use XLL as a front end to build a graphical user interface (GUI). The GUI should not add any Latency (Lat) to the PY process which is 1) get data in real time using ws. 2)analyze data, 3) If needed send Http message. If some Lat is added I would to measure it. PY owns all the relevant variable (Var) and logic and does not need any Xll based calculation to run correctly. If it is the Python asyncio loop that

needs to be suspended so that Excel can run the python parts of the RTD mechanism.

is causing an increase of latency, I would like to figure out when it happens so that I can better understand how Excel's main thread is interacting with PY asyncio loop. We can call it Thr Lat Effect as analyzed for a subpart of my code. Ideally I would like to measure theThr Lat Effect for my complex PY code instead of the code provided in this post : My code is obviously using other lib, which are using other lib, which might all trigger a delay. I think this question of mine could be a good starting point for investigating

As already noted in the previous post only position 1,2,3,4 triggers Thr time. I do not understand why 5 and 6 do not. ? Red value are unexpected. ?

If I had the Python debug my live would probably be easier :-)

Honestly I am quite happy with the results I obtained so far. PY works as expected and the only Lat I am measuring at the moment, Http Lat, is back to normal values. Using Thr of 1 ML and Application.RTD.ThrottleInterval = 1000 ( 1 sec) I have a good compromise between CPU usage and user experience.

If Http low Lat should be the case ( reducing network part of the Lat) I would probably suffer more the Thr Lat Effect total delay effect.

cunnane commented 1 year ago

The relevant part of xlOil is xloil.importer._pump_message_loop which calls asyncio.run_until_complete with a timeout. This is invoked from xlOil with the following (pseudo) C code:

while (!_stopped)
{
  {
    acquire_gil();
    call("_pump_message_loop", timeout * (1 if num_running_tasks > 0 else 0.25));
    release_gil();
  }
  // Take a break to allow other GIL users a chance
  Sleep(num_running_tasks > 0 ? sleep_time : sleep_time * 4);
}

The timeout and sleep_timeare the theasync_throttleandasync_slice` for which I added the config options

If you were to run an external COM server in python it would remove the need to interrupt the asyncio loop.

Does python debugging not work for you?

ShieldTrade commented 1 year ago

Thank you,

Py concurrency is a difficult issue. I have seen engineers ( I am a simple trader with a bit of coding knowledge) much more skilled then me making mistakes. In my Py code the event loop is not supposed to be interrupted and I am investigating the consequences. (latency, data lost, future canceled ecc..)

I gave a look at server It looks quite difficult to implement for my skills but I understand this would eliminate the problem.

cunnane commented 1 year ago

Agree the concurrency in xlOil took a lot of time to develop cleanly! (Or at least as clean as it currently is!)

I had a look a what would be required to allow xlOil to create an external COM server for you but it turned out to be a bit more tricky than I'd hoped to start an out-of-process server. I will give this a try and let you know if I make some progress.

ShieldTrade commented 1 year ago

Thank you

cunnane / xloil