dabeaz / curio

Good Curio!
Other
4.01k stars 240 forks source link

Timeout occurred, but was uncaught. #292

Closed skgbanga closed 4 years ago

skgbanga commented 5 years ago

Hello,

I have a piece of code that looks like this:

async def checker():
    ...

async def client(conn):
  c = await curio.spawn(check)
  while True:
    try:
      data = await curio.timeout_after(1, conn.recv(4096)
      ...
    except curio.TaskTimeout:
         if c.terminated:
             await c.join()
             break

Couple of things in this code:

  1. client is spawning another task called checker. client periodically checks the status of client and if it has been terminated, joins on the task and breaks. Is this the best way of checking on the status of a task when you yourself might be awaiting on something. (in this case client is awaiting on a socket object's recv function)

  2. The above code 'works'. I occassionally get the above WARNING:

curio.task:Task(id=6, name='client', state='RUNNING'). Timeout occurred, but was uncaught. Ignored.

What does this warning mean? Nothing bad seems to be happening, but I am curious why this is happening. Note that if I increase the timeout to 2secs, I don't see this WARNING anymore.

dabeaz commented 5 years ago

That warning message is caused when the code wrapped by a timeout executes successfully, but the amount of time it took was actually longer than the specified timeout. That's might sound a bit weird, but there are a few scenarios where it can happen.

  1. You're executing a blocking (non-async) operation. For example:
import time
try:
    async with timeout_after(1):
        time.sleep(100)
except TaskTimeout:
    print("Timed out")

Here, there's no way for Curio to interrupt time.sleep(100) because that's not an asynchronous call. When it completes, control will leave the async with block and Curio will notice that time has advanced far more into the future than it was expecting. So, you'll get the warning.

  1. You're executing computationally complex code that can't complete by the deadline. For example, maybe instead of sleeping, you decide to compute the 1000th fibonacci number or something.

  2. Random scheduling. It's always possible with a timeout that the operation could complete very very close to the timeout deadline (through randomness). It might be the case that control flow leaves the block and the elapsed time is just barely more than the specified timeout period. That will also cause a warning.

The seriousness of the warning really depends on what you're doing. If the the purpose of the timeout was to enforce a strict time guarantee in a real-time system, then the warning meanings that your code is violating that time contract. If the warning started showing up a lot, it could indicate some kind of performance tuning and/or scaling issue with your code. For example, maybe CPU resources are insufficient to handle the current load of work in light of your time restrictions. And, as noted, it could indicate a programming bug--for example executing uninterruptible blocking operations by mistake.

It occurs to me that the warning message itself could be a bit better in explaining what's happening. I'll look into that.

skgbanga commented 5 years ago

Thanks @dabeaz for the reply. In my case, I am only doing conn.recv(4096) in the timeout_after call which is an async blocking operation so 1 and 2 are out.

The culprit is 3 though. The above code is part of a server and client architecture, and checker is a heartbeat checker that client is sending heartbeats every 1 second. I just realized that if my timeout is also 1second, there is a possibility that conn.recv call get a heartbeat at the boundary of 1 second and gets busy in serving that that and miss the timeout boundary. (what you posited in point 3)

async code is hard.

thanks for the answer. Please feel free to close the question.

imrn commented 5 years ago
async with timeout_after(1):
    # time.sleep(100)  # Probability of firing that warning: p = 1
    print('')  # But p is still non-zero for this. i.e. on an overwhelmed cpu.
    # Welcome to asyncs. :)

Perhaps curio documentation needs a "First Things First" section describing the very basics of "async logic" in an ultra compact and sharp language.

dabeaz commented 5 years ago

I consider the situation in this bug report to be an interesting one. What is the "proper" course of action in code that specifies a timeout, the code runs successfully, but it actually takes longer than the specified timeout period? Do you throw a TimeoutError exception anyway and consider it a failure? Or do you let the code proceed with a warning? (The source code for curio has a comment about all of this being a potential point of discussion).

The initial implementation of issuing of a warning was really more of an experiment to see if the situation actually arises in real code. And if so, under what circumstances.

imrn commented 5 years ago

I consider the situation to be an interesting one but definitely not a bug. That's the way it is. Nature of the "async beast" should be well taught and known. You may add optional switches to timeout context. But describing the theory behind it, is still needed. And good luck while defining that for a newbie just looking for a 'simple' timeout. ;)

skgbanga commented 5 years ago

@dabeaz The situation definitely arises in a real code base, even the if the code base is doing everything right. (i.e. none of computing fibonacci or time.sleep stuff, but proper async stuff)

I agree with @imrn that it is definitely not a bug, and is very context dependent. As long as uses are aware of that this can happen, I think most people understand the 'limitations' of async.

I think the important thing is what you mentioned in your comment above:

The seriousness of the warning really depends on what you're doing. If the the purpose of the timeout was to enforce a strict time guarantee in a real-time system, then the warning meanings that your code is violating that time contract.

In my case, I just wanted to a periodic way of checking the status of checker. it doesn't matter if it is few micro/milli seconds here and there.

imrn commented 5 years ago

The situation definitely arises in a real code base, even the if the code base is doing everything right. (but proper async stuff)

You shouldn't have it for proper async stuff. But theoretically, every single line of sync code increase your chances of having it.

I think most people understand the 'limitations' of async.

I don't think it is specific to asyncs. And I think most people does not understand the nature of asyncs AND syncs -just like "Real Time Operating Systems". There is no ideal "RTOS" unless you have an absolute zero calculation time for any instruction.

At the end of the day, we are all trying to pack some instructions for a finite resource machine. Asyncs is trying to bring some order and fine resolution for time usage. Syncs pretends as if there is no such problem.

If you are unlucky enough, you'll hit by time congestion on both. Being clever on your side may bring 'some' difference. But that's all.