GoogleCloudPlatform / cloud-profiler-python

Stackdriver Profiler Python agent is a tool that continuously gathers CPU usage information from Python applications
Apache License 2.0

CPU profiling with gevent ? #54

Closed hiimdoublej-swag closed 2 years ago

hiimdoublej-swag commented 4 years ago

Dear maintainers: I have a Celery application that processes asynchronous tasks. I ran the worker with the -P gevent option, and with the profiler enabled it periodically hangs for 10 seconds. The profiler is initialized before the Celery application comes up. When I set verbose=3 on googlecloudprofiler.start(), I observed that the hang happens between the log messages Successfully created a CPU profile and Starting to upload profile. My configuration works with any of the following actions conducted:

Could this be related to the GIL? Could one of you take a look? If there's not enough information, please feel free to ask me for it. Thanks.

kalyanac commented 4 years ago

Thank you for reporting this. This seems to be caused by a conflict in signal handling in the Profiler agent and gevent/greenlet.

jqll commented 4 years ago

Thanks for reporting this. Could you share some code and the setup that we can use to reproduce the error?

hiimdoublej-swag commented 4 years ago

> Thanks for reporting this. Could you share some code and the setup that we can use to reproduce the error?

Sure, I'll follow up with some basic setups.

hiimdoublej-swag commented 4 years ago

Dear @jqll: I've put together a minimal setup to reproduce the problem. Please have a look here and let me know if there's anything I can provide to help you diagnose the issue. Thanks.

jqll commented 4 years ago

@hiimdoublej-swag thank you! We will take a look and get back to you.

hiimdoublej-swag commented 4 years ago

gentle ping :)

jqll commented 4 years ago

Hi @hiimdoublej-swag, thanks for pinging. Sorry, I was kept busy by something else and only just got some time to look into this.

To give some context, the profiler starts a daemon thread that continuously collects and uploads profiles. When it collects the CPU profile, it calls into a long-running C function that collects the profile with low overhead.
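The loop described above can be pictured roughly as follows. This is only a sketch of the general shape, not the agent's actual code: `collect_profile`, `upload_profile`, `profiling_loop`, and `run_demo` are hypothetical names, and `time.sleep` stands in for the long-running C collection call.

```python
import threading
import time


def collect_profile(duration_s):
    """Hypothetical stand-in for the agent's long-running C collection call."""
    time.sleep(duration_s)  # time.sleep blocks in C and releases the GIL
    return {"duration_s": duration_s}


def upload_profile(profile, sink):
    """Hypothetical stand-in for the upload step; it only records the profile."""
    sink.append(profile)


def profiling_loop(stop_event, sink, duration_s=0.05):
    # Collect a profile, then upload it, until asked to stop.
    while not stop_event.is_set():
        upload_profile(collect_profile(duration_s), sink)


def run_demo(run_for_s=0.3):
    stop_event = threading.Event()
    uploaded = []
    # daemon=True means the thread does not keep the process alive at exit.
    worker = threading.Thread(
        target=profiling_loop, args=(stop_event, uploaded), daemon=True
    )
    worker.start()
    time.sleep(run_for_s)  # the application would do its own work here
    stop_event.set()
    worker.join(timeout=2)
    return uploaded


if __name__ == "__main__":
    print(len(run_demo()), "profiles uploaded")
```

The point of the sketch is that collection and upload happen on a separate thread, so the application is only affected if something prevents the two threads from running side by side.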

The problem is that the Celery app with the gevent option apparently can't handle tasks while another thread is calling a C function. The problem is not profiler-specific: I can reproduce it by calling into a C function that just sleeps. I guess it's related to how gevent creates green threads, but I'm not enough of a gevent expert to say whether there is a way to work around this. I may be able to look more into gevent tomorrow, but it is unlikely that anything can be fixed on the profiler side, since it has to call into a long-running C function.
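For comparison, here is a minimal sketch of the baseline expectation in plain CPython, without Celery or gevent. It does not reproduce the hang. It only shows that, with ordinary OS threads, the main thread keeps making progress while another thread sits in a blocking C call. The function names are hypothetical, and `time.sleep` is used as the blocking C call.

```python
import threading
import time


def blocking_c_call(seconds):
    # time.sleep is implemented in C and releases the GIL while it waits,
    # so it stands in for a long-running extension function.
    time.sleep(seconds)


def count_ticks_during_call(call_seconds=0.5, tick_seconds=0.01):
    """Count how many short units of work the main thread completes
    while a background OS thread is inside the blocking call."""
    worker = threading.Thread(target=blocking_c_call, args=(call_seconds,))
    worker.start()
    ticks = 0
    while worker.is_alive():
        time.sleep(tick_seconds)  # stand-in for handling one task
        ticks += 1
    worker.join()
    return ticks


if __name__ == "__main__":
    print("ticks completed during the C call:", count_ticks_during_call())
```

If the same kind of loop stalls under a gevent worker, that points at the interaction between the blocking call and the event loop, not at the blocking call itself.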

hiimdoublej-swag commented 4 years ago

Another gentle ping :)

seizethedave commented 4 years ago

Seeing the same in eventlet. I suspect the problem is cloud profiler using greened modules (like threading), thus doing its thread work in a greenthread and locking up the event loop.

jqll commented 4 years ago

Hi seizethedave@, cloud profiler does create a new thread using threading.Thread. But I think that creates an OS thread? Could you elaborate a bit if you think the threading module is the problem?
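One way to check which case applies in a given process is to ask gevent whether it has monkey-patched the threading module. The sketch below assumes gevent's `monkey.is_module_patched` helper. The wrapper functions are hypothetical, and the check reports "not patched" when gevent is not installed at all.

```python
def threading_is_greened():
    """Return True if gevent has monkey-patched the threading module."""
    try:
        from gevent import monkey
    except ImportError:
        return False  # gevent is not installed, so nothing can be patched
    return monkey.is_module_patched("threading")


def describe_thread_backend():
    # When threading is patched, threading.Thread runs its target in a
    # greenlet on the event loop instead of in a separate OS thread.
    if threading_is_greened():
        return "threading.Thread is backed by greenlets"
    return "threading.Thread is backed by OS threads"


if __name__ == "__main__":
    print(describe_thread_backend())
```

Running this check inside the worker process, after any monkey patching has been applied, would show whether a thread started with threading.Thread is a real OS thread there.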

I created https://github.com/jqll/celery-c-function to reproduce the problem: celery -P gevent can't dispatch tasks while another thread is calling into a long-running C++ extension function, even if that function releases the GIL. I posted a question on https://groups.google.com/g/celery-users, though I don't see it showing up in the discussion group yet. It may need some time to become public. I'll post here when it's visible.

jqll commented 4 years ago

Here is the question I posted on celery-users: https://groups.google.com/g/celery-users/c/_QY3cVd4tp0.

hiimdoublej-swag commented 3 years ago

So I guess this won't be supported, right?

sillygod commented 3 years ago

@hiimdoublej-swag have a look at issue

import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()

I think this may solve your problem if you can run the snippet above before spawning the Celery worker.

nolanmar511 commented 2 years ago

Closing old issue.