firebase / firebase-admin-python

Firebase Admin Python SDK
https://firebase.google.com/docs/admin/setup
Apache License 2.0

urllib3 connection pool full using messaging.send_each_for_multicast() #712

Open filippotp opened 1 year ago

filippotp commented 1 year ago

I'm using FCM to send multicast messages from my Django app running on Heroku.

Since the messaging.send_multicast() function is deprecated, I changed my code to use messaging.send_each_for_multicast(), as stated in the deprecation warning.

After pushing the new code to production, I often see multiple warnings in the Heroku application logs when the app sends messages: WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: fcm.googleapis.com. Connection pool size: 10

Reverting my code to messaging.send_multicast() seems to resolve the issue.

google-oss-bot commented 1 year ago

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

Doris-Ge commented 1 year ago

Hi @filippotp,

Thanks for reporting the issue!

According to the answers in https://stackoverflow.com/questions/53765366/urllib3-connectionpool-connection-pool-is-full-discarding-connection, seeing WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: fcm.googleapis.com. Connection pool size: 10 does not mean that there is any problem with messaging.send_each_for_multicast(). As long as messaging.send_each_for_multicast() returns a BatchResponse without any failures in it, then the multicast_message should be sent to all the tokens successfully.

Here's the reason why those warnings only show up in the logs when messaging.send_each_for_multicast() is used:

Unlike messaging.send_multicast(), the underlying implementation of messaging.send_each_for_multicast() uses concurrent.futures.ThreadPoolExecutor to start multiple threads that send to the tokens via urllib3 in parallel. We set the maximum number of threads that ThreadPoolExecutor can start, that is, max_workers, to the number of tokens in the multicast_message. That means if a multicast_message contains 50 tokens, ThreadPoolExecutor may start up to 50 threads. However, the maxsize for urllib3.connectionpool is not configured, so it stays at the default value of 10. Then https://stackoverflow.com/a/66671026 explains the rest:

For example, if your maxsize is 10 (the default when using urllib3 via requests), and you launch 50 requests in parallel, those 50 connections will be performed at once, and after completion only 10 will remain in the pool while 40 will be discarded (and issue that warning).

So if you only send a multicast_message with no more than 10 tokens, the warning logs should go away. However, the warning logs don't really matter: messaging.send_each_for_multicast() should still work properly regardless of the warnings. We may optimize the implementation of messaging.send_each_for_multicast() to start fewer threads in the future and may be able to get rid of those warnings, but until that happens, please continue migrating to messaging.send_each_for_multicast().
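For readers who want to confirm this themselves, here is a minimal sketch of the check @Doris-Ge describes: inspect the BatchResponse that messaging.send_each_for_multicast() returns rather than the urllib3 warnings. The app initialization and device tokens below are illustrative assumptions, not code from this issue.

```python
import firebase_admin
from firebase_admin import messaging

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a valid service account key.
firebase_admin.initialize_app()

message = messaging.MulticastMessage(
    notification=messaging.Notification(title="Hello", body="World"),
    tokens=["device-token-1", "device-token-2"],  # hypothetical registration tokens
)

# The pool warnings only concern connection reuse; delivery results live here.
response = messaging.send_each_for_multicast(message)
print(f"{response.success_count} sent, {response.failure_count} failed")
for token, result in zip(message.tokens, response.responses):
    if not result.success:
        print(f"Send to {token} failed: {result.exception}")
```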

mortenthansen commented 3 months ago

Any updates on letting messaging.send_each and messaging.send_each_for_multicast start fewer threads? It's not obvious to me why starting a thread per token in the multicast message is a good idea. The number of tokens one wants to send the message to most likely has nothing to do with the desired level of concurrency (which depends on system resources, etc.).

Note that calling messaging.send_each_for_multicast multiple times (which is what I do now) is a sub-optimal solution.

Maybe the SDK can be extended to allow us to configure the max_workers to use when calling messaging.send_each and messaging.send_each_for_multicast?
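A rough sketch of the manual workaround several commenters describe (calling messaging.send_each_for_multicast() repeatedly with smaller chunks so each call starts at most chunk_size threads). The helper name and chunk size are illustrative, not part of the SDK, and this assumes firebase_admin has already been initialized:

```python
from firebase_admin import messaging

def send_multicast_chunked(notification, tokens, chunk_size=10):
    """Hypothetical helper: send to `tokens` in chunks of `chunk_size`.

    Each send_each_for_multicast() call then starts at most `chunk_size`
    threads, matching urllib3's default connection pool size of 10.
    """
    responses = []
    for i in range(0, len(tokens), chunk_size):
        message = messaging.MulticastMessage(
            notification=notification,
            tokens=tokens[i:i + chunk_size],
        )
        responses.append(messaging.send_each_for_multicast(message))
    return responses
```

As noted in this thread, chunking like this trades the pool warnings and CPU spikes for slower, sequential sending, so it is a stopgap rather than a fix.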

Klemenceo commented 3 months ago

+1 on @mortenthansen's comment; we got bitten by this, sending pushes in batches of 500 messages, only to realise that len(input) was used as the pool size for the ThreadPoolExecutor. I really can't figure out a good rationale for this. Spawning 500 threads (which I don't think will actually happen, since Python's thread pool reuses free workers before spawning new ones) isn't cost-free, and at the very least it should be documented as such.

milemik commented 2 months ago

[quoting @Doris-Ge's explanation above]

@filippotp does this mean that send_each_for_multicast with 500 messages will send all push notifications regardless of the warnings (i.e. it will not discard 490 of the 500 messages and send only 10 per batch)?

Thank you for your answer in advance :)

filippotp commented 2 months ago

@milemik I think you are referring to @Klemenceo's comment.

Anyway, in my case the issue has not occurred again since updating firebase-admin to 6.3.0 and fully migrating to messaging.send_each_for_multicast().

milemik commented 2 months ago

[quoting @filippotp's reply above]

Yes, sorry, the question was for @Doris-Ge :) I have version 6.5.0 and I can see these warnings in the logs of my Django (Celery) project. I just want to understand whether all of my push notifications will be sent regardless of this warning.

Thank you 😄

ankit-wadhwani-hb commented 1 month ago

I have version 6.5.0 and I can see these warnings in the logs of my Django (Celery) project. Also, because 500 threads are spawned, the worker consumes 100% of the machine's CPU, which affects other workers.

What can be the solution for this? @milemik @filippotp

milemik commented 1 month ago

[quoting @ankit-wadhwani-hb's comment above]

Hi @ankit-wadhwani-hb, to be honest I haven't benchmarked this, but it does make sense that the threads would use 100% of the CPU; maybe that is expected. What matters is that the CPU goes back to normal after the push notifications are sent, and hopefully no notifications are lost. I will get back to you when I test it. Thank you for the notice!

Jay2109 commented 1 week ago

[quoting @ankit-wadhwani-hb's comment above]

The same thing is happening to me: CPU utilisation goes to 99-100%, and as soon as I restart the process the CPU drops significantly.

Jay2109 commented 1 week ago

[quoting the exchange above]

Quick update: if I change the batch size from 500 to 10 it works properly, but I'm not sure that is the right way to do it.

milemik commented 1 week ago

[quoting the exchange above]

@Jay2109 Well, I don't think this is a good fix... It means we need to use more resources when sending push notifications (more tasks will be triggered, at least in my logic 😄).

Jay2109 commented 1 week ago

[quoting the exchange above]

What solution have you implemented currently? In my case I am sending many notifications, and they are not reaching Android devices on time.

ankit-wadhwani-hb commented 1 week ago

[quoting the exchange above]

@Jay2109 @milemik - any other solution? Currently I am batching 10 messages at a time; otherwise it consumes the full server CPU + memory and hampers other services.

Jay2109 commented 1 week ago

The batch of 10 also doesn't keep working for long; after some time the CPU increases again and I have to restart the process. This is what is happening for me.

milemik commented 1 week ago

[quoting the exchange above]

Well @ankit-wadhwani-hb, to be honest I'm not sure... I hope the developers of this library are aware of this issue, and I expect to hear an answer from them. How many push notifications are you sending? It really depends on where you are hosting and how you are creating tasks...

ankit-wadhwani-hb commented 1 week ago

@milemik - Currently there are 20 scheduled bulk notifications, mostly going out daily to 100,000 users. I am using Celery with Python to push the notifications into an SQS queue, and a supervisor-managed worker consumes the messages from the queue and sends the notifications.

milemik commented 1 week ago

[quoting @ankit-wadhwani-hb's comment above]

Ok, and how many push notifications do you send in one task? If you are sending them all from one task, maybe you can optimise by not sending 100,000 in a single Celery task but splitting them into smaller tasks. P.S. It's really hard to give a recommendation without a code snippet 😄

Anyway, let's wait for someone from the development team to give us some answers 😃
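A hedged sketch of the task splitting @milemik suggests, assuming a Celery worker where firebase_admin is initialized at startup; the task name, chunk size, and notification fields are made up for illustration and are not from this project:

```python
from celery import shared_task
from firebase_admin import messaging

@shared_task
def send_chunk(title, body, tokens):
    # One small chunk per Celery task keeps the per-task thread count and
    # CPU spike bounded instead of fanning out 100,000 sends at once.
    message = messaging.MulticastMessage(
        notification=messaging.Notification(title=title, body=body),
        tokens=tokens,
    )
    response = messaging.send_each_for_multicast(message)
    return response.failure_count

def fan_out(title, body, all_tokens, chunk_size=100):
    # chunk_size is an illustrative knob to tune against worker CPU and the
    # urllib3 pool size; it is not an SDK or Celery setting.
    for i in range(0, len(all_tokens), chunk_size):
        send_chunk.delay(title, body, all_tokens[i:i + chunk_size])
```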

Jay2109 commented 1 week ago

I am not able to send 500 in a batch; after some time it eats up my CPU.