Backblaze / b2-sdk-python

Python library to access B2 cloud storage.

Too many open files - b2sdk.v2 (CLOSE_WAIT) #405

Closed MuriloBianco closed 1 year ago

MuriloBianco commented 1 year ago

I have a Python script running as a daemon through a system service. This service backs up some files and sends them to Backblaze using b2sdk.v2. But after the service has been running for a few days, I start getting the "Too many open files" error.

While investigating the issue, I could see many open connections in the output of lsof -p.

[screenshot: lsof -p output showing many connections in CLOSE_WAIT]

Can anyone help me?

ppolewicz commented 1 year ago

Backblaze has many servers, and the client keeps connections open to avoid thrashing. They are closed later, but in the meantime you can see them in the CLOSE_WAIT state; it's not really harmful. You can increase your open files limit using ulimit.
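
For reference, a minimal sketch of raising the per-process open-files limit from Python (equivalent to ulimit -n for the current process only; the target value of 4096 is just an example):

```python
import resource

# Current soft/hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")

# Raise the soft limit (it cannot exceed the hard limit without privileges);
# this affects only the current process, like `ulimit -n` in the shell.
new_soft = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```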

Perhaps we can try to collect those sockets more aggressively; I'm going to look into that.

ppolewicz commented 1 year ago

OK, I looked into it. It might be that you are holding on to the response objects, so the garbage collector never sweeps them up (which would finish closing the sockets). Can you please check whether that's the case? If it isn't, can you provide a small code sample that demonstrates the leak? It might be something on the SDK side as well, and if it is, we'll try to fix it.
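
If it helps, here is a rough diagnostic sketch for checking that. It assumes the HTTP layer underneath is requests (the check itself is generic): if the count keeps growing in a long-running process, something is holding on to response objects.

```python
import gc

import requests

# Count requests.Response objects that are still reachable; each live one
# can pin an underlying socket until it is closed or garbage collected.
alive = [obj for obj in gc.get_objects() if isinstance(obj, requests.Response)]
print(f"{len(alive)} Response objects still alive")
```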

MuriloBianco commented 1 year ago

Is there a method in the SDK to scan these connections and close them? If so, I'm not really using it.

Anyway, here are the important snippets of my code.

Keep in mind that this runs as a daemon, so the process is expected to never end.

I've already run other tests where the script runs once and exits, and in that case all connections are closed normally at the end.

The problem only happens because my process never ends.

[screenshots of code snippets]

MuriloBianco commented 1 year ago

Imagine the following: I have a table that holds each client's information (bucket name, key ID and application keys). The daemon polls this table every minute to find out whether a backup is scheduled for that time. When there is a list of backups to make, my code gathers everything each client needs and sends it file by file. At the end of each run, we can see that the connections are not closed.
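
To illustrate (this is not my real code, which is in the screenshots above, just a minimal sketch of that loop using the standard b2sdk.v2 API; the table lookup and all names are placeholders):

```python
import time

from b2sdk.v2 import B2Api, InMemoryAccountInfo


def fetch_due_backups():
    """Hypothetical stand-in for the database query that finds backups due now."""
    return []  # each entry: {"key_id", "app_key", "bucket_name", "files"}


def run_backups(clients):
    """Authorize and upload each client's files with the usual b2sdk.v2 calls."""
    for client in clients:
        api = B2Api(InMemoryAccountInfo())
        api.authorize_account("production", client["key_id"], client["app_key"])
        bucket = api.get_bucket_by_name(client["bucket_name"])
        for local_path, remote_name in client["files"]:
            bucket.upload_local_file(local_file=local_path, file_name=remote_name)


while True:
    due = fetch_due_backups()
    if due:
        run_backups(due)
    time.sleep(60)  # the daemon polls the scheduling table every minute
```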

I even increased the machine's ulimit, but that only postpones the problem. Before, the "too many open files" error happened within 5 days; now it takes longer, but it still happens.

mjurbanski-reef commented 1 year ago

Please provide the output of pip freeze; the b2sdk and requests versions are the most important.

Ideally, please provide a minimal example that reproduces this problem. I'm continuously running Python processes that use b2sdk and have not seen zombie connections (neither ss nor lsof shows them); there are a few (<5) connections in a *_WAIT state at most. In my scenario, multiple files (~10) are uploaded every other minute. In yours, do you have "bursts" of uploads, i.e. how many files do you upload at a time?

ppolewicz commented 1 year ago

Perhaps the requests response object is not being read, so urllib3 isn't holding the connection itself but the response object, which in turn occupies the connection?

I vaguely remember a bug like that being fixed a while ago. @MuriloBianco, are you using the newest version of the SDK?
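
As an illustration of that failure mode (generic requests usage, not b2sdk code): a streamed response whose body is never consumed or closed keeps its connection checked out of the pool, so the socket can linger until garbage collection.

```python
import requests

session = requests.Session()

# Streamed response: the body is not read eagerly.
resp = session.get("https://example.com/big-file", stream=True)

# If we stop here (e.g. keep `resp` around, or drop it without closing),
# the pooled connection stays occupied until the object is garbage collected.

# Reading the body or closing the response releases the connection:
for _ in resp.iter_content(chunk_size=64 * 1024):
    pass
resp.close()  # or: `with session.get(url, stream=True) as resp: ...`
```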

MuriloBianco commented 1 year ago

Guys, first of all, thank you all for your support.

Some time ago we updated all the libraries in the environment. I monitored all the services for a whole month and the problem stopped happening.

Thank you everyone and sorry for not responding sooner.