chiefwigms / picobrew_pico


OSError: [Errno 24] Too many open files #274

Closed Intecpsp closed 3 years ago

Intecpsp commented 3 years ago

@tmack8001 is familiar with this.

I started having this issue after the TILT/NGINX/PicoStill changes recently. I ended up reverting to 41b2c83 and climbing each commit until I found the one that caused the issue. After slowly climbing the tree and giving it a few hours to hit the limit, I found that 4c7e112 is the one that causes this issue for me.

EDIT: I will add that when this issue arises, the entire picobrew server goes down. Also, when I was trying to just get it stable, I raised my open-file limit from 1024 to 8192 and still had this issue. That is when I reverted and started trying to find the bad commit. While searching for the bad commit, I set the limit back to 1024.

tmack8001 commented 3 years ago

https://github.com/chiefwigms/picobrew_pico/commit/4c7e11244836ea0e9f7373dc2a758abc525bda3b can't be the commit that causes issues as that nginx change isn't even loaded by your server yet (as I hadn't made the modification to the git post script hook we added in a later commit). This is just staging for that so isn't active at all.

Must be something else.

tmack8001 commented 3 years ago

To get more information here, I'd like to see the number of open file descriptors held by the python3 server process on your Raspberry Pi at various times.

$ ps aux | grep "python3 server.py"

root      7255  0.0  5.0 101692 47460 ?        Sl   Mar16   4:01 python3 server.py 0.0.0.0 80
pi       14448  0.0  0.0   7480   552 pts/0    S+   11:58   0:00 grep --color=auto python3 server.py

(note in the above "$" denotes where the command is vs the example output that follows the command)

After running this command, locate the "pid" (the first number after the username running the process). To find the python3 server, look for a line similar (not identical) to this one: root 7255 0.0 5.0 101692 47460 ? Sl Mar16 4:01 python3 server.py 0.0.0.0 80 (the first line in the example output above)

sudo ls -l /proc/<pid>/fd | wc -l

The above command takes the pid located in the first step, lists that process's open file descriptors, and counts the lines (each open file descriptor is on its own line). I'm mostly interested in how this count grows over time.
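The same count can also be sampled from Python, which makes it easier to log over a few hours. This is only a minimal sketch, assuming a Linux host with /proc mounted; the function names `count_open_fds` and `sample_fd_count` are made up for illustration and are not part of picobrew_pico.

```python
import os
import time


def count_open_fds(pid):
    """Return the number of file descriptors currently open in process `pid`."""
    # Each entry in /proc/<pid>/fd is one open descriptor (Linux only).
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_count(pid, samples=3, interval_s=1.0):
    """Collect (timestamp, fd_count) pairs so growth over time is visible."""
    readings = []
    for i in range(samples):
        readings.append((time.time(), count_open_fds(pid)))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


if __name__ == "__main__":
    # Assumption: inspect this script's own process; pass the server's pid
    # instead (and run as root) to watch the python3 server.
    for stamp, count in sample_fd_count(os.getpid(), samples=3, interval_s=1.0):
        print(f"{stamp:.0f} {count}")
```

Reading another user's process (such as the server running as root) needs the same privileges as the sudo ls command. The number can differ by one from the ls -l | wc -l output, because ls -l also prints a "total" line.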

Also would be interesting to get the list of actual file descriptors. For instance in my server I have a few open file descriptors to active sessions:

(below contains a subset of open file descriptors to the active devices that are logging to the server)

sudo ls -l /proc/7255/fd | grep session
l-wx------ 1 root root 64 Mar 21 04:10 3 -> /picobrew_pico/app/sessions/ferm/active/20210127_211301#2c3ae834b45b.json
l-wx------ 1 root root 64 Mar 21 04:10 40 -> /picobrew_pico/app/sessions/tilt/active/20210316_160024#Blue-C7F9F4A8651E.json

Intecpsp commented 3 years ago

Just blew up, most of the descriptors are "pipe:[XXXXXX]" , but some are "socket:[XXXXXX]" (X = number). Lived ~ 3 hours on commit 4c7e112, I'm going to back down to bc15114 for now. I'll check on it after ~ 3hours to see where its number is.
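For anyone repeating this check, a breakdown of descriptors by type (pipe, socket, regular file) can be produced by reading the symlink targets under /proc/<pid>/fd. The following is a rough sketch, assuming Linux; `summarize_fds` is a hypothetical helper name, not something shipped with the server.

```python
import os
from collections import Counter


def summarize_fds(pid):
    """Count the open descriptors of process `pid`, grouped by target kind."""
    counts = Counter()
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink such as "pipe:[12345]" or a file path.
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listing and reading it.
            continue
        if target.startswith("pipe:"):
            counts["pipe"] += 1
        elif target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("anon_inode:"):
            counts["anon_inode"] += 1
        else:
            counts["file"] += 1
    return counts


if __name__ == "__main__":
    # Assumption: inspect the current process; substitute the server's pid.
    for kind, count in summarize_fds(os.getpid()).most_common():
        print(kind, count)
```

A pipe or socket count that keeps climbing while the file count stays flat would point at leaked event loops or subprocess plumbing rather than at session log files.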

Intecpsp commented 3 years ago

Very odd, I went back to bc15114 and just saw it at 1021 and everything was good... until I went to the about page, then it blew up. Leaving the about page brings it back down to 1021 and allows the main page to show. But I bet it wouldn't hold a session from a device right now.

Intecpsp commented 3 years ago

Approaching an hour on 283f964 and have a solid 6 with no fluctuations.. It has to be a2cfa03 or bc15114

tmack8001 commented 3 years ago

What I think is happening here is that the Tilt scanning thread spawns a new thread each and every time it does async work. What could make this better is to drop the async approach and process everything on a dedicated thread in a while session.active: loop instead of handing off to asyncio. That would at least be similar to how the still polling works.
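The dedicated-thread idea can be sketched as follows. This is not the project's code: `TiltSession`, `scan_once`, `poll_loop`, and `start_polling` are hypothetical names, and the scan is a placeholder that returns fixed values. The point is only that one long-lived thread does all the work, so the thread count stays constant no matter how many scans run.

```python
import threading
import time


class TiltSession:
    """Hypothetical stand-in for a session object with an `active` flag."""

    def __init__(self):
        self.active = True
        self.readings = []


def scan_once():
    """Placeholder for one scan; a real version would read Bluetooth data."""
    return {"temp": 68.0, "gravity": 1.050}


def poll_loop(session, interval_s):
    # All scans happen on this one thread until the session is deactivated.
    while session.active:
        session.readings.append(scan_once())
        time.sleep(interval_s)


def start_polling(session, interval_s=1.0):
    """Start a single background worker for the whole life of the session."""
    worker = threading.Thread(
        target=poll_loop, args=(session, interval_s), daemon=True
    )
    worker.start()
    return worker
```

Setting `session.active = False` lets the loop exit on its next pass, after which the worker can be joined.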

I'll play around with it @cgalpin to see if this would work better.

tmack8001 commented 3 years ago

@Intecpsp though if that is the case it should start right at https://github.com/chiefwigms/picobrew_pico/commit/bc151140ef0d8dbe59644cf57a552920f7427a5c or even the commit before if one manually sets up bluetooth and the config.yaml changes to activate the tilt processing threads.

cgalpin commented 3 years ago

I'm sure it's the tilt thread. I had initially tried to get it to use the current thread but had problems. Using asyncio.run worked, and I did not see this issue, sorry. I will take a look as well.

tmack8001 commented 3 years ago

No worries @cgalpin. Over a cold brew tonight I figured out a solution to this: keeping the while True: essentially within the spawned async thread rather than outside it. I made a few other changes in the process, though.
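A rough illustration of that shape, under stated assumptions: the loop sits inside one coroutine, and asyncio.run is called once in a single spawned thread. All names here (`scan_once`, `monitor`, `run_monitor`, `start_monitor`) are hypothetical, the scan is a placeholder, and a threading.Event replaces the bare while True: so the sketch can be stopped cleanly. The actual fix in the repository may look different.

```python
import asyncio
import threading


async def scan_once():
    """Placeholder for one async scan; a real version would await a scanner."""
    await asyncio.sleep(0)
    return {"temp": 68.0, "gravity": 1.050}


async def monitor(stop_event, readings, interval_s):
    # The loop lives inside the coroutine, so one event loop serves every scan.
    while not stop_event.is_set():
        readings.append(await scan_once())
        await asyncio.sleep(interval_s)


def run_monitor(stop_event, readings, interval_s):
    # asyncio.run is called exactly once for the lifetime of the thread.
    asyncio.run(monitor(stop_event, readings, interval_s))


def start_monitor(readings, interval_s=1.0):
    """Spawn the single monitoring thread and return it with its stop flag."""
    stop_event = threading.Event()
    worker = threading.Thread(
        target=run_monitor, args=(stop_event, readings, interval_s), daemon=True
    )
    worker.start()
    return worker, stop_event
```

Because only one event loop and one thread are ever created, the descriptors they hold (the loop's selector and wake-up sockets) are allocated once instead of once per scan.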

cgalpin commented 3 years ago

Thanks for fixing it. I like your solution!

Intecpsp commented 3 years ago

I can confirm that after ~1 hour, I'm only showing 15 while running 98bb272 and tilt_monitoring: true. Looks like it's working!