OSC / ondemand

Supercomputing. Seamlessly. Open, Interactive HPC Via the Web
https://openondemand.org/
MIT License
263 stars 99 forks source link

use ps to count sessions instead of lsof #3511

Closed johrstrom closed 2 months ago

johrstrom commented 2 months ago

This uses ps to find the active sessions instead of lsof.

It should/could fix #3111.

CSC-swesters commented 2 months ago

Thanks, this looks promising! I'll have time to test this later in the week and I can provide more feedback after that.

I'd like to see the lsof code removed at the same time. I quickly grepped around the nginx_stage directory's code, and to me it looks like it might still be used here:

https://github.com/OSC/ondemand/blob/nginx-session-count/nginx_stage/lib/nginx_stage/generators/nginx_show_generator.rb#L30-L35

I'm not well versed in Rails, so I don't see how that is used. But would it be possible to fix this in the same manner, i.e., using ps instead of lsof?

johrstrom commented 2 months ago

I'm not well versed in Rails, so I don't see how that is used. But would it be possible to fix this in the same manner, i.e., using ps instead of lsof?

Sure thing! Updated in this PR.

treydock commented 2 months ago

I'd recommend for the ps command to maybe do timeout 10 ps ... or setup a timeout some other way. There are cases where procfs can hang and it'd be good if that didn't also hang up nginx_stage so that when procfs does hang there aren't just a proliferation of ps commands hanging on the system forever.

CSC-swesters commented 2 months ago

I'd recommend for the ps command to maybe do timeout 10 ps ... or setup a timeout some other way. There are cases where procfs can hang and it'd be good if that didn't also hang up nginx_stage so that when procfs does hang there aren't just a proliferation of ps commands hanging on the system forever.

Yes, that's an excellent suggestion. +1 for that. If I'm not mistaken, these code paths are not part of the creation of new PUNs, right? So the only times it might hang should be when the nginx_stage nginx_clean command is run? (And the check_socket_for_active_sessions method, which I'll ask about below)

Sure thing! Updated in this PR.

Thanks! Would it still be possible @johrstrom to actually remove the old lsof method's code? It's supposed to be unused now, so in my opinion it's better to remove this (hopefully) dead code now, instead of leaving it in to cause confusion.

I still didn't understand whether the check_socket_for_active_sessions hook is used anywhere. Could you explain how this code is used? Or is it dead as well?

$ grep -r "check_socket_for_active_sessions"
nginx_stage/lib/nginx_stage/generators/nginx_show_generator.rb:    add_hook :check_socket_for_active_sessions do

If nothing uses it, maybe it would be better to remove this also altogether? :slightly_smiling_face:

johrstrom commented 2 months ago

I still didn't understand whether the check_socket_for_active_sessions hook is used anywhere. Could you explain how this code is used? Or is it dead as well?

All hooks get run when you use a generator. So nginx_stage nginx_show --user jeff would trigger this hook.

So running that, I get something like this which is where you see those puts statements being issued in all the hooks from that generator.

[root@1e62ecd9be6d ~]# /opt/ood/nginx_stage/sbin/nginx_stage nginx_show --user jeff
User: jeff
Instance: 429
Socket: /var/run/ondemand-nginx/jeff/passenger.sock
Sessions: 1