aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

Make the ssh channel timeout configurable? #6377

Closed. unkcpz closed this issue 3 weeks ago.

unkcpz commented 2 months ago

https://github.com/aiidateam/aiida-core/blob/212f6163b03b8762509ae2230c30172af8c02fed/src/aiida/transports/plugins/ssh.py#L1413-L1414

When using aiida-hyperqueue, the command verdi data hyperqueue server start eiger-mem-hq calls nohup hq server start 1>$HOME/.hq-stdout 2>$HOME/.hq-stderr & to start the hq server on the remote machine. However, I run into a timeout error at the line linked above. The problem goes away when I increase the timeout from 0.01 to 0.1, or simply add time.sleep(0.1) before stdout.read(). The timeout also happens if the command is just nohup ls &.
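A minimal way to reproduce the symptom locally, assuming the SSH channel behaves like an ordinary socket with a read timeout (paramiko channels accept settimeout() and raise socket.timeout when no data arrives in time). Here a local socket pair stands in for the channel, and the hypothetical helper read_with_timeout lets the "remote" end write its output only after a delay. If the output arrives later than the channel timeout, the read fails even though the command itself is fine.

```python
import socket
import threading
from typing import Optional


def read_with_timeout(channel_timeout: float, delay: float) -> Optional[bytes]:
    """Return the bytes read, or None if the read timed out.

    A local socket pair stands in for the SSH channel (an illustration only);
    the "remote" side writes its output after `delay` seconds.
    """
    local, remote = socket.socketpair()
    try:
        # Same role as the small timeout set on the SSH channel.
        local.settimeout(channel_timeout)
        # Simulate a remote command whose output shows up only after `delay`.
        timer = threading.Timer(delay, remote.sendall, args=(b"hq server started\n",))
        timer.start()
        try:
            return local.recv(1024)
        except socket.timeout:
            # Nothing arrived within channel_timeout: the read gives up.
            return None
        finally:
            timer.join()  # make sure the writer finished before closing sockets
    finally:
        local.close()
        remote.close()
```

With these made-up delays, read_with_timeout(0.01, 0.3) returns None, while read_with_timeout(2.0, 0.05) returns the output bytes: only the ratio between the channel timeout and the time the output takes to arrive matters.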

Perhaps it is not appropriate to use exec_command_wait_bytes for nohup? Or should we make the SSH channel timeout configurable?

Pinging @giovannipizzi, since I think you are the author of this part, and @mbercx, since it is hq-related.

giovannipizzi commented 2 months ago

OK, I didn't know this would happen. Is the solution robust? I think the best option is to make it configurable somehow, as I'm not sure what the performance implications of changing it globally would be.
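One way to make the value configurable without changing it globally, sketched under the assumption that it would be read from per-computer transport parameters. The key name channel_timeout and the helper resolve_channel_timeout are hypothetical, not an existing AiiDA option. The default stays at the current 0.01 so that only computers that opt in are affected.

```python
from typing import Any, Mapping

# Hypothetical option name; not an existing AiiDA transport setting.
CHANNEL_TIMEOUT_KEY = "channel_timeout"
# Current hard-coded value, kept as the default so nothing changes unless overridden.
DEFAULT_CHANNEL_TIMEOUT = 0.01


def resolve_channel_timeout(transport_params: Mapping[str, Any]) -> float:
    """Return the per-computer channel timeout, falling back to the global default."""
    value = transport_params.get(CHANNEL_TIMEOUT_KEY)
    if value is None:
        return DEFAULT_CHANNEL_TIMEOUT
    timeout = float(value)  # also accepts values stored as strings, e.g. "0.1"
    if timeout <= 0:
        raise ValueError(f"{CHANNEL_TIMEOUT_KEY} must be positive, got {timeout!r}")
    return timeout
```

The resolved value would then be passed to the channel's settimeout() in place of the literal 0.01, so the performance characteristics for everyone else stay as they are today.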

unkcpz commented 2 months ago

If I understand correctly, the timeout happens because nohup returns control so quickly that the 0.01 timeout set on the channel is not long enough for stdout to have data ready when read() is called. Would it be possible to add a while loop that waits until the output is ready, with an outer timeout that ends the loop in case the command does not exit?
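The proposed loop could look roughly like this, again with a socket standing in for the SSH channel (an assumption: the real code would read from the paramiko channel and might also check its exit status). The short per-read timeout is kept, a read that times out is simply retried, and a separate overall deadline (the hypothetical overall_timeout parameter) ends the loop if the remote side never closes its output.

```python
import socket
import time


def read_all_with_deadline(
    sock: socket.socket,
    poll_timeout: float = 0.01,
    overall_timeout: float = 5.0,
    bufsize: int = 4096,
) -> bytes:
    """Read until EOF, retrying reads that time out, within an overall deadline.

    `poll_timeout` plays the role of the 0.01 channel timeout; `overall_timeout`
    is the outer bound that stops the loop if EOF never arrives.
    """
    sock.settimeout(poll_timeout)
    deadline = time.monotonic() + overall_timeout
    chunks = []
    while True:
        if time.monotonic() > deadline:
            raise TimeoutError(f"no EOF after {overall_timeout} s")
        try:
            data = sock.recv(bufsize)
        except socket.timeout:
            continue  # no data yet: poll again until the deadline
        if not data:  # empty bytes: the peer closed the stream (EOF)
            return b"".join(chunks)
        chunks.append(data)
```

Because a timed-out read is retried rather than treated as an error, the small poll timeout only controls how often the loop wakes up, not whether output is lost. The trade-off is that a command that never closes its output blocks for up to overall_timeout before failing.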