aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

Submitted calculations fail when using `ssh` to connect to the computer where aiida is installed #5889

Closed ireaml closed 1 year ago

ireaml commented 1 year ago

Describe the bug

When I ssh into my work computer (where aiida is installed) from another computer (at home), calculations submitted to a remote HPC fail, but local calculations work fine. The daemon works fine, and the problem seems to be in connecting to the HPC (input files are not written on the remote HPC). To test whether there were any issues in connecting to the HPC over the ssh connection, I ran `verdi computer test` as well as `verdi code test`, and all tests were successful. I also tested with a different HPC and had the same problem. Of course, the calculations run fine when I submit them directly from my work computer (without sshing in from home).

I attach below the output of running `verdi process report pk` for one of the failed calculations.

Steps to reproduce

Steps to reproduce the behavior:

  1. From another computer, ssh into the computer where aiida is installed
  2. Submit a calculation that runs on a remote HPC
  3. See error (input files never written to HPC, and calculation is excepted)

Expected behavior

Successful submission of calculations to an HPC, even when using ssh to access the computer where aiida is installed. It'd be really useful, so that calculations can be submitted when working from a different computer from the one where aiida is installed.

Your environment

Thanks in advance!

Output of running `verdi process report pk`: output_report.md

sphuber commented 1 year ago

Could it be that the calculations start failing as soon as you close the shell with which you logged into your work computer from home? I remember having this in the past, and the solution had nothing to do with AiiDA but just with how SSH works. I think what is happening is that when you log into your work computer and then open an SSH connection from the work computer to any other computer (the HPC clusters in your example), that session is tied to the shell you logged into your work computer with. As soon as you close that shell, all other SSH sessions are invalidated as well. I am not 100% sure of the exact mechanism, but that phenomenology was pretty consistent.

The solution for me was to use `screen`, which is a unix utility to manage shell sessions and keep them alive while allowing you to log out. I would use this on my work computer to open a session. Then from my home computer, I would log in to the work computer and reattach to the active screen session. There I would then start the AiiDA daemon and launch the calculation. The screen session needs to remain alive for it to work, but you can "detach" from it and let it continue to run in the background.
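For reference, the workflow above could be scripted roughly as follows. This is only a sketch under a few assumptions: `screen` and `bash` are installed on the work computer, the session name `aiida` is an arbitrary placeholder, and the helper function names are made up for illustration (they are not part of AiiDA). The only external commands it relies on are `screen -dmS`, `screen -r` and `verdi daemon start`.

```python
from __future__ import annotations

import subprocess


def screen_start_command(session: str, command: str) -> list[str]:
    """Build the argv that runs `command` in a new, detached screen session."""
    # -d -m: create the session already detached; -S: name it so it can be
    # reattached later. `exec bash` keeps a shell open inside the session
    # after `command` has returned, so the session itself stays alive.
    return ["screen", "-dmS", session, "bash", "-c", f"{command}; exec bash"]


def screen_reattach_command(session: str) -> list[str]:
    """Build the argv that reattaches to a named screen session."""
    return ["screen", "-r", session]


def start_in_screen(session: str, command: str) -> None:
    """Start the session; raises if screen is missing or exits non-zero."""
    subprocess.run(screen_start_command(session, command), check=True)


# Hypothetical usage on the work computer (session name is a placeholder):
#     start_in_screen("aiida", "verdi daemon start")
```

Later, after logging in from home, running `screen -r aiida` in a terminal reattaches to that session, and pressing Ctrl-a followed by d detaches again while leaving it running.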

ireaml commented 1 year ago

Thanks for the quick reply!

I was submitting the calculations from a Jupyter notebook (using Visual Studio Code Remote Development), and the remote connection wasn't interrupted (I was monitoring the calculation to try to identify the origin of the problem). But I'll try using screen when I get home anyways, thanks!!

ireaml commented 1 year ago

It's working now using screen as you suggested! Thanks a lot!! :)