aiidateam / aiida-quantumespresso

The official AiiDA plugin for Quantum ESPRESSO
https://aiida-quantumespresso.readthedocs.io

Internet traffic consumption when running local calculations #1045

Closed epatyukova closed 3 weeks ago

epatyukova commented 3 weeks ago

Hello! I have observed that I can run this tutorial https://aiida-quantumespresso.readthedocs.io/en/latest/tutorials/first_pw.html#tutorials-pw-through-cli for running pw.x through the API only if I have an internet connection, even though I'm running the calculations locally. Without an internet connection, pw.x is not started (the output file aiida.out is generated, but is empty). With an internet connection everything works, but a considerable amount of internet traffic is consumed, even though the calculations run locally.

Can you please explain what is going on? Thank you!

sphuber commented 3 weeks ago

Could you please be a bit more specific about how you are monitoring internet traffic and how you conclude it is due to AiiDA? The example you show should not make any network connections as far as I know. What is the error or problem if you try to run the example without an internet connection? How is the Computer that you use in the calculation defined? Is it really the localhost machine?

epatyukova commented 3 weeks ago

(1) I could see the amount of traffic in my mobile network operator account (I tried to run the calculations while on holiday, where no other internet connection was available). I'm not sure about the precise amount. I'm not a specialist in networks, so I can't give a better description, sorry.

(2) If I disconnected from the internet, the calculations failed (aiida.out was empty because pw.x was not started. At the same time, all AiiDA files were created, so the AiiDA machinery was working, but the calculations were not done).

(3) I defined the computer and the code as described on the AiiDA website, with localhost, the core.local transport, and the core.direct scheduler, as described here https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/run_codes.html#how-to-run-codes.

sphuber commented 3 weeks ago

Were the _scheduler-stderr.txt and _scheduler-stdout.txt files present in the working directory? What was their content? And what is the output of verdi process report <PK> for the failed calculation?

epatyukova commented 3 weeks ago

_scheduler-stderr.txt:

[M7LXCFCPJH:27992] ptl_tool: problems getting address for index 0 (kernel index -1)

The PMIx server's listener thread failed to start. We cannot continue.

_scheduler-stdout.txt is empty

verdi process report pk:

102596: None (empty scheduler output file)
*** Scheduler errors:
[M7LXCFCPJH:27992] ptl_tool: problems getting address for index 0 (kernel index -1)

The PMIx server's listener thread failed to start. We cannot continue.

*** 4 LOG MESSAGES:
+-> WARNING at 2024-10-29 18:01:51.331929+00:00
 | key 'symmetries' is not present in raw output dictionary
+-> ERROR at 2024-10-29 18:01:51.347520+00:00
 | ERROR_OUTPUT_STDOUT_INCOMPLETE
+-> ERROR at 2024-10-29 18:01:51.349059+00:00
 | Both the stdout and XML output files could not be read or parsed.
+-> WARNING at 2024-10-29 18:01:51.349968+00:00
 | output parser returned exit code<305>: Both the stdout and XML output files could not be read or parsed.

sphuber commented 3 weeks ago

Thanks for the additional details

The reason the calculation is not running is shown in the _scheduler-stderr.txt. Apparently, on your machine the script is trying to launch a PMIx server listener. I am not familiar with this tool, but how did you configure the local computer? What kind of MPI are you expecting to be used? Can you share the output of verdi computer show localhost and verdi computer configure show localhost?

Whatever you have configured, it seems it may be this that is actually trying to connect to the outside world. This is not something built into AiiDA though.
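The check described above (looking in _scheduler-stderr.txt for the reason a job never started) could be scripted roughly as follows. This is only a minimal sketch: the marker strings are copied from the stderr posted in this thread and are not an exhaustive list, and the helper names `find_launcher_errors` and `check_workdir` are hypothetical, not part of AiiDA.

```python
from pathlib import Path
from typing import List

# Substrings copied from the stderr shown in this thread.
# Illustrative only: other MPI launchers will print different messages.
LAUNCHER_ERROR_MARKERS = (
    "PMIx server's listener thread failed to start",
    "problems getting address for index",
)


def find_launcher_errors(stderr_text: str) -> List[str]:
    """Return the stderr lines that contain one of the known markers."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if any(marker in line for marker in LAUNCHER_ERROR_MARKERS)
    ]


def check_workdir(workdir: str) -> List[str]:
    """Scan _scheduler-stderr.txt in a calculation's working directory.

    Returns an empty list if the file is missing or contains no known marker.
    """
    stderr_file = Path(workdir) / "_scheduler-stderr.txt"
    if not stderr_file.exists():
        return []
    return find_launcher_errors(stderr_file.read_text(errors="replace"))
```

A non-empty result suggests that the MPI launcher failed before pw.x was started, which would be consistent with an empty aiida.out.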

epatyukova commented 3 weeks ago

Thank you for the comment. I did not do anything with the PMIx server listener (maybe some system configuration created by IT is one of the reasons, I do not know); I just followed the AiiDA manual. It is strange, though, that I can't do any calculations locally offline.

verdi computer show qe-computer:

Label                        qe-computer
PK                           1
UUID                         afaf9e95-2647-417f-8bd4-cfcb72bb143f
Description
Hostname                     localhost
Transport type               core.local
Scheduler type               core.direct
Work directory               /Users/elena.patyukova/Documents/github/aiida-work
Shebang                      #!/usr/bin/env python3
Mpirun command               mpirun -np 4
Default #procs/machine       4
Default memory (kB)/machine
Prepend text
Append text

verdi computer configure show qe-computer:

epatyukova commented 3 weeks ago

On internet they write that it is RabbitMQ who uses PMIx server listener. So, it is probably the issue with RabbitMQ.

sphuber commented 3 weeks ago

> On internet they write that it is RabbitMQ who uses PMIx server listener. So, it is probably the issue with RabbitMQ.

RabbitMQ is not being managed from inside a job, so it wouldn't show up in these output files, I am pretty sure. Could you share the content of the _aiidasubmit.sh script? And what kind of machine are you running AiiDA? Is it your personal laptop, or a workstation, or on some remote compute cluster? How did you compile/install QE itself?

epatyukova commented 3 weeks ago

Thank you. So, the _aiidasubmit.sh is

#!/usr/bin/env zsh

exec > _scheduler-stdout.txt
exec 2> _scheduler-stderr.txt

"mpirun" "-np" "4" '/Users/elena.patyukova/Documents/github/q-e/bin/pw.x' '-in' 'aiida.in' > "aiida.out"

I run it on my laptop. I installed QE from source, following the installation instructions in their repository. I just want to add that everything works if the internet connection is on. I do not change anything apart from turning on the internet connection. So, it is weird.

sphuber commented 3 weeks ago

It is indeed weird, I don't really understand where it could be coming from. Did you compile QE with MPI support? Maybe just try running it without MPI. The aiida-quantumespresso calculation launch pw command should have an option like --without-mpi (check the help for the exact form of the option) to disable MPI.
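When the calculation is launched through the Python API rather than the CLI, MPI is controlled by the CalcJob option `metadata.options.withmpi`. A minimal sketch of switching it off is below. The `disable_mpi` helper is hypothetical, and a plain namespace object stands in for a real process builder so that the snippet runs without an AiiDA profile.

```python
from types import SimpleNamespace


def disable_mpi(builder):
    """Set the CalcJob option `withmpi` to False on a process builder.

    With this option off, the generated submit script should call the
    executable directly instead of prefixing it with the mpirun command.
    """
    builder.metadata.options.withmpi = False
    return builder


# Stand-in for a real builder. With AiiDA loaded, one would instead use
# something like the following (the code label is a hypothetical example):
#   from aiida.orm import load_code
#   builder = load_code('pw@qe-computer').get_builder()
fake_builder = SimpleNamespace(
    metadata=SimpleNamespace(options=SimpleNamespace(withmpi=True))
)

disable_mpi(fake_builder)
print(fake_builder.metadata.options.withmpi)  # prints: False
```

If the calculation then completes offline, the failure is most likely in the MPI launcher rather than in AiiDA or in pw.x itself.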

epatyukova commented 3 weeks ago

Yes, you are right, without MPI it works. So the reason is in MPI.

However, the solutions suggested here https://stackoverflow.com/questions/78348267/the-pmix-servers-listener-thread-failed-to-start-we-cannot-continue and here https://stackoverflow.com/questions/16077460/does-rmpi-require-an-active-internet-connection/16121528#16121528 are not working. (I tried changing the command in the computer setup from mpirun -n 4 to mpirun --mca btl_tcp_if_include lo0 -n 4, and also setting an environment variable in my Python script; neither worked.)

sphuber commented 3 weeks ago

Glad we pinned it down to whatever version of MPI is installed on your system. There is not much more we can do for you, I am afraid, as this is not an AiiDA problem. You would have the same problem if you ran QE with MPI directly, without AiiDA. So I will close this issue for now.
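To confirm that the failure is independent of AiiDA, the same command line as in _aiidasubmit.sh can be run by hand from the calculation's working directory, once with and once without mpirun. The sketch below is one way to do that from Python. The function names are hypothetical, and the executable path, input file, and process count are whatever the submit script shows.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def build_pw_command(pw_path: str, input_file: str,
                     nprocs: Optional[int] = None) -> List[str]:
    """Build the pw.x command line, prefixed with mpirun if nprocs is given."""
    cmd = [pw_path, "-in", input_file]
    if nprocs is not None:
        cmd = ["mpirun", "-np", str(nprocs)] + cmd
    return cmd


def run_command(cmd: List[str], workdir: str) -> Optional[Tuple[int, str]]:
    """Run cmd in workdir and return (return code, stderr).

    Returns None if the first executable in cmd is not found on this machine.
    """
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    return result.returncode, result.stderr


# Same arguments as the submit script in this thread, with a short pw.x path.
print(build_pw_command("pw.x", "aiida.in", nprocs=4))
# prints: ['mpirun', '-np', '4', 'pw.x', '-in', 'aiida.in']
```

If the mpirun variant fails offline with the same PMIx message while the plain variant succeeds, the problem lies in the local MPI installation.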