pancetta closed this issue 3 years ago
Thank you for trying out the package. Can you post the entire output of pytest here? This would help me a lot to figure out what might be the issue.
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/robert/Documents/codes/performance
plugins: benchmark-3.2.3, MPI-0.0.1.dev2
collected 2 items
test_mpi.py .F [100%]
======================================================================================================================= FAILURES =======================================================================================================================
____________________________________________________________________________________________________________________ test_parallel _____________________________________________________________________________________________________________________
args = (), kwargs = {}, executable = 'mpirun', test_name = 'test_mpi.py::test_parallel'
env = {'CONDA_DEFAULT_ENV': 'performance', 'CONDA_EXE': '/Users/robert/miniconda3/bin/conda', 'CONDA_PREFIX': '/Users/robert/miniconda3/envs/performance', 'CONDA_PREFIX_1': '/Users/robert/miniconda3', ...}, errors = [(0, '')], i = 0
file_name = 'test_mpi.py..test_parallel_0', f = <_io.TextIOWrapper name='test_mpi.py..test_parallel_0' mode='r' encoding='UTF-8'>, rank = 0
@functools.wraps(func)
def replacement_func(*args, **kwargs):
if not in_mpi_session():
executable = mpi_executable(mpi_executable_name)
test_name = get_pytest_input(func)
env = dict(os.environ)
env["PYTHONPATH"] = os.pathsep.join(sys.path)
try:
> mpi_subprocess_output = subprocess.check_output(
[
executable,
"-np",
str(nprocs),
sys.executable,
"-m",
"pytest_MPI._print_capture",
test_name,
],
env=env,
stderr=subprocess.STDOUT,
)
../../../miniconda3/envs/performance/lib/python3.8/site-packages/pytest_MPI/_decorator.py:93:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
timeout = None, popenargs = (['mpirun', '-np', '1', '/Users/robert/miniconda3/envs/performance/bin/python', '-m', 'pytest_MPI._print_capture', ...],)
kwargs = {'env': {'CONDA_DEFAULT_ENV': 'performance', 'CONDA_EXE': '/Users/robert/miniconda3/bin/conda', 'CONDA_PREFIX': '/Users/robert/miniconda3/envs/performance', 'CONDA_PREFIX_1': '/Users/robert/miniconda3', ...}, 'stderr': -2}
def check_output(*popenargs, timeout=None, **kwargs):
r"""Run command with arguments and return its output.
If the exit code was non-zero it raises a CalledProcessError. The
CalledProcessError object will have the return code in the returncode
attribute and output in the output attribute.
The arguments are the same as for the Popen constructor. Example:
>>> check_output(["ls", "-l", "/dev/null"])
b'crw-rw-rw- 1 root root 1, 3 Oct 18 2007 /dev/null\n'
The stdout argument is not allowed as it is used internally.
To capture standard error in the result, use stderr=STDOUT.
>>> check_output(["/bin/sh", "-c",
... "ls -l non_existent_file ; exit 0"],
... stderr=STDOUT)
b'ls: non_existent_file: No such file or directory\n'
There is an additional optional argument, "input", allowing you to
pass a string to the subprocess's stdin. If you use this argument
you may not also use the Popen constructor's "stdin" argument, as
it too will be used internally. Example:
>>> check_output(["sed", "-e", "s/foo/bar/"],
... input=b"when in the course of fooman events\n")
b'when in the course of barman events\n'
By default, all communication is in bytes, and therefore any "input"
should be bytes, and the return value will be bytes. If in text mode,
any "input" should be a string, and the return value will be a string
decoded according to locale encoding, or by "encoding" if set. Text mode
is triggered by setting any of text, encoding, errors or universal_newlines.
"""
if 'stdout' in kwargs:
raise ValueError('stdout argument not allowed, it will be overridden.')
if 'input' in kwargs and kwargs['input'] is None:
# Explicitly passing input=None was previously equivalent to passing an
# empty string. That is maintained here for backwards compatibility.
kwargs['input'] = '' if kwargs.get('universal_newlines', False) else b''
> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
**kwargs).stdout
../../../miniconda3/envs/performance/lib/python3.8/subprocess.py:411:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
input = None, capture_output = False, timeout = None, check = True, popenargs = (['mpirun', '-np', '1', '/Users/robert/miniconda3/envs/performance/bin/python', '-m', 'pytest_MPI._print_capture', ...],)
kwargs = {'env': {'CONDA_DEFAULT_ENV': 'performance', 'CONDA_EXE': '/Users/robert/miniconda3/bin/conda', 'CONDA_PREFIX': '/Users/robert/miniconda3/envs/performance', 'CONDA_PREFIX_1': '/Users/robert/miniconda3', ...}, 'stderr': -2, 'stdout': -1}
process = <subprocess.Popen object at 0x7fa787e7a6a0>, stdout = b'', stderr = None, retcode = 15
def run(*popenargs,
input=None, capture_output=False, timeout=None, check=False, **kwargs):
"""Run command with arguments and return a CompletedProcess instance.
The returned instance will have attributes args, returncode, stdout and
stderr. By default, stdout and stderr are not captured, and those attributes
will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
If check is True and the exit code was non-zero, it raises a
CalledProcessError. The CalledProcessError object will have the return code
in the returncode attribute, and output & stderr attributes if those streams
were captured.
If timeout is given, and the process takes too long, a TimeoutExpired
exception will be raised.
There is an optional argument "input", allowing you to
pass bytes or a string to the subprocess's stdin. If you use this argument
you may not also use the Popen constructor's "stdin" argument, as
it will be used internally.
By default, all communication is in bytes, and therefore any "input" should
be bytes, and the stdout and stderr will be bytes. If in text mode, any
"input" should be a string, and stdout and stderr will be strings decoded
according to locale encoding, or by "encoding" if set. Text mode is
triggered by setting any of text, encoding, errors or universal_newlines.
The other arguments are the same as for the Popen constructor.
"""
if input is not None:
if kwargs.get('stdin') is not None:
raise ValueError('stdin and input arguments may not both be used.')
kwargs['stdin'] = PIPE
if capture_output:
if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
raise ValueError('stdout and stderr arguments may not be used '
'with capture_output.')
kwargs['stdout'] = PIPE
kwargs['stderr'] = PIPE
with Popen(*popenargs, **kwargs) as process:
try:
stdout, stderr = process.communicate(input, timeout=timeout)
except TimeoutExpired as exc:
process.kill()
if _mswindows:
# Windows accumulates the output in a single blocking
# read() call run on child threads, with the timeout
# being done in a join() on those threads. communicate()
# _after_ kill() is required to collect that and add it
# to the exception.
exc.stdout, exc.stderr = process.communicate()
else:
# POSIX _communicate already populated the output so
# far into the TimeoutExpired exception.
process.wait()
raise
except: # Including KeyboardInterrupt, communicate handled that.
process.kill()
# We don't call process.wait() as .__exit__ does that for us.
raise
retcode = process.poll()
if check and retcode:
> raise CalledProcessError(retcode, process.args,
output=stdout, stderr=stderr)
E subprocess.CalledProcessError: Command '['mpirun', '-np', '1', '/Users/robert/miniconda3/envs/performance/bin/python', '-m', 'pytest_MPI._print_capture', 'test_mpi.py::test_parallel']' returned non-zero exit status 15.
../../../miniconda3/envs/performance/lib/python3.8/subprocess.py:512: CalledProcessError
During handling of the above exception, another exception occurred:
args = (), kwargs = {}, executable = 'mpirun', test_name = 'test_mpi.py::test_parallel'
env = {'CONDA_DEFAULT_ENV': 'performance', 'CONDA_EXE': '/Users/robert/miniconda3/bin/conda', 'CONDA_PREFIX': '/Users/robert/miniconda3/envs/performance', 'CONDA_PREFIX_1': '/Users/robert/miniconda3', ...}, errors = [(0, '')], i = 0
file_name = 'test_mpi.py..test_parallel_0', f = <_io.TextIOWrapper name='test_mpi.py..test_parallel_0' mode='r' encoding='UTF-8'>, rank = 0
@functools.wraps(func)
def replacement_func(*args, **kwargs):
if not in_mpi_session():
executable = mpi_executable(mpi_executable_name)
test_name = get_pytest_input(func)
env = dict(os.environ)
env["PYTHONPATH"] = os.pathsep.join(sys.path)
try:
mpi_subprocess_output = subprocess.check_output(
[
executable,
"-np",
str(nprocs),
sys.executable,
"-m",
"pytest_MPI._print_capture",
test_name,
],
env=env,
stderr=subprocess.STDOUT,
)
except subprocess.CalledProcessError as error:
print(error.output.decode('utf-8'))
errors = []
for i in range(nprocs):
file_name = f'{get_filename(test_name)}_{i}'
if os.path.isfile(file_name):
with open(file_name) as f:
errors.append((i, f.read()))
os.remove(file_name)
for rank, message in errors:
print(f'Rank {rank} reported an error:\n{message}')
> assert False
E assert False
../../../miniconda3/envs/performance/lib/python3.8/site-packages/pytest_MPI/_decorator.py:120: AssertionError
----------------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------------
Rank 0 reported an error:
=============================================================================================================== short test summary info ================================================================================================================
FAILED test_mpi.py::test_parallel - assert False
============================================================================================================= 1 failed, 1 passed in 0.86s ==============================================================================================================
OK, I hope this is readable to some extent. It looks like the subprocess ends with error code 15, and I have no idea why. Here is a txt file with the output; maybe this is easier to digest: https://drive.google.com/file/d/1ZV3Hs2xqPmHlTuv9J4430-Zx-1683_CP/view?usp=sharing
Indeed it seems like it cannot find mpirun. But you said that when you run the script via the command line, it works. So how are you calling the test now?
So, when I run
mpirun -np 1 python -m pytest_MPI._print_capture test_mpi.py::test_parallel
it goes through without any complaints. I don't see any output, though (when adding a print statement).
And if you call pytest in the folder where you have the test_mpi.py, you get the above error message?
Exactly. I have the feeling that I'm doing something stupid here, but I don't see what that could be. Two more pieces of information: MPI is installed using miniconda, too, so maybe this is a path/environment issue. And: I also have pytest-benchmark installed (don't know if this is a problem).
Can you set your test to assert False and run:
mpirun -np 1 python -m pytest_MPI._print_capture test_mpi.py::test_parallel
echo $?
Does it create a file? Does it print 0?
I get an empty file and echo prints out 15... interesting.
mpirun works when running a simple test program.
Can you run
python -m pytest_MPI._print_capture test_mpi.py::test_parallel
?
Goes through, no output, error code 1 (since assert False).
And the content of the file is still empty?
Oh, no, sorry:
@mpi_parallel(1)
def test_parallel():
> assert False
E assert False
test_mpi.py:9: AssertionError
=========================== short test summary info ============================
FAILED test_mpi.py::test_parallel - assert False
Can you try running a simple MPI program:
your_file.py:
from mpi4py import MPI
print(MPI.COMM_WORLD.Get_rank())
and in the terminal:
mpirun -np 4 python your_file.py
Yes, this works.
Just to make sure as you didn't include it in your initial post, you have the following import in your test file?
from pytest_MPI import mpi_parallel
Yes. This would have been such a stupid error... but no, it's there.
What about:
new_script.py
import subprocess
import sys
output = subprocess.check_output(["mpiexec", "-np", "4", sys.executable, "your_file.py"], universal_newlines=True)
print(output)
python new_script.py
Got the expected output of 0-3 in some random order.
How did you install the package? Did you use pip and a virtual environment?
Yes. I was/am in my conda environment, went to the download folder and ran
pip install .
Went through without complaints, me happy. You can see the location of the addon in the output above.
And what about this code? (Sorry that I cannot be more specific; I am basically also doing trial and error at this point :/ )
yet_another_try.py
import sys
import subprocess
subprocess.check_output(
[sys.executable, "-m", "pytest", "--color=yes", "--in_mpi_session", "test_mpi.py::test_parallel"],
stderr=subprocess.STDOUT,
universal_newlines=True,
)
mpirun -np 1 python yet_another_try.py
Well, here it comes:
Traceback (most recent call last):
File "new_script.py", line 4, in <module>
subprocess.check_output(
File "/Users/robert/miniconda3/envs/performance/lib/python3.8/subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Users/robert/miniconda3/envs/performance/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/Users/robert/miniconda3/envs/performance/bin/python', '-m', 'pytest', '--color=yes', '--in_mpi_session', 'test_mpi.py::test_parallel']' returned non-zero exit status 15.
And if you run:
python -m pytest --color=yes --in_mpi_session test_mpi.py::test_parallel
platform darwin -- Python 3.8.5, pytest-6.1.0, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/robert/Documents/codes/performance
plugins: benchmark-3.2.3, MPI-0.0.1.dev2
collected 1 item
test_mpi.py F [100%]
======================================================================================================================= FAILURES =======================================================================================================================
____________________________________________________________________________________________________________________ test_parallel _____________________________________________________________________________________________________________________
@mpi_parallel(1)
def test_parallel():
> assert False
E assert False
test_mpi.py:9: AssertionError
=============================================================================================================== short test summary info ================================================================================================================
FAILED test_mpi.py::test_parallel - assert False
================================================================================================================== 1 failed in 0.05s ===================================================================================================================
(performance)
And:
mpirun -np 1 python -m pytest --color=yes --in_mpi_session test_mpi.py::test_parallel
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.1.0, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/robert/Documents/codes/performance
plugins: benchmark-3.2.3, MPI-0.0.1.dev2
collected 1 item
test_mpi.py F [100%]
=================================== FAILURES ===================================
________________________________ test_parallel _________________________________
@mpi_parallel(1)
def test_parallel():
> assert False
E assert False
test_mpi.py:9: AssertionError
=========================== short test summary info ============================
FAILED test_mpi.py::test_parallel - assert False
============================== 1 failed in 0.05s ===============================
Can you remove the universal_newlines argument from the script above and retry?
mpirun -np 1 python new_script.py
gives me the same error as before; the others do not depend on new_script.py. Is that intended?
Sorry, my new_script.py is your yet_another_try.py.
What about:
mpirun -np 1 /Users/robert/miniconda3/envs/performance/bin/python -m pytest --color=yes --in_mpi_session test_mpi.py::test_parallel
and
/Users/robert/miniconda3/envs/performance/bin/python -m pytest --color=yes --in_mpi_session test_mpi.py::test_parallel
The same expected output from pytest.
Same as in: it works, or same as in it gives the same error? Hm, this is really strange.
Same as in: it works, i.e. it throws the assertion error from pytest.
Yeah, strange it is. Thank you for your effort! Did you ever test this with conda?
Ok, so in summary, what we have so far:
But we are literally copying the command-line command into the subprocess function...
I haven't tested it with conda, but did test it on Windows and Linux. I cannot imagine subprocess being so different though between the installations.
Can you replace the check_output with run?
Is this supposed to work out of the box, i.e. without any modifications to the parameters? I get this:
$ mpirun -np 1 python new_script.py
[cli_0]: write_line error; fd=4 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_0]: Unable to write to PMI_fd
[cli_0]: write_line error; fd=4 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(586):
MPID_Init(175).......: channel initialization failed
MPID_Init(463).......: PMI_Get_appnum returned -1
[cli_0]: write_line error; fd=4 buf=:cmd=abort exitcode=1093647
:
system msg for write_line failure : Bad file descriptor
So, if this is supposed to work, I think this error could come from using the wrong mpirun. There is yet another MPI installation outside of the conda environment, and if, for some reason, subprocess.run does not see the conda version of MPI, it will produce strange errors.
Hm, interesting. I have the feeling we are getting closer. Does Anaconda offer different packages for MPI? If so, can you try a different one? Like MPICH vs OpenMPI?
OK, before I do that, I replaced
subprocess.run([sys.executable, "-m", "pytest", "--color=yes", "--in_mpi_session", "test_mpi.py::test_parallel"])
with
subprocess.run(["which", "mpirun"])
and got the correct one from the conda environment. So it is not an issue of using the wrong MPI installation when executing via subprocess.run, at least not directly. Is it within pytest-MPI?
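To make this kind of check reproducible without shelling out to which, a small stdlib-only sketch (the helper name resolve_mpirun is made up, not part of pytest-MPI) can compare what the parent process and the constructed child environment would each resolve:

```python
# Diagnostic sketch (hypothetical helper, stdlib only): compare which mpirun
# the parent process and a child launched with a custom env would find.
import os
import shutil
import sys

def resolve_mpirun(env=None):
    """Return the mpirun a child started with `env` would find on its PATH."""
    path = (env or os.environ).get("PATH", "")
    return shutil.which("mpirun", path=path)

if __name__ == "__main__":
    child_env = dict(os.environ)
    # pytest-MPI forwards sys.path like this before spawning mpirun
    child_env["PYTHONPATH"] = os.pathsep.join(sys.path)
    print("parent resolves:", shutil.which("mpirun"))
    print("child would resolve:", resolve_mpirun(child_env))
```

If the two printed paths differ, the subprocess is picking up a different MPI installation than the interactive shell.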
So, here is what I got:
- subprocess.run works when using OpenMPI instead of MPICH.
- subprocess.check_output works in the sense that I get error code 1 when I have assert False and no error otherwise.
- pytest test_mpi.py simply hangs/freezes. It does the test_serial from your original example, but at test_parallel, nothing happens.
And:
mpirun -np 1 python -m pytest --color=yes --in_mpi_session test_mpi.py::test_parallel
works in the sense that it runs through, even with assert False! I get the correct output from pytest, but then it waits for CTRL+C. So, this is why pytest test_mpi.py freezes.
Bottom line: it seems to do the right thing when using OpenMPI, but then I cannot use the fixture either because the process does not terminate. And we still don't know why it does not work with MPICH.
Can you try:
mpirun -np 1 python -m pytest_MPI._print_capture test_mpi.py::test_parallel
with OpenMPI?
Weird!
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
getting local rank failed
--> Returned value No permission (-17) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value No permission (-17) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "No permission" (-17) instead of "Success" (0)
--------------------------------------------------------------------------
So I configured a Travis runner and was first getting the same error message, but after setting an environment variable that was suggested online, it now also hangs indefinitely. Could you try the MPICH version again and use this hint to try to get it running: https://stackoverflow.com/a/54811969/2305545
Yes, I saw that comment, too, but my hosts file is already correct. Sorry...
OK, so using export PMIX_MCA_gds=hash fixes the orte_init errors with OpenMPI, i.e. the command
mpirun -np 1 python -m pytest_MPI._print_capture test_mpi.py::test_parallel
now also hangs indefinitely for me when using OpenMPI. It goes through with MPICH.
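If that workaround helps, it can also be injected into the environment the spawned MPI process inherits, instead of exporting it in the parent shell. A hedged sketch (the helper name is invented; pytest-MPI does not do this itself):

```python
# Sketch (hypothetical): build a child environment that carries the PMIx
# workaround, without mutating the parent shell's environment.
import os

def env_with_pmix_workaround(base=None):
    """Copy `base` (default: os.environ) and set PMIX_MCA_gds=hash."""
    env = dict(base if base is not None else os.environ)
    env["PMIX_MCA_gds"] = "hash"
    return env
```

This could then be passed as env=env_with_pmix_workaround() to subprocess.check_output when launching mpirun.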
So you do not get the exit status 15 with MPICH? Could you check https://github.com/NOhs/pytest-MPI/blob/travis-integration/.travis.yml for any differences to your setup? Because on the Travis OSX VMs, I still get the exit status 15.
Yeah, sorry, I did not look at the error code. When doing echo $? after the run, the error code is 15.
Ok. Darn. I think I will create some more examples and run them on Travis to find an easy-to-understand case of the error. Then I probably have to contact either the mpi4py or the Open MPI teams and ask for help there. It might take some time, sorry :/
No worries. I learned a lot of useful things here and from looking at how you implemented things. I understand that you now have a setup which lets you reproduce the error on your machine?
On an OSX virtual machine, yes.
Great approach, exactly what I would need. However, after installing it I cannot get it to work. Even with a
I get a subprocess.CalledProcessError saying that the command returned exit code 15. When I run the command by hand on the command line, all works well. Could this be an environment thing? I have no idea how to debug or even fix that.
I'm running on macOS with Python 3.8 from miniconda. Any help is appreciated!