NOhs / pytest-easyMPI

Integrating MPI tests into your regular pytest scripts.
MIT License

subprocess.CalledProcessError #2

Closed pancetta closed 3 years ago

pancetta commented 3 years ago

Great approach, exactly what I would need. However, after installing it I cannot get it to work. Even with a

@mpi_parallel(1)
def test_parallel():
    assert True

I get a subprocess.CalledProcessError saying that the command returned error code 15. When I run the command by hand on the command line, all works well. Could be an environment thing? I've no idea how to debug or even fix that.

I'm running on macOS with Python 3.8 from miniconda. Any help is appreciated!

NOhs commented 3 years ago

Thank you for trying out the package. Can you post the entire output of pytest here? This would help me a lot to figure out what might be the issue.

pancetta commented 3 years ago
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/robert/Documents/codes/performance
plugins: benchmark-3.2.3, MPI-0.0.1.dev2
collected 2 items                                                                                                                                                                                                                                      

test_mpi.py .F                                                                                                                                                                                                                                   [100%]

======================================================================================================================= FAILURES =======================================================================================================================
____________________________________________________________________________________________________________________ test_parallel _____________________________________________________________________________________________________________________

args = (), kwargs = {}, executable = 'mpirun', test_name = 'test_mpi.py::test_parallel'
env = {'CONDA_DEFAULT_ENV': 'performance', 'CONDA_EXE': '/Users/robert/miniconda3/bin/conda', 'CONDA_PREFIX': '/Users/robert/miniconda3/envs/performance', 'CONDA_PREFIX_1': '/Users/robert/miniconda3', ...}, errors = [(0, '')], i = 0
file_name = 'test_mpi.py..test_parallel_0', f = <_io.TextIOWrapper name='test_mpi.py..test_parallel_0' mode='r' encoding='UTF-8'>, rank = 0

    @functools.wraps(func)
    def replacement_func(*args, **kwargs):
        if not in_mpi_session():
            executable = mpi_executable(mpi_executable_name)
            test_name = get_pytest_input(func)
            env = dict(os.environ)
            env["PYTHONPATH"] = os.pathsep.join(sys.path)

            try:
>               mpi_subprocess_output = subprocess.check_output(
                    [
                        executable,
                        "-np",
                        str(nprocs),
                        sys.executable,
                        "-m",
                        "pytest_MPI._print_capture",
                        test_name,
                    ],
                    env=env,
                    stderr=subprocess.STDOUT,
                )

../../../miniconda3/envs/performance/lib/python3.8/site-packages/pytest_MPI/_decorator.py:93: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

timeout = None, popenargs = (['mpirun', '-np', '1', '/Users/robert/miniconda3/envs/performance/bin/python', '-m', 'pytest_MPI._print_capture', ...],)
kwargs = {'env': {'CONDA_DEFAULT_ENV': 'performance', 'CONDA_EXE': '/Users/robert/miniconda3/bin/conda', 'CONDA_PREFIX': '/Users/robert/miniconda3/envs/performance', 'CONDA_PREFIX_1': '/Users/robert/miniconda3', ...}, 'stderr': -2}

    def check_output(*popenargs, timeout=None, **kwargs):
        r"""Run command with arguments and return its output.

        If the exit code was non-zero it raises a CalledProcessError.  The
        CalledProcessError object will have the return code in the returncode
        attribute and output in the output attribute.

        The arguments are the same as for the Popen constructor.  Example:

        >>> check_output(["ls", "-l", "/dev/null"])
        b'crw-rw-rw- 1 root root 1, 3 Oct 18  2007 /dev/null\n'

        The stdout argument is not allowed as it is used internally.
        To capture standard error in the result, use stderr=STDOUT.

        >>> check_output(["/bin/sh", "-c",
        ...               "ls -l non_existent_file ; exit 0"],
        ...              stderr=STDOUT)
        b'ls: non_existent_file: No such file or directory\n'

        There is an additional optional argument, "input", allowing you to
        pass a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it too will be used internally.  Example:

        >>> check_output(["sed", "-e", "s/foo/bar/"],
        ...              input=b"when in the course of fooman events\n")
        b'when in the course of barman events\n'

        By default, all communication is in bytes, and therefore any "input"
        should be bytes, and the return value will be bytes.  If in text mode,
        any "input" should be a string, and the return value will be a string
        decoded according to locale encoding, or by "encoding" if set. Text mode
        is triggered by setting any of text, encoding, errors or universal_newlines.
        """
        if 'stdout' in kwargs:
            raise ValueError('stdout argument not allowed, it will be overridden.')

        if 'input' in kwargs and kwargs['input'] is None:
            # Explicitly passing input=None was previously equivalent to passing an
            # empty string. That is maintained here for backwards compatibility.
            kwargs['input'] = '' if kwargs.get('universal_newlines', False) else b''

>       return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
                   **kwargs).stdout

../../../miniconda3/envs/performance/lib/python3.8/subprocess.py:411: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True, popenargs = (['mpirun', '-np', '1', '/Users/robert/miniconda3/envs/performance/bin/python', '-m', 'pytest_MPI._print_capture', ...],)
kwargs = {'env': {'CONDA_DEFAULT_ENV': 'performance', 'CONDA_EXE': '/Users/robert/miniconda3/bin/conda', 'CONDA_PREFIX': '/Users/robert/miniconda3/envs/performance', 'CONDA_PREFIX_1': '/Users/robert/miniconda3', ...}, 'stderr': -2, 'stdout': -1}
process = <subprocess.Popen object at 0x7fa787e7a6a0>, stdout = b'', stderr = None, retcode = 15

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.

        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.

        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.

        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.

        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.

        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE

        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE

        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['mpirun', '-np', '1', '/Users/robert/miniconda3/envs/performance/bin/python', '-m', 'pytest_MPI._print_capture', 'test_mpi.py::test_parallel']' returned non-zero exit status 15.

../../../miniconda3/envs/performance/lib/python3.8/subprocess.py:512: CalledProcessError

During handling of the above exception, another exception occurred:

args = (), kwargs = {}, executable = 'mpirun', test_name = 'test_mpi.py::test_parallel'
env = {'CONDA_DEFAULT_ENV': 'performance', 'CONDA_EXE': '/Users/robert/miniconda3/bin/conda', 'CONDA_PREFIX': '/Users/robert/miniconda3/envs/performance', 'CONDA_PREFIX_1': '/Users/robert/miniconda3', ...}, errors = [(0, '')], i = 0
file_name = 'test_mpi.py..test_parallel_0', f = <_io.TextIOWrapper name='test_mpi.py..test_parallel_0' mode='r' encoding='UTF-8'>, rank = 0

    @functools.wraps(func)
    def replacement_func(*args, **kwargs):
        if not in_mpi_session():
            executable = mpi_executable(mpi_executable_name)
            test_name = get_pytest_input(func)
            env = dict(os.environ)
            env["PYTHONPATH"] = os.pathsep.join(sys.path)

            try:
                mpi_subprocess_output = subprocess.check_output(
                    [
                        executable,
                        "-np",
                        str(nprocs),
                        sys.executable,
                        "-m",
                        "pytest_MPI._print_capture",
                        test_name,
                    ],
                    env=env,
                    stderr=subprocess.STDOUT,
                )

            except subprocess.CalledProcessError as error:
                print(error.output.decode('utf-8'))
                errors = []
                for i in range(nprocs):
                    file_name = f'{get_filename(test_name)}_{i}'
                    if os.path.isfile(file_name):
                        with open(file_name) as f:
                            errors.append((i, f.read()))
                        os.remove(file_name)

                for rank, message in errors:
                    print(f'Rank {rank} reported an error:\n{message}')

>               assert False
E               assert False

../../../miniconda3/envs/performance/lib/python3.8/site-packages/pytest_MPI/_decorator.py:120: AssertionError
----------------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------------

Rank 0 reported an error:

=============================================================================================================== short test summary info ================================================================================================================
FAILED test_mpi.py::test_parallel - assert False
============================================================================================================= 1 failed, 1 passed in 0.86s ==============================================================================================================
pancetta commented 3 years ago

OK, I hope this is readable to some extent. It looks like the subprocess ends with error code 15 and I have no idea why. Here is a txt file with the output; maybe this is easier to digest: https://drive.google.com/file/d/1ZV3Hs2xqPmHlTuv9J4430-Zx-1683_CP/view?usp=sharing

NOhs commented 3 years ago

Indeed, it seems like it cannot find mpirun. But you said that when you run the script via the command line, it works. So how are you calling the test now?

pancetta commented 3 years ago

So, when I run

mpirun -np 1 python -m pytest_MPI._print_capture test_mpi.py::test_parallel

it goes through without any complaints. I don't see any output, though (when adding a print statement).

NOhs commented 3 years ago

And if you call

pytest

in the folder where you have the test_mpi.py you get the above error message?

pancetta commented 3 years ago

Exactly. I have the feeling that I'm doing something stupid here, but I don't see what that could be. Two more pieces of information: MPI is installed using miniconda, too, so maybe this is a path/environment issue. And: I also have pytest-benchmark installed (don't know if this is a problem).

NOhs commented 3 years ago

Can you set your test to assert False and run:

mpirun -np 1 python -m pytest_MPI._print_capture test_mpi.py::test_parallel
echo $?

Does it create a file? Does it print 0?

pancetta commented 3 years ago

I get an empty file and echo prints out 15... interesting.

mpirun works when running a simple test program.

NOhs commented 3 years ago

Can you run

python -m pytest_MPI._print_capture test_mpi.py::test_parallel

?

pancetta commented 3 years ago

Goes through, no output, error code 1 (since assert False).

NOhs commented 3 years ago

And the content of the file is still empty?

pancetta commented 3 years ago

Oh, no, sorry:

    @mpi_parallel(1)
    def test_parallel():
>       assert False
E       assert False

test_mpi.py:9: AssertionError
=========================== short test summary info ============================
FAILED test_mpi.py::test_parallel - assert False
NOhs commented 3 years ago

Can you try running a simple MPI program:

your file

from mpi4py import MPI

print(MPI.COMM_WORLD.Get_rank())

and in the terminal:

mpirun -np 4 python your_file.py
pancetta commented 3 years ago

Yes, this works.

NOhs commented 3 years ago

Just to make sure, since you didn't include it in your initial post: do you have the following import in your test file?

from pytest_MPI import mpi_parallel
pancetta commented 3 years ago

Yes. This would have been such a stupid error... but no, it's there.

NOhs commented 3 years ago

What about:

new_script.py

import subprocess
import sys

output = subprocess.check_output(["mpiexec", "-np", "4", sys.executable, "your_file.py"], universal_newlines=True)
print(output)
python new_script.py
pancetta commented 3 years ago

Got the expected output of 0-3 in some random order..

NOhs commented 3 years ago

How did you install the package? Did you use pip and a virtual environment?

pancetta commented 3 years ago

Yes. I was/am in my conda environment, went to the download folder and ran

pip install .

Went through without complaints, me happy. You can see the location of the addon in the output above.

NOhs commented 3 years ago

And what about this code (sorry that I cannot be more specific; I am basically also doing trial and error at this point :/ ):

yet_another_try.py

import sys
import subprocess

subprocess.check_output(
        [sys.executable, "-m", "pytest", "--color=yes", "--in_mpi_session", "test_mpi.py::test_parallel"],
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
mpirun -np 1 python yet_another_try.py
pancetta commented 3 years ago

Well, here it comes:

Traceback (most recent call last):
  File "new_script.py", line 4, in <module>
    subprocess.check_output(
  File "/Users/robert/miniconda3/envs/performance/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/Users/robert/miniconda3/envs/performance/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/Users/robert/miniconda3/envs/performance/bin/python', '-m', 'pytest', '--color=yes', '--in_mpi_session', 'test_mpi.py::test_parallel']' returned non-zero exit status 15.
NOhs commented 3 years ago

And if you run:

python -m pytest --color=yes --in_mpi_session test_mpi.py::test_parallel
pancetta commented 3 years ago
platform darwin -- Python 3.8.5, pytest-6.1.0, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/robert/Documents/codes/performance
plugins: benchmark-3.2.3, MPI-0.0.1.dev2
collected 1 item                                                                                                                                                                                                                                       

test_mpi.py F                                                                                                                                                                                                                                    [100%]

======================================================================================================================= FAILURES =======================================================================================================================
____________________________________________________________________________________________________________________ test_parallel _____________________________________________________________________________________________________________________

    @mpi_parallel(1)
    def test_parallel():
    >       assert False
    E       assert False

test_mpi.py:9: AssertionError
=============================================================================================================== short test summary info ================================================================================================================
FAILED test_mpi.py::test_parallel - assert False
================================================================================================================== 1 failed in 0.05s ===================================================================================================================
NOhs commented 3 years ago

And:

mpirun -np 1 python -m pytest --color=yes --in_mpi_session test_mpi.py::test_parallel
pancetta commented 3 years ago
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.1.0, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/robert/Documents/codes/performance
plugins: benchmark-3.2.3, MPI-0.0.1.dev2
collected 1 item

test_mpi.py F                                                            [100%]

=================================== FAILURES ===================================
________________________________ test_parallel _________________________________

    @mpi_parallel(1)
    def test_parallel():
>       assert False
E       assert False

test_mpi.py:9: AssertionError
=========================== short test summary info ============================
FAILED test_mpi.py::test_parallel - assert False
============================== 1 failed in 0.05s ===============================
NOhs commented 3 years ago

Can you remove the universal_newlines argument from the test file above and retry?

pancetta commented 3 years ago

mpirun -np 1 python new_script.py

gives me the same error as before; the others do not depend on new_script.py. Is that intended?

Sorry, my new_script.py is your yet_another_try.py.

NOhs commented 3 years ago

What about:

mpirun -np 1 /Users/robert/miniconda3/envs/performance/bin/python -m pytest --color=yes --in_mpi_session test_mpi.py::test_parallel

and

/Users/robert/miniconda3/envs/performance/bin/python -m pytest --color=yes --in_mpi_session test_mpi.py::test_parallel
pancetta commented 3 years ago

The same expected output from pytest.

NOhs commented 3 years ago

Same as in: it works, or same as in it gives the same error? Hm, this is really strange.

pancetta commented 3 years ago

Same as in: it works, i.e. it throws the assertion error from pytest.

Yeah, strange it is. I thank you for your effort! Did you ever test this with conda?

NOhs commented 3 years ago

Ok, so in summary, what we have so far:

But we are literally copying the command line command into the subprocess function...

I haven't tested it with conda, but did test it on Windows and Linux. I cannot imagine subprocess being so different though between the installations.

Can you replace the check_output with run?
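For illustration, the suggested swap boils down to the difference below. This is a self-contained sketch: a trivially failing command stands in for the real mpirun/pytest invocation, and exit status 15 here merely mimics the error from the thread.

```python
import subprocess
import sys

# A stand-in command that exits with status 15, mimicking the failure
# reported in this issue.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(15)"]

# check_output raises CalledProcessError on any non-zero exit status...
try:
    subprocess.check_output(failing_cmd)
except subprocess.CalledProcessError as error:
    print("check_output raised, returncode:", error.returncode)

# ...while run without check=True returns a CompletedProcess whose
# returncode can be inspected directly, so the script keeps going.
result = subprocess.run(failing_cmd)
print("run returned, returncode:", result.returncode)
```

With run, the child's stderr also reaches the terminal unfiltered, which is what surfaces the PMI error messages further down.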

pancetta commented 3 years ago

Is this supposed to work out of the box, i.e. without any modifications to the parameters? I get this:

$ mpirun -np 1 python new_script.py
[cli_0]: write_line error; fd=4 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_0]: Unable to write to PMI_fd
[cli_0]: write_line error; fd=4 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(586): 
MPID_Init(175).......: channel initialization failed
MPID_Init(463).......: PMI_Get_appnum returned -1
[cli_0]: write_line error; fd=4 buf=:cmd=abort exitcode=1093647
:
system msg for write_line failure : Bad file descriptor
pancetta commented 3 years ago

So, if this is supposed to work, I think this error could come from using the wrong mpirun. There is yet another MPI installation outside of the conda environment and IF, for some reason, subprocess.run does not see the conda version of MPI, it will produce strange errors.

NOhs commented 3 years ago

Hm, interesting. I have the feeling we are getting closer. Does Anaconda offer different packages for MPI? If so, can you try a different one? Like MPICH vs OpenMPI?

pancetta commented 3 years ago

OK, before I do that, I replaced

subprocess.run([sys.executable, "-m", "pytest", "--color=yes", "--in_mpi_session", "test_mpi.py::test_parallel"])

with

subprocess.run(["which", "mpirun"])

and got the correct one from the conda environment. So it is not an issue of using the wrong MPI installation for executing via subprocess.run, at least not directly. Is it within pytest-MPI?
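A Python-side variant of that which check might look like this (a sketch: shutil.which resolves against the same PATH that a subprocess inherits when no explicit env is passed, so it shows which mpirun a spawned process would pick up):

```python
import os
import shutil

# shutil.which searches os.environ["PATH"] the same way a shell does,
# so the result is the mpirun a child process would inherit.
resolved = shutil.which("mpirun")
print("resolved mpirun:", resolved)

# Listing the PATH entries can reveal whether a system-wide MPI
# installation shadows (or is shadowed by) the conda one.
for entry in os.environ.get("PATH", "").split(os.pathsep):
    print("  ", entry)
```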

pancetta commented 3 years ago

So, here is what I got:

pancetta commented 3 years ago

And:

Bottom line: it seems to do the right thing when using OpenMPI, but then I cannot use the fixture either because the process does not terminate. And we still don't know why it does not work with MPICH.

NOhs commented 3 years ago

Can you try:

mpirun -np 1 python -m pytest_MPI._print_capture test_mpi.py::test_parallel

with OpenMPI?

pancetta commented 3 years ago

Weird!

--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  getting local rank failed
  --> Returned value No permission (-17) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_init failed
  --> Returned value No permission (-17) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "No permission" (-17) instead of "Success" (0)
--------------------------------------------------------------------------
NOhs commented 3 years ago

So I configured a Travis runner and was first getting the same error message, but after setting an environment variable that was suggested online, it now also hangs indefinitely. Could you try the MPICH version again and use the hint from https://stackoverflow.com/a/54811969/2305545 to try to get it running?

pancetta commented 3 years ago

Yes, I saw that comment, too, but my hosts file is already correct. Sorry..

pancetta commented 3 years ago

OK, so using export PMIX_MCA_gds=hash fixes the problem with OpenMPI, i.e. the command

mpirun -np 1 python -m pytest_MPI._print_capture test_mpi.py::test_parallel

now also hangs indefinitely for me when using OpenMPI. It goes through with MPICH.
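If the workaround helps, it could also be applied from Python by extending the copied environment, mirroring how the decorator in the traceback above builds env. A sketch; PMIX_MCA_gds=hash is the Open MPI workaround found in this thread, not something the package sets itself:

```python
import os
import sys

# Build the child environment the same way the decorator shown in the
# traceback does, then add the Open MPI workaround from this thread.
env = dict(os.environ)
env["PYTHONPATH"] = os.pathsep.join(sys.path)
env["PMIX_MCA_gds"] = "hash"  # workaround for the orte_init "No permission" error

# The dict would then be handed to the mpirun launch, e.g.:
# subprocess.check_output(
#     ["mpirun", "-np", "1", sys.executable,
#      "-m", "pytest_MPI._print_capture", "test_mpi.py::test_parallel"],
#     env=env, stderr=subprocess.STDOUT)
print(env["PMIX_MCA_gds"])
```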

NOhs commented 3 years ago

So you do not get the exit status 15 with MPICH? Could you check https://github.com/NOhs/pytest-MPI/blob/travis-integration/.travis.yml for any differences to your setup? Because on the Travis OSX VMs, I still get the exit status 15.

pancetta commented 3 years ago

Yeah, sorry, I did not look at the error code. When doing echo $? after the run, the error code is 15.

NOhs commented 3 years ago

Ok. Darn. I think I will create some more examples and run them on Travis to find an easy-to-understand case of the error. Then I probably have to contact either the mpi4py or the Open MPI teams and ask for help there. It might take some time, sorry :/

pancetta commented 3 years ago

No worries. I learned a lot of useful things here and when looking at how you implemented things. I understand that you now have a setup which lets you reproduce the error on your machine?

NOhs commented 3 years ago

On an OSX virtual machine, yes.