QMCPACK / qmcpack

Main repository for QMCPACK, an open-source, production-level, many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids, with full performance-portable GPU support.
http://www.qmcpack.org

Handling of numpy, h5, and other python module dependencies in tests #226

Closed. prckent closed this issue 3 years ago.

prckent commented 7 years ago

This issue is for discussion of a point raised by #225: we have not formalized a policy on it, and it will keep recurring.

For testing of HDF5 output, as well as for convenient processing of output data, Python and various Python modules (e.g. hdf5, xml, numpy/scipy) are attractive. However, these tools have not historically always been available on the head nodes of supercomputers. NEXUS also needs them (in general), but for testing we have managed to avoid the dependency until now.

How should we handle this? A naive assumption that these tools are available will result in failing tests. Avoiding them does not seem realistic.

The "correct" way to do this might be to add detection of python and various modules in our cmake configure, then skip tests where they are needed.

ye-luo commented 7 years ago

It is very helpful to check for these modules during the cmake step. For me, qmca is a required tool, and it relies on numpy; when I forgot to install numpy on my desktop, I saw an error message indicating numpy was missing. We cannot assume everyone understands Python error messages. When I used Jaron's density tools, which need the hdf5 Python module, the error message was poor and I was basically debugging the script to find the problem. So if we can handle these checks at the cmake step, it will avoid solving problems in the dark. Providing a warning is sufficient.

ye-luo commented 7 years ago

There is one thing I'm not sure about. On big machines, the frontend nodes and the nodes that dispatch jobs can be different. If the frontend nodes have these Python modules but the MOM nodes do not, the tests will still fail. Maybe I'm being overly cautious and the frontend and MOM nodes have essentially identical software setups.

markdewing commented 7 years ago

The tests that use check_scalars.py are pretty fundamental, so it seems worthwhile to minimize the dependencies in that script.

For other tests (like the estimator tests), we would like to minimize the effort to write, understand, and maintain the tests, so allowing more dependencies seems reasonable.

We will need to implement the 'correct' solution that @prckent mentions - which probably means writing a small Python script to check for dependencies that CMake will call at configure time.
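
A minimal sketch of such a configure-time check, assuming a hypothetical script name and module list (not the actual QMCPACK implementation); the configure step could run it and use the exit status to decide which tests to enable:

```python
#!/usr/bin/env python
# check_python_deps.py -- illustrative sketch only, not the QMCPACK script.
# Tries to import each module and reports the missing ones; a nonzero exit
# lets a configure step skip the tests that need them.
import importlib
import sys

modules = ["numpy", "h5py"]  # hypothetical list; the real tests may need others

missing = []
for name in modules:
    try:
        importlib.import_module(name)
    except ImportError:
        missing.append(name)

if missing:
    print("missing python modules: " + ", ".join(missing))
    sys.exit(1)

print("all requested python modules found")
```

CMake could invoke a script like this with execute_process at configure time and only register the dependent tests when it succeeds.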

I'm not sure how to solve the problem of different software installs on front-end vs MOM nodes, except to be careful in setting up the environment in which the test runs (as the automated scripts do).

jtkrogel commented 7 years ago

A configure-time check should be fairly straightforward to manage. We can require the dispatched jobs to also perform a check and fail with a suitable summary of the failed imports.

It may also be worthwhile to have a few dispatched jobs that do nothing but check the suitability of the environment as this will clearly communicate the overall reason for failure (python imports rather than assessed failure) in the ctest command line/log output.
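
One possible shape for such a run-time guard (the module names are only examples): a dispatched job could run this before, or instead of, the real test, so the ctest log shows a clear environment failure rather than a raw ImportError traceback.

```python
# Hypothetical environment check a dispatched job could run first.
# It fails fast with a summary of the failed imports.
import sys

failed = []
for name in ("numpy", "h5py"):  # modules this particular test needs (example)
    try:
        __import__(name)
    except ImportError:
        failed.append(name)

if failed:
    sys.exit("environment check failed; missing python modules: " + ", ".join(failed))
print("environment check passed")
```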

prckent commented 7 years ago

Suggested plan:

jtkrogel commented 7 years ago

It appears that cmake/ctest now checks for python dependencies and guards tests accordingly. Issue resolved?

prckent commented 7 years ago

I believe so. @markdewing OK?

markdewing commented 7 years ago

I think point 3 on Paul's list is not implemented - there is only a configure-time check for the python modules; there is no separate run-time check. We could close this and open a new issue if it becomes a problem.

prckent commented 7 years ago

I think we are missing a simple "python works" test, without module dependencies. We should check that python runs and can open a file. We can anticipate problems running python on compute nodes, and a clear test would help.
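
A sketch of what such a dependency-free check could look like (the script name is hypothetical); it only verifies that the interpreter starts and can open a file:

```python
#!/usr/bin/env python
# python_works.py -- hypothetical minimal sanity check with no module imports.
import sys

path = sys.argv[1] if len(sys.argv) > 1 else __file__
try:
    with open(path) as f:
        f.readline()
except (IOError, OSError) as err:
    sys.exit("python could not open %s: %s" % (path, err))

print("python works")
```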

prckent commented 3 years ago

We have moved to the following situation:

What should be done, as part of a cmake refresh, is a more systematic check for python dependencies early in our cmake configure, with required/recommended/complete categories.
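
One way such a tiered check could look (the tier contents are only illustrative, following the required/recommended/complete wording above):

```python
#!/usr/bin/env python
# Illustrative tiered dependency report; not the actual QMCPACK check.
import importlib
import sys

tiers = [
    ("required",    ["numpy"]),          # example: tests cannot run without these
    ("recommended", ["h5py"]),           # example: some tests would be skipped
    ("complete",    ["scipy", "lxml"]),  # example: full analysis tooling
]

missing_required = False
for tier, modules in tiers:
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    print("%-12s missing: %s" % (tier, ", ".join(missing) if missing else "none"))
    if tier == "required" and missing:
        missing_required = True

sys.exit(1 if missing_required else 0)
```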