MolSSI / QCEngine

Quantum chemistry program executor and IO standardizer (QCSchema).
https://molssi.github.io/QCEngine/
BSD 3-Clause "New" or "Revised" License
163 stars 78 forks source link

Prepend psiimport path instead of overwriting `$PYTHONPATH` #389

Closed Flamefire closed 1 year ago

Flamefire commented 1 year ago

Description

When psi4 binary is found but the Python package is not we should not overwrite the $PYTHONPATH before importing psi4 as that may cause other required packages to be not found anymore. Hence prepend it.

See https://github.com/MolSSI/QCEngine/issues/292#issuecomment-1337320606

Changelog description

Fix IndexError in PSI4 Harness when psi4 Python package is not found but the binary is.

loriab commented 1 year ago

Thanks, I think I agree. In answer to your

So why not simply import psi4 and directly access psi4.executable?

it's because which_import finds the location w/o actually importing. so the pythonpath adjustment is to get enough dependencies locatable so that import psi4 can work and psi4.executable can be accessed.

codecov[bot] commented 1 year ago

Codecov Report

Merging #389 (fb22b94) into master (c68b547) will increase coverage by 3.39%. The diff coverage is 0.00%.

Additional details and impacted files
Flamefire commented 1 year ago

it's because which_import finds the location w/o actually importing. so the pythonpath adjustment is to get enough dependencies locatable so that import psi4 can work and psi4.executable can be accessed.

And that's what I don't understand: You find the location w/o importing to then adjust the PYTHONPATH with the found location to import it in a subprocess. But the location can only be found if PYTHONPATH already contains it so the import would work. So why changing it? Or did I miss how which_import works?

loriab commented 1 year ago

it's because which_import finds the location w/o actually importing. so the pythonpath adjustment is to get enough dependencies locatable so that import psi4 can work and psi4.executable can be accessed.

And that's what I don't understand: You find the location w/o importing to then adjust the PYTHONPATH with the found location to import it in a subprocess. But the location can only be found if PYTHONPATH already contains it so the import would work. So why changing it? Or did I miss how which_import works?

The overall aim for this def found() is to make sure a paired bin/psi4 and psi4/__init__ are found and ready to load. There's some configuring that the program knows above, so it's best to use psi4 --module and psi4.executable. https://github.com/psi4/psi4/blob/master/psi4/__init__.py#L38-L42 The overall pattern for the QCEngine harnesses is that found() looks for the installation (but doesn't run or import) and version() does a simple run. https://github.com/MolSSI/QCEngine/blob/master/qcengine/programs/nwchem/runner.py#L73-L87

You're correct that which_import is finding the signature of the psi4 package but not actually trying to import it (https://github.com/MolSSI/QCElemental/blob/next/qcelemental/util/importing.py#L36). So when that returns True, it may still be that not all dependencies required for psi4 are importable. My first thought (and how I answered above) was that the env modification was to add a likely dependency location to sys.path so that import psi4 worked and psi4.executable could be accessed. But I entirely agree with your analysis that that is redundant. My second thought was that the PYTHONPATH truncation for env was to restrict paths so that, say a psi4.py couldn't interfere with the targeted psi4 installation. But I'm not thinking of what scenario would hit that. I've removed the env complication in #391 and will try that locally for a few days.

Would that satisfy your intent the same as this PR? Hopefully I'll have a bright idea (or hit a bug) in the meantime about what I was doing.

Flamefire commented 1 year ago

Would that satisfy your intent the same as this PR? Hopefully I'll have a bright idea (or hit a bug) in the meantime about what I was doing.

But why the subprocess? What about the more common:

try:
  import psi4
except ImportError:
  return None
return psi4.executable[:-5]

If you still want the subprocess please include at least error handling, see the linked comment:

So I'd strongly suggest to add any kind of error checking first before accessing exc["stdout"]

I.e. at least check the exit code and possibly that exc["stdout"] is not empty which both suggests that the import failed.

loriab commented 1 year ago

@Flamefire, in case you're still monitoring this issue, I think I've got it fixed at #418 . If you see any deficiencies, please let me know.

Flamefire commented 1 year ago

@Flamefire, in case you're still monitoring this issue, I think I've got it fixed at #418 . If you see any deficiencies, please let me know.

Looks good I think. If you want to be safe I'd also check the exit code as suggested in my prior comment, i.e. exc["proc"].wait(timeout=30) == 0

loriab commented 1 year ago

Looks good I think. If you want to be safe I'd also check the exit code as suggested in my prior comment, i.e. exc["proc"].wait(timeout=30) == 0

Thanks, done, and I caught another bug along the way.