duartegroup / autodE

automated reaction profile generation
https://duartegroup.github.io/autodE/
MIT License
166 stars 51 forks source link

Make autodE compatible with Windows and change parallel process handling #202

Closed shoubhikraj closed 1 year ago

shoubhikraj commented 1 year ago

Major change: multiprocessing library replaced with concurrent.futures for Linux/macOS and loky for Windows

Minor change: timeout decorator in autode/utils.py is reimplemented with a switch for Windows

Other changes

Tests on Windows: all success

TODO:

t-young31 commented 1 year ago

it looks like the CI test where XTB calculations are run (pytest_codecov) are failing because there isn't sufficient memory on the runner. Don't understand why 16 GB is being required because Config.n_cores should be set to 1 before each test 😕 (tests/conftest.py). Nevertheless, I think the memory check should be on the total rather than the available, and let the OS deal with the consequences!

t-young31 commented 1 year ago

also looks like the linting is failing because black hasn't been run. if you follow the dev install instructions and install pre-commit it'll do it without having to think about it!

shoubhikraj commented 1 year ago

it looks like the CI test where XTB calculations are run (pytest_codecov) are failing because there isn't sufficient memory on the runner. Don't understand why 16 GB is being required because Config.n_cores should be set to 1 before each test 😕 (tests/conftest.py). Nevertheless, I think the memory check should be on the total rather than the available, and let the OS deal with the consequences!

Thanks for the review! Is there anything I could locally check about the XTB test? I haven't modified anything in the conftest.py file, so not sure why it is not working. I used available memory because the OS is bound to consume some amount of memory at all times, so requesting the total amount of memory (by Orca or any other software) would increase the chances of crash. Is there any harm in using available memory, if it is available from psutil?

t-young31 commented 1 year ago

Is there anything I could locally check about the XTB test? I haven't modified anything in the conftest.py file, so not sure why it is not working.

I imagine it was broken before this PR and I just didn't spot it. If you find out why that would be awesome!

I used available memory because the OS is bound to consume some amount of memory at all times, so requesting the total amount of memory (by Orca or any other software) would increase the chances of crash. Is there any harm in using available memory, if it is available from psutil?

I don't think either the total or available memory is the right number because what you really want is the maximum amount that can be allocated to the calculation without kicking crucial stuff out of memory. I'd however be pretty strongly in favour of having it as the total because the available memory makes the chances that a calculation raising that runtime error non-deterministic... which would be super annoying. Say, If the OS could re-jig things and make it possible to run the calc. e.g. if I happen to have a load of tabs open in my browser and try to run something locally I'd rather they were kicked out than my calculation failing.

shoubhikraj commented 1 year ago

@t-young31 Is there a way to dump the current state of autode Config (into a string, or a file or anything serializable), and then reload it in another process? pickle does not seem to be storing the values of the keywords set for ORCA or G09 etc. This is a problem because that means the worker processes are not receiving any changes that the main process may have done on Config.

shoubhikraj commented 1 year ago

@t-young31 I have managed to fix most of the problems, and now autodE should run on Windows as well. About the slow running tests, the problem is due to this wrapper function:

https://github.com/duartegroup/autodE/blob/7795263669707b0da7d424c0bd64b89ac02f96ef/autode/utils.py#L515

The default processes in Loky are spawned (with CreateProcess on Windows and fork/exec on Linux). Spawning processes is generally slower than forking, and takes like ~50-100 ms and probably gets slower if many processes are spawned then terminated in quick succession. Many of the tests, especially on molecule and graph call the functions is_isomorphic(), which is the only function in autodE wrapped with @timeout().

https://github.com/duartegroup/autodE/blob/6a4709992c1866adfd18ea57bbf2957d5be1f6ac/autode/mol_graphs.py#L543-L549

I am not completely sure what to do about this 😅. Looking through the autodE code, there are not more than four-five cases where is_isomorphic is used. If the function is not called too often in real use cases, then this would only make the tests slower. But if it is called quite a lot by a user, then it would be a problem, so it really depends on how autodE is being used.

I see two ways to get around this:

  1. Using a reusable process pool for the timeout function, which will re-use previously spawned processes to run different tasks. This would be much faster.
  2. Set the loky start method to fork for Linux/MacOS only. This would make loky act almost like a wrapper around multiprocessing and autodE would behave exactly like it did before on Linux/MacOS, but it would be slow on Windows.

There are also a couple of possible issues/questions about the timeout wrapper implementation that I have wrote as comments in code.

t-young31 commented 1 year ago

I've had another thought for the timeout.. we could try and workout what input to is_isomorphic makes the function 'hang' and just if <invalid input>: raise ValueError? I wouldn't be unsurprised this behaviour is either (a) a bug in an old NetworkX version or (b) when the graphs are big and the problem becomes hard (isomorphism is exponentially complex)

shoubhikraj commented 1 year ago

@t-young31 What I can see in Windows if there is no timeout, is that for really large graphs (like the one generated in tests), the python process allocates more and more memory until stack overflow happens, and Windows kills the process. So, I imagine this is case (b).

The current version I wrote removes the timeout for windows entirely, unless the user specifies Config.use_experimental_timeout=True, when the new implementation is used. Otherwise for Linux and Mac, the original timeout is present. I think this should be ok for most use cases. If a windows user runs into the isomorphism check hanging, then the switch will fix it.

(This also means that for Linux/Mac, the other dependencies i.e., loky/cloudpickle are not needed. But I could not find a way to write requirements.txt with a conditional for windows, so I kept all of them.)

codecov[bot] commented 1 year ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (v1.4.0@a8c2b8f). Click here to learn what that means. The diff coverage is n/a.

@@            Coverage Diff            @@
##             v1.4.0     #202   +/-   ##
=========================================
  Coverage          ?   97.22%           
=========================================
  Files             ?      195           
  Lines             ?    20323           
  Branches          ?        0           
=========================================
  Hits              ?    19760           
  Misses            ?      563           
  Partials          ?        0           
Flag Coverage Δ
unittests 97.22% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

shoubhikraj commented 1 year ago

I believe I have now fixed all the issues. It should work fine on windows. The original behaviour on linux and mac should remain unchanged.

shoubhikraj commented 1 year ago

The config should be picklable now, and the original format has been left unchanged. The pytest-codecov for patch is failing because of the parts of code that do not run on the main process. Not sure if it can be fixed.