Closed shoubhikraj closed 1 year ago
it looks like the CI test where XTB calculations are run (pytest_codecov) are failing because there isn't sufficient memory on the runner. Don't understand why 16 GB is being required because Config.n_cores should be set to 1 before each test 😕 (tests/conftest.py). Nevertheless, I think the memory check should be on the total rather than the available, and let the OS deal with the consequences!
also looks like the linting is failing because black hasn't been run. if you follow the dev install instructions and install pre-commit it'll do it without having to think about it!
it looks like the CI test where XTB calculations are run (pytest_codecov) are failing because there isn't sufficient memory on the runner. Don't understand why 16 GB is being required because Config.n_cores should be set to 1 before each test 😕 (tests/conftest.py). Nevertheless, I think the memory check should be on the total rather than the available, and let the OS deal with the consequences!
Thanks for the review! Is there anything I could locally check about the XTB test? I haven't modified anything in the conftest.py file, so not sure why it is not working. I used available memory because the OS is bound to consume some amount of memory at all times, so requesting the total amount of memory (by Orca or any other software) would increase the chances of crash. Is there any harm in using available memory, if it is available from psutil?
Is there anything I could locally check about the XTB test? I haven't modified anything in the conftest.py file, so not sure why it is not working.
I imagine it was broken before this PR and I just didn't spot it. If you find out why that would be awesome!
I used available memory because the OS is bound to consume some amount of memory at all times, so requesting the total amount of memory (by Orca or any other software) would increase the chances of crash. Is there any harm in using available memory, if it is available from psutil?
I don't think either the total or available memory is the right number because what you really want is the maximum amount that can be allocated to the calculation without kicking crucial stuff out of memory. I'd however be pretty strongly in favour of having it as the total because the available memory makes the chances that a calculation raising that runtime error non-deterministic... which would be super annoying. Say, If the OS could re-jig things and make it possible to run the calc. e.g. if I happen to have a load of tabs open in my browser and try to run something locally I'd rather they were kicked out than my calculation failing.
@t-young31 Is there a way to dump the current state of autode Config (into a string, or a file or anything serializable), and then reload it in another process? pickle
does not seem to be storing the values of the keywords set for ORCA or G09 etc. This is a problem because that means the worker processes are not receiving any changes that the main process may have done on Config.
@t-young31 I have managed to fix most of the problems, and now autodE should run on Windows as well. About the slow running tests, the problem is due to this wrapper function:
The default processes in Loky are spawned (with CreateProcess
on Windows and fork
/exec
on Linux). Spawning processes is generally slower than forking, and takes like ~50-100 ms and probably gets slower if many processes are spawned then terminated in quick succession. Many of the tests, especially on molecule and graph call the functions is_isomorphic()
, which is the only function in autodE wrapped with @timeout()
.
I am not completely sure what to do about this 😅. Looking through the autodE code, there are not more than four-five cases where is_isomorphic
is used. If the function is not called too often in real use cases, then this would only make the tests slower. But if it is called quite a lot by a user, then it would be a problem, so it really depends on how autodE is being used.
I see two ways to get around this:
loky
start method to fork
for Linux/MacOS only. This would make loky
act almost like a wrapper around multiprocessing
and autodE would behave exactly like it did before on Linux/MacOS, but it would be slow on Windows.There are also a couple of possible issues/questions about the timeout
wrapper implementation that I have wrote as comments in code.
I've had another thought for the timeout.. we could try and workout what input to is_isomorphic
makes the function 'hang' and just if <invalid input>: raise ValueError
?
I wouldn't be unsurprised this behaviour is either (a) a bug in an old NetworkX version or (b) when the graphs are big and the problem becomes hard (isomorphism is exponentially complex)
@t-young31 What I can see in Windows if there is no timeout, is that for really large graphs (like the one generated in tests), the python process allocates more and more memory until stack overflow happens, and Windows kills the process. So, I imagine this is case (b).
The current version I wrote removes the timeout for windows entirely, unless the user specifies Config.use_experimental_timeout=True
, when the new implementation is used. Otherwise for Linux and Mac, the original timeout is present. I think this should be ok for most use cases. If a windows user runs into the isomorphism check hanging, then the switch will fix it.
(This also means that for Linux/Mac, the other dependencies i.e., loky/cloudpickle are not needed. But I could not find a way to write requirements.txt with a conditional for windows, so I kept all of them.)
:exclamation: No coverage uploaded for pull request base (
v1.4.0@a8c2b8f
). Click here to learn what that means. The diff coverage isn/a
.
@@ Coverage Diff @@
## v1.4.0 #202 +/- ##
=========================================
Coverage ? 97.22%
=========================================
Files ? 195
Lines ? 20323
Branches ? 0
=========================================
Hits ? 19760
Misses ? 563
Partials ? 0
Flag | Coverage Δ | |
---|---|---|
unittests | 97.22% <0.00%> (?) |
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
I believe I have now fixed all the issues. It should work fine on windows. The original behaviour on linux and mac should remain unchanged.
The config should be picklable now, and the original format has been left unchanged. The pytest-codecov for patch is failing because of the parts of code that do not run on the main process. Not sure if it can be fixed.
Major change: multiprocessing library replaced with
concurrent.futures
for Linux/macOS andloky
for Windowsconcurrent.futures
hasProcessPoolExecutor
with slightly different argument names, but uses multiprocessing backend, so there should be no difference in behaviour__main__
etc.;loky
solves many of those issuesMinor change: timeout decorator in
autode/utils.py
is reimplemented with a switch for WindowsConfig.use_experimental_timeout = True
will use a reusable process pool fromloky
to implement timeoutOther changes
Tests on Windows: all success
TODO: