That's incorrect; the change in the multi-D torsions PR was made because we switched from calculating 1D HRs semi-classically to quantum mechanically. This impacts pressure-dependent rate calculations because they depend, through TST and thermochemistry, on the 1D HR partition functions.
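To make the dependence concrete, here is a minimal sketch (not the Arkane implementation) of the quantum-mechanical 1D hindered-rotor partition function: once the torsional Schroedinger equation has been solved for the energy levels, the partition function is a Boltzmann sum over those levels, and that quantity feeds into TST and thermochemistry and hence into the pdep rate coefficients. The function name and constant are illustrative only.

import numpy as np

def hindered_rotor_partition_function(energies_J, T):
    """energies_J: 1D array of torsional energy levels in joules; T: temperature in K."""
    k_B = 1.380649e-23  # Boltzmann constant, J/K
    e = np.asarray(energies_J) - np.min(energies_J)  # measure energies from the ground state
    return float(np.sum(np.exp(-e / (k_B * T))))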
I see, that makes sense. Any ideas what might be going on here? Should we rethink this line in this test, i.e., should this test only check that the number is in the right ballpark (one or two decimal places? 1%?), given that it will fail whenever we make improvements to Arkane?
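For reference, a relative-tolerance check could look something like the following sketch (a hypothetical, self-contained unittest example; the class name is made up and the numbers are simply the values quoted later in this thread):

import unittest

class PDepToleranceSketch(unittest.TestCase):
    """Hypothetical illustration of a relative-tolerance check for the pdep rate coefficient."""

    def test_rate_coefficient_within_one_percent(self):
        expected = 88.88253229631246  # reference value currently hard-coded in the test
        actual = 88.8819863725482     # e.g. the value observed in a failing build
        # delta= makes assertAlmostEqual an absolute-difference check;
        # using 1% of the expected value turns it into a relative check.
        self.assertAlmostEqual(actual, expected, delta=0.01 * abs(expected))

if __name__ == '__main__':
    unittest.main()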
As far as I can tell, the test passes consistently on Travis (or at least it did while I was testing the nDTorsions stuff), and I really don't think we should be getting rid of or changing failing tests just because we don't know why they fail (the blanket-over-the-head approach to bugs). The way I see it, some change in your code is causing Arkane to estimate pdep rate coefficients differently, so you should test at different commits, figure out which specific commit changes the estimate, and use that to figure out why this is happening.
Update: the commit in question is this one here, as this commit fails with this error, but the one before it does not.
Again, I think this is interesting because I do not change any Arkane files except the two of my own that I have created in this PR, and at this point in the PR I had not yet integrated these files with the rest of Arkane.
In this commit though I do add a few new dependencies, so maybe that is the problem? It would seem strange--I will keep looking into this
k(T, P, # rmg dependencies)
? XD
Break the commit up, find the smallest change that causes it...don't worry about test failures not in the pdep test.
I am trying a few things like that; I'll let you know what happens. My thought with the dependency angle was: what if the new dependencies force us onto other versions of dependencies that pdep needs, and those versions are not compatible with pdep?
Okay, now we are getting somewhere: I created a branch that just adds the new dependencies, with no other code additions, and I get this failure (as a side note, I also get the error with the unit test that is plaguing the Chlorine DB builds). You can check out the branch here.
Here is the diff of the conda list between master and this branch:
diff master.txt ecr_deps.txt
6a7,8
> appdirs 1.4.3 py27h28b3542_0
>
19c21
< blas 1.0 mkl
---
> blas 2.11 openblas conda-forge
42a45,46
> coincbc 2.10.2 hab63836_0 rmg
>
88a93,94
> glpk 4.65 h3ceedfd_2
>
140a147,148
> libblas 3.8.0 11_openblas conda-forge
>
142a151,152
> libcblas 3.8.0 11_openblas conda-forge
>
152a163,168
> liblapack 3.8.0 11_openblas conda-forge
>
> liblapacke 3.8.0 11_openblas conda-forge
>
> libopenblas 0.3.6 h5a2b251_1
>
189,192d204
< mkl_fft 1.0.12 py27ha843d7b_0
<
< mkl_random 1.0.2 py27hd81dba3_0
<
215c227
< numpy 1.15.4 py27h7e9f1db_0
---
> numpy 1.15.4 py27h99e49ec_0
217c229
< numpy-base 1.15.4 py27hde5b4d6_0
---
> numpy-base 1.15.4 py27h2f8d375_0
246a259,260
> ply 3.11 py27_0
>
272a287,288
> pyomo 5.6.6 py27_0 rmg
>
290a307,308
> pyutilib 5.7.1 py27_0 conda-forge
>
309c327
< scikit-learn 0.20.3 py27hd81dba3_0
---
> scikit-learn 0.20.3 py27h22eb022_0
311c329
< scipy 1.2.1 py27h7c811a0_0
---
> scipy 1.2.1 py27he2b7bc3_0
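Since the diff above shows a switch from MKL to OpenBLAS, one quick sanity check (a sketch, not something run in this thread) would be to confirm which BLAS/LAPACK implementation numpy is actually linked against in each environment:

import numpy as np

# Prints the BLAS/LAPACK build information numpy was compiled against
# (e.g. mkl vs. openblas), which can differ between the two environments above.
np.__config__.show()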
I'm getting this error with the py3 testing branch and new py3 environment.
# packages in environment at /home/mjliu/anaconda2/envs/test:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_tflow_select 2.3.0 mkl
absl-py 0.7.1 py37_0
algopy 0.5.7 py_0 rmg
astor 0.8.0 py37_0
attrs 19.1.0 py37_1
backcall 0.1.0 py37_0
biopython 1.74 py37h7b6447c_0
blas 1.0 mkl
bleach 3.1.0 py37_0
bzip2 1.0.8 h7b6447c_0
c-ares 1.15.0 h7b6447c_1001
ca-certificates 2019.5.15 1
cairo 1.14.12 h8948797_3
cairocffi 0.9.0 py37_0 rmg
cantera 2.4.0 np115py37_2 cantera
cclib 1.6.1.rmg py37h39e3cac_0 rmg
certifi 2019.6.16 py37_1
cffi 1.12.3 py37h2e261b9_0
chemprop 0.0.1 py_0 rmg
click 7.0 py37_0
coolprop 6.2.1 py37hf484d3e_0 rmg
coverage 4.5.3 py37h7b6447c_0
cycler 0.10.0 py37_0
cython 0.29.13 py37he6710b0_0
dbus 1.13.6 h746ee38_0
decorator 4.4.0 py37_1
defusedxml 0.6.0 py_0
descriptastorus 2.0.0 py_0 rmg
entrypoints 0.3 py37_0
expat 2.2.6 he6710b0_0
ffmpeg 4.0 hcdf2ecd_0
flask 1.1.1 py_0
fontconfig 2.13.0 h9420a91_0
freetype 2.9.1 h8a8886c_1
fribidi 1.0.5 h7b6447c_0
future 0.17.1 py37_0
gast 0.2.2 py37_0
glib 2.56.2 hd408876_0
gmp 6.1.2 h6c8ec71_1
google-pasta 0.1.7 py_0
gprof2dot 2017.9.19 py_0 rmg
graphite2 1.3.13 h23475e2_0
graphviz 2.40.1 h21bd128_2
grpcio 1.16.1 py37hf8bcb03_1
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
gunicorn 19.9.0 py37_0
h5py 2.9.0 py37h7918eee_0
harfbuzz 1.8.8 hffaf4a1_0
hdf5 1.10.4 hb1b8bf9_0
hyperopt 0.1.2 py_0 conda-forge
icu 58.2 h9c2bf20_1
intel-openmp 2019.4 243
ipykernel 5.1.2 py37h39e3cac_0
ipython 7.7.0 py37h39e3cac_0
ipython_genutils 0.2.0 py37_0
ipywidgets 7.5.1 py_0
itsdangerous 1.1.0 py37_0
jedi 0.15.1 py37_0
jinja2 2.10.1 py37_0
joblib 0.13.2 py37_0
jpeg 9b h024ee3a_2
jsonschema 3.0.2 py37_0
jupyter 1.0.0 py37_7
jupyter_client 5.3.1 py_0
jupyter_console 6.0.0 py37_0
jupyter_core 4.5.0 py_0
keras-applications 1.0.8 py_0
keras-preprocessing 1.1.0 py_1
kiwisolver 1.1.0 py37he6710b0_0
libboost 1.67.0 h46d08c1_4
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libopus 1.3 h7b6447c_0
libpng 1.6.37 hbc83047_0
libprotobuf 3.8.0 hd408876_0
libsodium 1.0.16 h1bed415_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.0.10 h2733197_2
libuuid 1.0.3 h1bed415_2
libvpx 1.7.0 h439df22_0
libxcb 1.13 h1bed415_1
libxml2 2.9.9 hea5a465_1
lpsolve 5.5.2.5 2 rmg
lpsolve55 5.5 py37h39e3cac_2 rmg
markdown 3.1.1 py37_0
markupsafe 1.1.1 py37h7b6447c_0
matplotlib 3.1.0 py37h5429711_0
mistune 0.8.4 py37h7b6447c_0
mkl 2019.4 243
mkl-service 2.0.2 py37h7b6447c_0
mkl_fft 1.0.14 py37ha843d7b_0
mkl_random 1.0.2 py37hd81dba3_0
mock 3.0.5 py37_0
mopac 2017 1 rmg
mpmath 1.1.0 py37_0
nbconvert 5.5.0 py_0
nbformat 4.4.0 py37_0
ncurses 6.1 he6710b0_1
networkx 2.3 py_0
ninja 1.9.0 py37hfd86e86_0
nose 1.3.7 py37_2
notebook 6.0.0 py37_0
numdifftools 0.9.39 py_0 rmg
numpy 1.15.4 py37h7e9f1db_0
numpy-base 1.15.4 py37hde5b4d6_0
olefile 0.46 py37_0
openbabel 2.4.1 py37hf484d3e_7 rmg
openssl 1.1.1c h7b6447c_1
pandas 0.25.0 py37he6710b0_0
pandoc 2.2.3.2 0
pandocfilters 1.4.2 py37_1
pango 1.42.4 h049681c_0
parso 0.5.1 py_0
patsy 0.5.1 py37_0
pcre 8.43 he6710b0_0
pexpect 4.7.0 py37_0
pickleshare 0.7.5 py37_0
pillow 6.1.0 py37h34e0f95_0
pip 19.2.2 py37_0
pixman 0.38.0 h7b6447c_0
prometheus_client 0.7.1 py_0
prompt_toolkit 2.0.9 py37_0
protobuf 3.8.0 py37he6710b0_0
psutil 5.6.3 py37h7b6447c_0
ptyprocess 0.6.0 py37_0
py-boost 1.67.0 py37h04863e7_4
pycparser 2.19 py37_0
pydas 1.0.2 py37h1c1f16f_4 rmg
pydot 1.4.1 py37_0
pydqed 1.0.1 py37h1c1f16f_3 rmg
pygments 2.4.2 py_0
pymongo 3.8.0 py37he6710b0_1
pyparsing 2.4.2 py_0
pyqt 5.9.2 py37h05f1152_2
pyrdl 1.1.2 py37h14c3975_0 rmg
pyrsistent 0.14.11 py37h7b6447c_0
python 3.7.4 h265db76_1
python-dateutil 2.8.0 py37_0
pytorch-cpu 1.1.0 py3.7_cpu_0 pytorch
pytz 2019.2 py_0
pyyaml 5.1.2 py37h7b6447c_0
pyzmq 18.1.0 py37he6710b0_0
qt 5.9.7 h5867ecd_1
qtconsole 4.5.4 py_0
quantities 0.12.3 pyh24bf2e0_0 rmg
rdkit 2019.03.4.0 py37hc20afe1_1 rdkit
readline 7.0 h7b6447c_5
scikit-learn 0.21.2 py37hd81dba3_0
scipy 1.3.1 py37h7c811a0_0
send2trash 1.5.0 py37_0
setuptools 41.0.1 py37_0
sip 4.19.8 py37hf484d3e_0
six 1.12.0 py37_0
sqlite 3.29.0 h7b6447c_0
statsmodels 0.10.1 py37hdd07704_0
symmetry 1.0.1 0 rmg
tensorboard 1.14.0 py37hf484d3e_0
tensorboardx 1.8 py_0 conda-forge
tensorflow 1.14.0 mkl_py37h45c423b_0
tensorflow-base 1.14.0 mkl_py37h7ce6ba3_0
tensorflow-estimator 1.14.0 py_0
termcolor 1.1.0 py37_1
terminado 0.8.2 py37_0
testpath 0.4.2 py37_0
tk 8.6.8 hbc83047_0
torchvision-cpu 0.3.0 py37_cuNone_1 pytorch
tornado 6.0.3 py37h7b6447c_0
tqdm 4.32.1 py_0
traitlets 4.3.2 py37_0
wcwidth 0.1.7 py37_0
webencodings 0.5.1 py37_1
werkzeug 0.15.5 py_0
wheel 0.33.4 py37_0
widgetsnbextension 3.5.1 py37_0
wrapt 1.11.2 py37h7b6447c_0
xlwt 1.3.0 py37_0
xz 5.2.4 h14c3975_4
yaml 0.1.7 had09818_2
zeromq 4.3.1 he6710b0_3
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0
@mliu49 are you getting 88.8819863725482 exactly as well or a different number? Interestingly, we first thought that this might be a BLAS issue, but the version hasn't changed in your environment.
No, I'm getting a larger value...
======================================================================
FAIL: A general test for a PDep job in Arkane
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/mjliu/Code/RMG-Py/arkane/pdepTest.py", line 108, in testPDepJob
self.assertAlmostEquals(rxn.kinetics.getRateCoefficient(1000.0, 1.0), 88.88253229631246)
AssertionError: 88.89036973288468 != 88.88253229631246 within 7 places (0.007837436572216916 difference)
As evidenced by the references in PRs, this error is appearing again, though we never figured out what was causing it the first time around.
In the case of #1840, the push build failed while the PR build passed. Comparing the two logs, the conda environments were identical, and the main difference (besides push vs. PR) seemed to be the Travis worker version. 🤦♂
In the failing build, the worker version is 6.2.6, while in the passing build, it's 6.2.5. The failing push build for ReactionMechanismGenerator/RMG-database#367 is also on version 6.2.6. If this is an accurate correlation, then the ongoing build for #1829 (using version 6.2.6) should fail.
Differences in Travis worker versions affecting build success is not unprecedented, but the mechanism by which this particular test could be affected is not clear.
As additional data points, the builds for #1832 and the two recent builds of master ran on Travis worker 6.2.5 and all passed.
As predicted, the restarted build for #1829 failed...
I am running just this test locally, using pytest in the PyCharm debugger, on:
RMG-Py: b691b45dd65963a2ccbdf8b256b38b34814f1e18, RMG-database: e409c4
It fails as follows:
        # Test the generated network reaction
        dictionary = {'hydroperoxylvinoxy': Species().from_smiles('[CH2]C(=O)OO'),
                      'acetylperoxy': Species().from_smiles('CC(=O)O[O]')}
        with open(os.path.join(self.directory, 'chem.inp'), 'r') as chem:
            reaction_list = read_reactions_block(chem, dictionary)
        rxn = reaction_list[0]
        self.assertIsInstance(rxn.kinetics, Chebyshev)
>       self.assertAlmostEquals(rxn.kinetics.get_rate_coefficient(1000.0, 1.0), 88.88253229631246)
E       AssertionError: 88.8819863725482 != 88.88253229631246 within 7 places (0.0005459237642639891 difference)
This should help us debug by comparison
As an update, I am working on comparing the failing branch above with the v2.5.0 branch (RMG-Py: 068ec113a7e0e6211e91bd0928de68d2b502ea0a), using the same version of the database and working in separate instances of PyCharm on my local computer. Note that v2.5.0 is a Python 2 branch and thus has different versions of our dependencies. However, that branch currently passes the tests.
With help from @mliu49, we have determined that job.K (the rate data for the pdep job) is different between these two branches. The difference shows up sometimes as early as the fourth significant figure, which is enough that our Chebyshev fits down the line come out different.
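One way to quantify this (a hypothetical sketch; the file names are made up and assume job.K was saved with np.save on each branch) is to compare the two arrays directly:

import numpy as np

# Hypothetical: job.K saved on each branch via np.save('K_<branch>.npy', job.K)
k_master = np.load('K_py3_master.npy')
k_v250 = np.load('K_v2.5.0.npy')

# Largest relative deviation; a change in the fourth significant figure
# corresponds to a relative error on the order of 1e-4.
rel_err = np.max(np.abs(k_master - k_v250) / np.abs(k_v250))
print('max relative difference in job.K: {0:.3e}'.format(rel_err))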
Update: The issue goes further back than the call to self.K = self.network.calculate_rate_coefficients(self.Tlist.value_si, self.Plist.value_si, self.method) in the PressureDependenceJob execute method. I noticed that Mcoll was different before this point, and upon further digging found that the density and sum of states for the species in the pdep network differ as well. This means that the error occurs at least as far back as line 270 in pdep.py, self.initialize(), where we initialize the PressureDependenceJob. It could be further back than this; I am not sure.
Inside this initialize call, we make a call to self.network.initialize(...), and it is here that we calculate the density of states. Digging a bit further, we end up in calculate_density_of_states on line 223 of configuration.pyx. Here I have confirmed that q_data differs between these two branches right before the call to log_q = scipy.interpolate.InterpolatedUnivariateSpline(t_data, np.log(q_data)). Before this, q_data is initialized as ones and then built up by calls to mode.get_partition_function(t).
I made the following change to the code (because debugging Cython is hard) to see if these calls differed. Here is the modification to line 340 of configuration.pyx on current master:
for mode in modes:
    print('Mode partition function: {0}'.format(mode.get_partition_function(t)))
    q_data[i] = q_data[i] * mode.get_partition_function(t)
raise Exception
The exception is just so that it only prints the results from one loop. Here are the results:
Current py3 master:
Current v2.5.0:
Therefore, it appears that mode.get_partition_function returns different results. Again, this could go a lot further back (maybe the inputs to all of this are different, e.g. something from the log files didn't get transferred over properly), but it at least starts to differ here.
I think I have finally isolated the source of the problem (although I have no idea what is going on). In torsion.pyx, we have a method HinderedRotor.solve_schrodinger_equation that makes a call to E = scipy.linalg.eig_banded(H, lower=True, eigvals_only=True, overwrite_a_band=True), and this call appears to be the source of our problem. I first got tipped off to this because self.energies differed between the two branches. However, I have found that the Hamiltonian H is exactly the same on both branches, and it is the only variable input before this point; yet printing the energies inside of this method shows that they differ.
I confirmed that the Hamiltonians were the same by dumping the matrices as lists to a YAML file and diffing the YAML files produced by both branches. Interestingly, when I load this Hamiltonian from the YAML file into a Jupyter notebook, I get the same energies regardless of which Python environment (2 or 3) I use, but the energies are different from either of the energies RMG gets on either branch (no idea why).
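For reference, the dump/reload round trip looks roughly like this (a sketch, assuming H is the 6x401 lower-banded Hamiltonian array from HinderedRotor.solve_schrodinger_equation; the helper names are made up):

import numpy as np
import scipy.linalg
import yaml

def dump_hamiltonian(H, path='hamiltonian.yml'):
    # Write the banded Hamiltonian out as nested lists so it can be diffed between branches
    with open(path, 'w') as f:
        yaml.dump(H.tolist(), f)

def energies_from_yaml(path='hamiltonian.yml'):
    # Reload the banded Hamiltonian and recompute the torsional energy levels
    with open(path) as f:
        H = np.array(yaml.safe_load(f))
    return scipy.linalg.eig_banded(H, lower=True, eigvals_only=True)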
The differences are not small at all. Here is a brief summary of self.energies:
python 3 RMG-Py (current master) from pytest in PyCharm: ...
python 2 RMG-Py (v2.5.0) from pytest in PyCharm: ...
python 3 rmg_env from a jupyter notebook, loading in the hamiltonian from yaml ...
python 2 rmg_env from a jupyter notebook, loading in the hamiltonian from yaml ...
Does setting overwrite_a_band=False impact the results?
No, at least for the PyCharm cases I got the same answers as before for each branch
This is the FORTRAN code that is ultimately being called. Given your past experience with FORTRAN, do you see anything interesting? Note that I am not sure which version of the code we use, so you might need to look elsewhere for the correct version of the source. http://www.netlib.org/lapack/explore-html/d5/db8/chbevd_8f_source.html
I mean there is this bit:
*
* Get machine constants.
*
safmin = slamch( 'Safe minimum' )
eps = slamch( 'Precision' )
smlnum = safmin / eps
bignum = one / smlnum
rmin = sqrt( smlnum )
rmax = sqrt( bignum )
But I would think these would be the same for all modern machines.
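As a quick check (a sketch; these are the double-precision analogs of the slamch constants quoted above), the corresponding values can be inspected with numpy and should be identical on any IEEE-754 machine:

import numpy as np

eps = np.finfo(np.float64).eps      # 'Precision'
safmin = np.finfo(np.float64).tiny  # 'Safe minimum' (smallest positive normal number)
smlnum = safmin / eps
bignum = 1.0 / smlnum
print(eps, safmin, np.sqrt(smlnum), np.sqrt(bignum))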
Okay, we are getting somewhere now. Setting overwrite_a_band=False didn't make a change, but requesting eigvals_only=False (and taking care with what is being returned) DOES change the results. Both branches in PyCharm now give different answers from before, but they still don't agree.
python 3 RMG-Py (current master) from pytest in PyCharm and requesting eigvals_only=False: ...
python 2 RMG-Py (v2.5.0) from pytest in PyCharm and requesting eigvals_only=False: ...
Note that this isn't too surprising; the LAPACK code says that it uses a different algorithm if you also need the eigenvectors.
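To see whether the flag alone perturbs the eigenvalues, a self-contained check on a random banded matrix (not the actual Hamiltonian; the dimensions just mirror the 6x401 case) could look like this:

import numpy as np
import scipy.linalg

rng = np.random.RandomState(0)
n, bands = 401, 6
# Lower banded storage: row k holds the k-th subdiagonal (trailing unused entries are ignored)
H = rng.standard_normal((bands, n))

w_only = scipy.linalg.eig_banded(H, lower=True, eigvals_only=True)
w_full, v = scipy.linalg.eig_banded(H, lower=True, eigvals_only=False)

print('max |difference| between the two eigenvalue sets:',
      np.max(np.abs(w_only - w_full)))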
Wait, Mark can you check the condition number on the Hamiltonian?
That would indeed be the problem: np.linalg.cond(H) = 7.025714299685296e+20
Something to talk about in 10.34 :)
XD...well now we know...
Never mind. As we discussed offline, I forgot that H was formatted as a banded matrix. I created a sparse matrix from these diagonals (thanks @mjohnson541 for the suggestion) and got a condition number of 32239.525070193977, which is not too high. To get this, I used the following code (please verify that it makes sense).
import scipy.sparse
import scipy.sparse.linalg

# Place the banded rows on the main diagonal and the first five subdiagonals
diagonals = [H[0, :], H[1, :-1], H[2, :-2], H[3, :-3], H[4, :-4], H[5, :-5]]
A = scipy.sparse.diags(diagonals, [0, -1, -2, -3, -4, -5])
norm_A = scipy.sparse.linalg.norm(A)
norm_invA = scipy.sparse.linalg.norm(scipy.sparse.linalg.inv(A))
cond = norm_A * norm_invA
Note that H is the Hamiltonian loaded from the YAML file, which is a 6x401 array formatted as a lower banded matrix.
You should use numpy.linalg.cond instead. If the condition number is high, the matrix inverse cannot be calculated accurately, so its norm will be inaccurate. Check the eigenvalues it gives you after you construct it (I think you can calculate them without it taking too long, but I might be wrong) and compare with the ones you expect based on your past experiments; that should tell us whether we at least have the same matrix.
I tried numpy.linalg.cond, but it did not like dealing with sparse matrices. My computer may be able to handle a 401x401 dense matrix; let me see.
Converting to a dense matrix and running np.linalg.cond yielded 1417.03
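For completeness, a sketch of that conversion (assuming H is the 6x401 lower-banded array from the YAML file; the helper name is illustrative):

import numpy as np

def banded_to_dense(H_band):
    # Rebuild the full symmetric matrix from lower-banded storage:
    # row k of H_band holds the k-th subdiagonal (its first n - k entries are valid).
    bands, n = H_band.shape
    A = np.zeros((n, n))
    for k in range(bands):
        diag = H_band[k, :n - k]
        A += np.diag(diag, -k)
        if k > 0:
            A += np.diag(diag, k)  # mirror the subdiagonal to restore symmetry
    return A

# A = banded_to_dense(H)
# print(np.linalg.cond(A))  # reported above as ~1417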
Bug Description
The latest build of #1561 failed with the following error:
This PR does not change any part of Arkane or RMG's pdep code, though, and in fact this error only showed up when I recently rebased this branch on current master. I notice that this test fails often when I run the functional tests locally, and the number for the rate coefficient appears to drift slightly for reasons unknown (it seems to happen whenever we add a new Arkane example or functional test; for example, I believe this was why the most recent change was made to this unit test here).
I also wonder if it is somehow related to the error with the most recent Chlorine DB branch build here, where the order of the reactants has changed when the functional tests are run in this build.
A working theory I have is that this is related to Arkane's use of global and module-level variables, which might cause problems when we run all of the Arkane examples at once, but I am not sure.
In the meantime, we might want to consider re-writing this test. Thoughts?
How To Reproduce
Run the RMG functional tests locally. Check out the errorCancelingReactions branch if need be.