Closed ConnorSA closed 2 months ago
Thanks for reporting this. Feels like some array hasn't been initialized as expected.
EDIT: apologies, I didn't read carefully enough. It sounds like I should be able to recreate this with randomly generated ACE potentials?
On a first attempt I cannot reproduce this; here is my script:
using ACEpotentials

model = acemodel(elements = [:Ti, ],
                 order = 3,
                 totaldegree = 15,
                 rcut = 6.0,
                 Eref = [:Ti => -1586.0195, ])

# generate random parameters
params = randn(length(model.basis))
# params = params ./ (1:length(params)).^2   # (this is not needed)
ACEpotentials._set_params!(model, params)

# energy per atom of an i x i x i supercell of bulk Ti
function energy_per_at(pot, i)
    at = bulk(:Ti) * i
    return JuLIP.energy(pot, at) / length(at)
end

E_per_at = [ energy_per_at(model.potential, i) for i = 1:10 ]
output:
julia> E_per_at
10-element Vector{Float64}:
-1587.2195904712455
-1587.2195904712453
-1587.2195904712453
-1587.2195904712457
-1587.2195904712444
-1587.219590471243
-1587.2195904712419
-1587.2195904712412
-1587.2195904712414
-1587.2195904712414
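For reference, the values above really do agree to machine precision; a quick stdlib-Python check (the energies are copied verbatim from the output above):

```python
# Energies per atom for the 10 supercell sizes quoted above
energies = [
    -1587.2195904712455, -1587.2195904712453, -1587.2195904712453,
    -1587.2195904712457, -1587.2195904712444, -1587.219590471243,
    -1587.2195904712419, -1587.2195904712412, -1587.2195904712414,
    -1587.2195904712414,
]
spread = max(energies) - min(energies)   # absolute spread across cell sizes
rel = spread / abs(energies[0])          # relative spread
print(rel)  # a few multiples of machine epsilon (~1e-16)
assert rel < 1e-12  # energy per atom is size-independent to machine precision
```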
@ConnorSA -- if you can reproduce your bug with a randomly generated potential (similar to mine) then the most likely conclusion is that this is a Julia-Python interface bug.
CC @casv2 @tjjarvinen @wcwitt
Thank you for sharing this code snippet; I will run it now. I haven't tried any ACE1 potential other than my own; as far as I know this was just a specific thing I was seeing with a potential I trained.
Yeah, so I was wondering if others had any simple single-species ACE potentials with which they could try to recreate this supercell-size effect? I have noticed in my tests that this appears quite inconsistently: in the case of HCP Ti I didn't get this explosion, but for BCC the above is what I got.
As far as sharing JSONs go I'm not sure if I'm in a position to share this immediately (copyright stuff with my funding).
After running the above Julia code snippet I got this output:
10-element Vector{Float64}:
-1589.9067641795925
-1589.906764179592
-1589.906764179592
-1589.9067641795918
58988.57939066435
-1589.906764179594
-1589.906764179592
-1589.9067641795905
-1589.9067641795923
1.4333233269000164e8
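Notably, only two specific supercell sizes blow up here. A small stdlib-Python check (energies copied verbatim from the output above) that flags them by deviation from the median:

```python
import statistics

# Energies per atom for supercell sizes 1..10, copied from the buggy output above
energies = [
    -1589.9067641795925, -1589.906764179592,  -1589.906764179592,
    -1589.9067641795918, 58988.57939066435,   -1589.906764179594,
    -1589.906764179592,  -1589.9067641795905, -1589.9067641795923,
    1.4333233269000164e8,
]
med = statistics.median(energies)
# flag any cell size whose energy deviates from the median by more than 1 ppm
outliers = [i + 1 for i, e in enumerate(energies) if abs(e - med) > 1e-6 * abs(med)]
print(outliers)  # [5, 10] -- only these supercell sizes explode
```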
Ok, that's scary. Please post your project status.
Ah ok, so I might be running some outdated packages:
⌃ [e3f9bc04] ACE1 v0.11.15
⌃ [3b96b61c] ACEpotentials v0.6.2
[51974c44] ASE v0.5.4
[352459e4] ExtXYZ v0.1.14-DEV `~/.julia/dev/ExtXYZ`
[7073ff75] IJulia v1.24.2
⌃ [945c410c] JuLIP v0.14.2
[6f286f6a] MultivariateStats v0.10.2
[91a5bcdd] Plots v1.39.0
[438e738f] PyCall v1.96.1
[37e2e46d] LinearAlgebra
Would you mind rerunning in an updated or a fresh environment?
I tried to reproduce the bug with an environment similar to yours,
(bug_connor) pkg> status
Status `~/gits/scratch/bug_connor/Project.toml`
⌅ [e3f9bc04] ACE1 v0.11.15
⌅ [3b96b61c] ACEpotentials v0.6.2
⌅ [945c410c] JuLIP v0.14.2
but couldn't...
I also tried it with a BCC structure and still can't reproduce it.
I have just updated the packages and re-ran the Julia snippet, and I'm getting more consistent results now:
10-element Vector{Float64}:
-1586.6366298294947
-1586.6366298294943
-1586.6366298294943
-1586.636629829494
-1586.6260240094525
-1586.636629829496
-1586.6366298294945
-1586.6366298294927
-1586.6366298294947
-1586.6338695567263
Not sure what was going on; I am just double-checking that this is resolved in the Python environment too :)
This is still strange. Note how I get machine precision, whereas your precision deteriorates for very specific system sizes. I would still consider this a bug.
If I can't reproduce this on my own hardware / environment, you may have to give me access to yours to debug.
Where are you running this Connor? If it's on SCRTP I can also try to reproduce.
This is on the hetmathsys nodes
@ConnorSA please also check whether you get this bug for cubic computational cells.
What version of Julia are you using?
julia version 1.9.3
Testing now on same machine with 1.9.2 (which I had to hand).
@cortner If you still have an SCRTP login (probably unlikely) then you can access these machines at hetmathsys1.scrtp.warwick.ac.uk. If not, I will see if I can reproduce on a standalone Linux machine that I could give you access to.
So after re-running in Python again I'm still getting some variation with supercell size (although it's far less significant than previously, it still indicates something is wrong). For orthorhombic cells it still appears to be fine (the same value across supercells to machine precision).
Reproduced on hetmathsys1 with similar sensitivity, in a clean environment with ACEpotentials v0.6.7. Trying on another machine with Julia 1.10.2.
10-element Vector{Float64}:
-19550.32896019295
-19550.328960192986
-19550.328960193
-19550.32896019301
-18629.440850654497
-19550.328960192863
-19550.328960192768
-19550.328960192823
-19550.328960193307
-19286.572770616513
Upgrading to Julia v1.10.4 on hetmathsys1 seems to remove the instability. Can you confirm this, @ConnorSA? I saw the same on my other machine. No clue why this would be, however.
10-element Vector{Float64}:
-6380.619862204434
-6380.619862204448
-6380.619862204453
-6380.6198622044385
-6380.619862204455
-6380.619862204479
-6380.619862204406
-6380.619862204358
-6380.61986220433
-6380.619862204427
I can't reproduce it with 1.9 either:
(base) ortner@CosmicCrisp bug_connor % j19 --project=. reproduce1.jl
10-element Vector{Float64}:
-1591.4878094100202
-1591.48780941002
-1591.48780941002
-1591.4878094100204
-1591.487809410019
-1591.4878094100177
-1591.4878094100166
-1591.4878094100159
-1591.4878094100159
-1591.4878094100159
So the big question is what do we do if we can remove the problem by updating everything. Do we still try to track down the bug? Or do we just post a warning and set new version bounds for everything?
Reverting to Julia 1.9.4 on my other Linux machine also brings back the instability, so I think there must have been a change somewhere, perhaps in the linear algebra libraries? Might be a macOS vs Linux thing.
I'd be OK with adding a regression test to check this doesn't come back and setting version bounds of Julia 1.10 onwards.
10-element Vector{Float64}:
-6832.340515552775
-6832.3405155528
-6832.340515552792
-6832.3405155528
-6592.010255233377
-6832.340515552779
-6832.340515552786
-6832.340515552833
-6832.340515552818
-6756.199673696969
julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 32 × Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
Threads: 1 on 32 virtual cores
Environment:
JULIA_IMAGE_THREADS = 1
I've just updated to the latest Julia (fresh install), checked in the Julia environment, and get this:
-9998.238035635983,
-9998.238035635995,
-9998.238035635984,
-9998.238035636,
-9998.238035636003,
-9998.238035636034,
-9998.23803563606,
-9998.238035636054,
-9998.238035635795,
-9998.238035635635
> I'd be OK with adding a regression test to check this doesn't come back and setting version bounds of Julia 1.10 onwards.
Ok, this makes sense. I will first try to add a test that fails on julia 1.9 in CI but passes on 1.10. And then we add the version bounds.
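Such a regression test could simply assert that the per-atom energies agree across supercell sizes to a tight relative tolerance. A minimal sketch (in Python for illustration; the real test would live in the ACEpotentials.jl test suite in Julia, and the 1e-10 tolerance is an assumption). The sample values are taken from the healthy Julia 1.10.4 output and the buggy Julia 1.9.4 output quoted above:

```python
def supercell_energies_consistent(energies, rtol=1e-10):
    """True if all per-atom energies agree with the first to relative tolerance rtol."""
    e0 = energies[0]
    return all(abs(e - e0) <= rtol * abs(e0) for e in energies)

# representative values from the healthy (Julia 1.10.4) run above
good = [-6380.619862204434, -6380.619862204448, -6380.619862204453]
# representative values from the buggy (Julia 1.9.4) run above
bad = [-6832.340515552775, -6832.3405155528, -6592.010255233377]
print(supercell_energies_consistent(good))  # True
print(supercell_energies_consistent(bad))   # False
```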
@ConnorSA -- for now does this solve your problem?
Yes, for now using the most recent versions of Julia with fresh packages etc. has managed to resolve this bug.
This might be related, but another colleague of mine had a potentially similar issue; however, that was on the side of the potential fitting itself. I can't remember the specifics, but it had something to do with using a different version of a crystal cell (i.e. going from an orthorhombic description to some cell that had acute angles, with the same cell symmetry), and this resulted in some terrible fits when that particular geometry was included in the training data. I will ask him to try to recreate that scenario with updated versions of everything :)
Thanks again for all your help!
Connor
That sounds more like a neighbourlist issue. But either way, please ask him to file an issue.
Closed by #213 and #214
Hey all,
I noticed some strange behaviour when using ACE potentials I developed and wondered if this could be a potential bug. In particular, I found that the energy per atom would sometimes explode when increasing the supercell size while atoms sat exactly on the lattice sites. Here is a quick Python snippet of the test:
which would output the following atomic energies:
If I rattle the atomic positions by a tiny amount (0.00001 Å), then I recover something more in line with what I expected for the energy per atom at large supercell sizes (for reference, the DFT value is -1593.72718054 eV). I think something strange is happening numerically when atoms happen to sit in particular configurations; I wondered if this has been seen before and whether it's reproducible. I'm not sure how consequential this might be, but wanted to throw it out there :)
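The rattling described above (tiny random displacements that break the exact lattice symmetry) can be sketched as follows. This is a stdlib-only illustration, not the original test: the `sites` values are an illustrative BCC Ti basis, and in practice one would use something like ASE's `Atoms.rattle` instead:

```python
import random

def rattle(positions, stdev=1e-5, seed=0):
    """Add tiny Gaussian noise (same units as the positions) to every
    coordinate, nudging atoms off the exact high-symmetry lattice sites."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, stdev) for x in p] for p in positions]

# illustrative two-atom BCC Ti basis (a ~ 3.3 angstrom); not from the original script
sites = [[0.0, 0.0, 0.0], [1.65, 1.65, 1.65]]
print(rattle(sites))
```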
I have managed to get this effect with different ACE potential complexities, but it seems to appear inconsistently, only for particular crystal symmetries (I haven't seen this effect in an orthorhombic box yet).
As far as ACE parameters go, I've managed to get this behaviour with different orders and degrees in the model complexity. In this case it is with order = 3, total_degree = 15, and a cutoff of 6.0 Å.
julia: ACEpotentials v0.6.2
Many thanks,
Connor University of Warwick