I've tried to incorporate AMG into one of my codes, where -- unfortunately -- it fails quite spectacularly. The file needed to reproduce this can be obtained from HERE.
The matrix A here is essentially H + \epsilon I, where \epsilon ~ 0.01, and H can be thought of as a weighted graph Laplacian of a two-dimensional network.
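For readers without the JLD file, a matrix of this type can be sketched as follows. This is a hypothetical stand-in, not the attached AMG_example.jld data: a 2D grid Laplacian plays the role of the weighted graph Laplacian, and the solve call mirrors the one used throughout this thread (Julia 0.6 / AMG v0.1.0 API).
using AMG
# Hypothetical stand-in: 2D grid Laplacian H, shifted by eps*I with eps ~ 0.01
n = 20
T = spdiagm((fill(-1.0, n - 1), fill(2.0, n), fill(-1.0, n - 1)), (-1, 0, 1))
H = kron(speye(n), T) + kron(T, speye(n))   # graph Laplacian of an n-by-n grid
A = H + 0.01 * speye(n^2)                   # the eps*I shift makes A SPD
f = rand(n^2)
x = AMG.solve(AMG.ruge_stuben(A), f, 1000, AMG.V(), 1e-10)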
@cortner I wasn't able to reproduce this. Which version of AMG were you on? v0.1.0?
With v0.1.0 (and on master), I get:
julia> import JLD, AMG
julia> A, f = JLD.load("AMG_example.jld", "A", "f");
julia> x = AMG.solve(AMG.ruge_stuben(A), f, 1000, AMG.V(), 1e-10)
600-element Array{Float64,1}:
-0.0251331
-0.00494112
0.0162619
-0.0011254
0.00795969
-0.0150707
-0.0141143
-0.00532021
-0.0177023
0.0181578
-0.0127625
-0.000450548
-0.0145081
0.0260964
-0.0107832
0.00765703
-0.0163651
-0.0220383
0.0284923
⋮
-0.010587
0.0188101
0.0012849
-0.00156493
-0.0276459
-0.00969683
-0.0063156
0.00978282
-0.015404
0.0230772
-0.0296537
-0.0207526
0.0152367
-0.0114013
-0.0131285
-0.0156694
0.0195659
0.0140342
0.00516803
julia> @show vecnorm(A \ f - x, Inf)
vecnorm(A \ f - x, Inf) = 2.2245239938295525e-11
2.2245239938295525e-11
julia> @show vecnorm(A * x - f, Inf)
vecnorm(A * x - f, Inf) = 8.179806831876135e-11
8.179806831876135e-11
julia> @show any(isnan.(A\f))
any(isnan.(A \ f)) = false
false
julia> @show vecnorm(A - A', Inf)
vecnorm(A - A', Inf) = 0.0
0.0
julia> @show minimum(eigvals(Symmetric(full(A))))
minimum(eigvals(Symmetric(full(A)))) = 0.009999999999994012
0.009999999999994012
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin17.3.0)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT NO_AFFINITY NEHALEM)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> Pkg.status("AMG")
- AMG 0.1.0 master
I verified that it still happens after a fresh restart of Julia. The same happens with
x = AMG.solve(AMG.smoothed_aggregation(A), f, 1000, AMG.V(), 1e-10)
I'm on
julia> Pkg.status("AMG")
- AMG 0.1.0
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
And with smoothed_aggregation I get:
julia> @show vecnorm(A \ f - x, Inf)
vecnorm(A \ f - x, Inf) = 2.0854013828286444e-11
2.0854013828286444e-11
julia> @show vecnorm(A * x - f, Inf)
vecnorm(A * x - f, Inf) = 7.806648383290593e-11
7.806648383290593e-11
julia> @show any(isnan.(A\f))
any(isnan.(A \ f)) = false
false
julia> @show vecnorm(A - A', Inf)
vecnorm(A - A', Inf) = 0.0
0.0
julia> @show minimum(eigvals(Symmetric(full(A))))
minimum(eigvals(Symmetric(full(A)))) = 0.009999999999994012
0.009999999999994012
I wonder if it's to do with Haswell vs. Nehalem on our stock OpenBLAS builds.
Would it be helpful if I downloaded Julia rather than building locally, and tested there?
Yes, could you try it on a downloaded version of Julia too? I just tried on an older machine with this:
julia> versioninfo()
Julia Version 0.6.1
Commit 0d7248e2ff (2017-10-24 22:15 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E7- 8850 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, westmere)
and wasn't able to reproduce either.
julia> versioninfo()
Julia Version 0.6.1
Commit 0d7248e (2017-10-24 22:15 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> norm(AMG.solve(AMG.ruge_stuben(A), f, 1000, AMG.V(), 1e-10), Inf)
Inf
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
julia> norm(AMG.solve(AMG.ruge_stuben(A), f, 1000, AMG.V(), 1e-10), Inf)
Inf
Hmm. Very strange. Let me try finding hardware similar to yours so I can reproduce this and fix. Would it be possible for you to test on any other hardware you have available and see if it works?
Not right away, but yes, I'll do that.
There's always Juliabox :-)
Yeah - I have to say I’ve been less than thrilled with it :(. Not to worry, I’ll send you more tests.
Next test: this is my late-2017 MacBook Pro:
julia> norm(solve(AMG.ruge_stuben(A), f; maxiter=1_000), Inf)
Inf
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin17.3.0)
CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT NO_AFFINITY NEHALEM)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
julia> Pkg.status("AMG")
- AMG 0.1.0+ master
My office workstation, a ca. 4-year-old Linux machine running Ubuntu:
julia> norm(AMG.solve(ruge_stuben(A), f; maxiter = 1_000), Inf)
0.03725863138068789
julia> versioninfo()
Julia Version 0.6.3-pre.0
Commit 93168a6 (2017-12-18 07:11 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
The only other difference here is that this machine runs a development build (v0.6.3-pre) rather than the released Julia v0.6.2. If you think that can make a difference, I can experiment with it.
So it works on something. Good, that's a start.
Can you run
x, res = solve(AMG.ruge_stuben(A), f; maxiter=1_000, log = true)
and paste res here please? On both the config where it works and the one where it doesn't.
On the Apple laptop:
julia> x, res = AMG.solve(AMG.ruge_stuben(A), f; maxiter=1_000, log = true)
([8.07392e218, 1.16689e238, Inf, 8.22031e218, 1.18019e238, Inf, 8.10053e218, 1.16562e238, Inf, 8.17945e218 … Inf, 8.00242e218, 1.16566e238, Inf, 8.1132e218, 1.14831e238, Inf, 7.85136e218, 1.16336e238, Inf], [6.32894, 2.71108, 7.21397, 8.29327, 53.8207, 132.188, 430.102, 414.07, 2863.47, 59123.7 … 1.66233e302, 1.47662e302, 3.11071e303, 5.58143e303, 1.09626e305, 2.54711e305, 4.30591e306, 8.33508e307, 1.08269e308, NaN])
Linux workstation:
julia> x, res = AMG.solve(AMG.ruge_stuben(A), f; maxiter=1_000, log = true)
([-0.0251337, -0.0049412, 0.0162617, -0.00112588, 0.0079593, -0.0150709, -0.0141149, -0.00532007, -0.0177025, 0.0181574 … 0.0230771, -0.0296543, -0.0207528, 0.0152366, -0.011402, -0.0131283, -0.0156696, 0.0195651, 0.0140341, 0.00516779], [6.32894, 0.0123846, 0.00049327, 2.80065e-5])
I just ran the same test on an equivalent laptop belonging to one of my postdocs, and there is no problem there. So there must be something specific about my system. Any ideas?
EDIT: she is running Julia v0.6.1
Well, the desired residuals show up on your Linux machine.
[6.32894, 0.0123846, 0.00049327, 2.80065e-5]
But on your Mac, it blows up from the second iteration onwards:
[6.32894, 2.71108, 7.21397, 8.29327, 53.8207, ...]
I think the best way forward is to make a branch with useful debug statements, track the execution, and see where exactly things go wrong.
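(As an illustration of "useful debug statements" -- hypothetical, not the actual debug branch, though the debug output quoted further down has exactly this @show shape:)
# Hypothetical debug hook: report the level and residual norm at each
# visit of a grid level, so the first diverging step stands out.
function report_residual(lvl, A, x, b)
    @show lvl
    res = b - A * x
    @show norm(res)
    return res
end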
How come travis testing is off for OS X? I know the standard script doesn't work; I can probably fix it though - I recently did the same for all my packages.
This is getting weirder - I can reproduce the "bug" (for lack of a better word) with a clean Julia v0.6.1 on a machine identical to the one where my postdoc's run passes OK. I'd love to know what is going on here.
Mostly for convenience actually. Travis sometimes took a long time to boot an OSX VM. I don't think that's an issue. I develop on a Mac too, and it works fine for me.
Can you run:
Pkg.checkout("AMG", "debug")
and paste the output of both the following commands:
ruge_stuben(A)
and
norm(AMG.solve(ruge_stuben(A), f; maxiter = 1_000), Inf)
please?
The output may be a little long on the second one, but could you show me anyway? I'm just trying to trace where the deviation first begins. It shouldn't really happen, because all the sparse matrix-matrix and sparse matrix-vector multiplies are in pure Julia.
I'll have a go at that, and then we can consider your suggestion from your email.
To add insult to injury: I've created a second user account on my laptop, used the Julia 0.6.2 downloaded from the julialang website, installed only JLD and AMG, and reran my test => it passed. So this means even on my own system it fails / passes depending on the user.
Is `similar` used somewhere? If so, my guess is this might be non-deterministic due to left-over "junk". You may want to change those to `zeros` and see if it still happens.
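(A minimal illustration of the failure mode Chris describes -- not AMG.jl code: similar returns a buffer with undefined contents, so accumulating into it without zeroing it first mixes leftover memory into the result.)
f = rand(5)

bad = similar(f)        # uninitialized: contents are arbitrary junk
for i in eachindex(f)
    bad[i] += f[i]      # accumulates onto junk => non-deterministic
end

good = zeros(f)         # all-zero buffer of the same shape (Julia 0.6 idiom)
for i in eachindex(f)
    good[i] += f[i]     # correct: accumulates from zero
end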
Chris - thanks for the suggestion, unfortunately this didn't fix it.
julia> Pkg.status("AMG")
- AMG 0.1.0+ debug
julia> import JLD, AMG; A, f = JLD.load("AMG_example.jld", "A", "f");
julia> AMG.ruge_stuben(A)
Multilevel Solver
-----------------
Operator Complexity: 1.096
Grid Complexity: 1.173
No. of Levels: 4
Coarse Solver: AMG.Pinv()
Level Unknowns NonZeros
----- -------- --------
1 600 27600 [91.23%]
2 90 2610 [ 8.63%]
3 11 41 [ 0.14%]
4 3 3 [ 0.01%]
julia> norm(AMG.solve(AMG.ruge_stuben(A), f; maxiter = 1_000), Inf)
lvl = 1
norm(res) = 0.16982460303747857
lvl = 2
norm(res) = 0.015541058197269964
lvl = 3
norm(res) = 0.0007631747757248453
lvl = 1
norm(res) = 0.004487746791850987
lvl = 2
norm(res) = 0.000734711679690242
lvl = 3
norm(res) = 2.081855908096134e-5
lvl = 1
norm(res) = 0.00024118065143577877
lvl = 2
norm(res) = 3.861911660869666e-5
lvl = 3
norm(res) = 1.2664485637809385e-6
0.03725863138068789
I want to swear right now . . .
Just to make sure:
julia> Pkg.status("AMG")
- AMG 0.1.0+ master
julia> import JLD, AMG; A, f = JLD.load("AMG_example.jld", "A", "f");
INFO: Recompiling stale cache file /Users/ortner/.julia/lib/v0.6/AMG.ji for module AMG.
INFO: Initializing AMG to use 4 threads
julia> AMG.ruge_stuben(A)
Multilevel Solver
-----------------
Operator Complexity: 1.096
Grid Complexity: 1.173
No. of Levels: 4
Coarse Solver: AMG.Pinv()
Level Unknowns NonZeros
----- -------- --------
1 600 27600 [91.23%]
2 90 2610 [ 8.63%]
3 11 41 [ 0.14%]
4 3 3 [ 0.01%]
julia> norm(AMG.solve(AMG.ruge_stuben(A), f; maxiter = 1_000), Inf)
Inf
I've had similar problems in the past, where adding some completely unrelated code or print statements somehow fixed things. That suggests Chris is right about the cause of the problem, doesn't it?
Okay, so can you unset JULIA_NUM_THREADS and then run it again on master please? That threading code was definitely buggy, and I commented it out on the "debug" branch. Were you always using threads in your code base?
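(For anyone following along, a quick way to check whether a session picked up the variable -- standard Julia, nothing AMG-specific:)
julia> Threads.nthreads()   # returns 1 unless JULIA_NUM_THREADS was set

# To run single-threaded, launch Julia from a shell where the variable
# is not set, e.g. `unset JULIA_NUM_THREADS` in bash.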
I'll do that in a second. In the meantime: the debug branch with a clean Julia => no bug; the master branch, then switching to debug and reloading => I see the bug again.
Looks like that was it:
julia> import JLD, AMG; A, f = JLD.load("AMG_example.jld", "A", "f");
INFO: Recompiling stale cache file /Users/ortner/.julia/lib/v0.6/AMG.ji for module AMG.
julia> reload("AMG"); norm(AMG.solve(AMG.ruge_stuben(A), f; maxiter = 1_000), Inf)
WARNING: Method definition length(Any) in module AMG at /Users/ortner/.julia/v0.6/AMG/src/multilevel.jl:21 overwritten in module AMG at /Users/ortner/.julia/v0.6/AMG/src/multilevel.jl:21.
WARNING: replacing module AMG.
0.03725863138068789
So it's the sparse matrix-vector multiplication which you multi-threaded? I'm glad you found this. If you still need access to a machine where the bug can be reproduced, let me know. I'll need to think about how to go about this, though.
> Were you always using threads in your code base?
Do you mean in my own codes? No, I haven't yet started multi-threading them.
Ah, I see. I had no idea you were running with threads.
> So it's the sparse matrix-vector multiplication which you multi-threaded?
Yes, I did. It was some experimental code I put in there to try something out, and I didn't document it in the README. Looks like I should just remove it until I get it right, because it's too much trouble.
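(For context, a sketch of how such a race can arise. This is illustrative only, not the actual AMG.jl kernel: if a CSC mat-vec is threaded over columns, two columns can share a row index, so two threads may read-modify-write the same entry of y and corrupt the result.)
# Illustrative only -- a racy threaded CSC mat-vec:
function racy_csc_matvec!(y, A::SparseMatrixCSC, x)
    fill!(y, 0)
    Threads.@threads for col in 1:A.n
        for j in A.colptr[col]:(A.colptr[col+1] - 1)
            y[A.rowval[j]] += A.nzval[j] * x[col]   # data race on y
        end
    end
    return y
end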
So this was actually my setting? That would also explain why it only fails on my configuration. I didn’t even realise I had it. Sorry about that.
No problem, happy to help. 👍 I hope you can use the package single-threaded for now. Do let me know if you face any other problems.
thank you!