Closed cortner closed 2 years ago
Update ... I'm running into a horrible issue:
It seems that somehow my optimizations work really well but only on my M1 processor?
Performance on Julia 1.6 is now fixed on my laptop, but still horrendous on our server. This is very strange...
@andresrossb If you have a free moment, would you be willing to pull this branch and run `profile/profile_basis.jl` on your laptop? @zhanglw0521 as well?
Ignore this, I messed up my test. The performance is now comparable on the server and on my M1.
(but seriously, the M1pro is a beast - I get a factor 3 faster performance for the gradients than on the EPYC, which isn't exactly a slouch either...)
Sounds really really attractive...
yes, but you'll never get enough M1pro cores to make it a serious contender....
Looks like PkgBenchmark is worth integrating into our workflow... judge.pdf
this has significant improvements on basis evaluation, so I'll merge and tag before we move on to LinearACEModel.
First steps towards making ACE2 performant. There are a few cleanups, starting to put together some benchmarks, but the main contributions here are `State` and `DState` implementations to get around a Julia 1.7 bug.

The 1p basis evaluation now seems reasonably performant. But unfortunately there is a segfault left that I cannot track down yet. It occurs only with Julia 1.7, not with 1.6, when testing the B1pMultiplier:
and the rest is not so interesting. It seems reproducible.