Closed SRHudson closed 7 years ago
You can try this to create a new branch inherited from your current branch (type git branch
to see which branch you are)
git checkout -b ma00aa
And once you want to push this to GitHub, just try
git push origin ma00aa
Or you can directly merge it into master/NAG_REPLACE, by
git checkout master
git merge ma00aa
I would suggest push then wait for feedback. Once the community is happy with it we can merge the branch into master through github. @jloizu and I just did this for his modifications to the Makefile for IPP. I think it's safe to say that in a branch you can push whatever you want because unless someone else is working with that branch it doesn't affect anyone. Once we push to master it's a global change.
@SRHudson, you should really dig up your Princeton information, the Lynda.com tutorial on git is excellent.
Anyone can then checkout and pull this new branch.
I recommend this very well done and fast-learning videos: TUTORIAL PART 1: https://www.youtube.com/watch?v=HVsySz-h9r4 TUTORIAL PART 2: https://www.youtube.com/watch?v=FdZecVxzJbk
Dear Speculators,
I have pushed some changes to my branch that I called ma00aa.
I copied what Sam did regarding removing the division in favor of multiplications, and I made some other almost trivial changes that I guess will reduce the computation. For example, the values of cheby are stored in global memory (see TD in global and preset) rather than being calculated, the values of ii and jj are stored in ilabel and jlabel (see global and preset), and the values of ll and pp are stored in llabel and plabel (see global and preset). I also removed the multiplication by pi2pi2nfphalf and placed this multiplication into matrix, where I think it will be a bit faster.
I will leave it to someone else to check the timing etc.
Using TD, ilabel, jlabel, llabel and plabel will reduce the computation, but it also requires accessing memory. I don't know which will be faster. It may be the case that the time improvements that I think I have achieved are insignificant to the time improvements that had already been achieved by Sam.
Also, and perhaps most significantly, this routine does not seem to exploit stellarator symmetry! After someone has checked that I am on the right track (in terms of correctly using git) and has performed a timing comparison, it will be easy enough for me to exploit stellarator symmetry. This should speed ma00aa.h up considerably.
Dear Spectaculars,
I have exploited stellarator symmetry in ma00aa, and have tested on l2stell in TESTCASES (by the way I REALLY HATE ALL CAPS, Why not TestCases, or preferably Testcases. I usually identify subdirectories by capitalizing the first letter and only the first letter. Anyway, let's move on). Exploiting the stellarator symmetry reduces the computation time considerably.
I hope that someone checks all this. I have placed my edits in branch ma00aa.
@SRHudson Looks good to me. The ma00aa routine reports back an execution time of ~15s which is a huge improvement. I'm going to go ahead and close this issue and issue a pull request.
That sounds very promising and exciting! I will perform thorough tests today, including profiling. I want to test speedup (compared to before), proportion of cpu time consumed by subroutines (compared to before), and also I want to compare the results of running the l2stell test case with the "slow" and "fast" versions, in order to see differences.
Hi,
I plan to spend some time revising ma00aa, which you can see from http://w3.pppl.gov/~shudson/Spec/ma00aa.pdf computes the volume integrals of the metric elements.
Sam has already advised on this, suggesting to remove divisions from within loops and to remove loops. I will also look at exploiting symmetries more efficiently. Restructuring the loops should be easy enough, but the symmetries will take longer to be fully optimized.
As I am still getting used to git, please give me some suggestions. Should I create a new branch? If so, how?