Closed by oscarxblanco 3 months ago
Hello @oscarxblanco, very interesting development. This was in fact on my todo list, to allow LT calculation with the GPU tracking from @JeanLucPons.
To be noted: in his improved implementation of at.c, @JeanLucPons has introduced tracking from a starting refpts, but since this is not yet distributed, your solution is quite nice.
First comment: could you separate the functionalities? In particular, the initial tracking that moves all particles to a single refpts is extremely useful. If it could be used separately, we could use its output to feed the functions already available in acceptance.
Citing the work done by @ZeusMarti here #92.
Hello,
As said by Simon, having a way to have all points "projected" (tracked) up to the end of the turn is a nice idea and may be integrated in the C code.
More details on the GPU code: starting refpts are handled by doing modulos on particle indices at the tracking level; the overhead is clearly not visible. However, to get the best GPU performance there are some constraints on filling GPU core usage that you don't have on CPUs. There is no re-arrangement of particles when some of them are lost, and an extra modulo is needed to get back the refpts of the elements where particles are lost (if needed). The re-arrangement of strides is a tricky task, not yet done, and can bring benefit only when the number of strides exceeds the number of GPU cores (I mean actual GPU cores, not CUDA cores). At the moment, a GPU stride is automatically freed when all its particles are lost.
" tracking_starts: numpy array of indices of elements where tracking should start.\n"
" len(tracking_starts) must divide the number of particles and it gives the stride size.\n"
" The i-th particle of rin starts at elem tracking_starts[i/stride].\n"
" The behavior is similar to lattice.rotate(tracking_starts[i/stride]).\n"
" The stride size should be a multiple of 64 for best performance.\n"
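The stride rule in the quoted docstring can be illustrated with a small numpy sketch (the values and variable names below are a made-up example, not the actual GPU code):

```python
import numpy as np

# Illustration of the stride indexing described in the docstring above:
# 'tracking_starts' and 'stride' follow the quoted rule, everything else
# is an assumption for the example.
tracking_starts = np.array([0, 100, 200, 300], dtype=np.uint32)
nparticles = 256  # must be a multiple of len(tracking_starts)
stride = nparticles // len(tracking_starts)  # here: 64, the recommended multiple

# The i-th particle of rin starts at tracking_starts[i // stride]
start_elem = tracking_starts[np.arange(nparticles) // stride]

print(start_elem[0], start_elem[63], start_elem[64])  # 0 0 100
```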
Dear @JeanLucPons, in my opinion handling particle losses is a must also in the GPU version.
I worked yesterday on the tracking of particles from multiple points to a single reference point. It seems to work, but I realized that there are two cases: 1) the information about the starting reference point needs to be preserved; 2) the information about the starting reference point may be removed. I've been thinking about how to cover both in a simple way, but at the moment I would need to create a flag called 'group' to reduce case 1) to case 2).
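As a rough sketch of the two cases (the shapes and names below are assumptions based on the discussion later in this thread, not the actual function signature):

```python
import numpy as np

# Case 1): keep the starting-refpts information, with N particles per
# reference point and R reference points (illustrative shapes only).
N, R = 4, 3
rout = np.arange(6 * N * R, dtype=float).reshape(6, N, R, 1)

# Case 2): drop the starting-refpts information by merging the particle
# and refpts axes into a single flat particle axis ('group' behavior).
rout_grouped = rout.reshape(6, N * R, 1, 1)
```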
I will push a new version later today, when I feel satisfied with the tests.
The version I'm writing works in python. As you said, it could be implemented directly in C, but right now I don't see the advantage for a CPU, because it is still a process where you need to iterate over the reference points, either in C or in python. I might be missing some memory allocation control where C could be an advantage.
@oscarxblanco moving all the particles to a single point allows tracking them all together in a single ring.track() call; no iteration is needed after that. This is the ideal case for parallelization, whether it is GPU/CPU, python or C.
A simple solution concerning your 2 cases is to track the same number of particles per refpts; it is then easy to split the results at the end of your tracking.
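Simon's suggestion can be sketched with numpy (the zero array stands in for the tracking output; names are illustrative):

```python
import numpy as np

# N particles per refpts, R refpts, tracked together as one flat
# (6, R*N) array; particles of refpts r occupy columns r*N..(r+1)*N-1.
N, R = 8, 5
rout_flat = np.zeros((6, R * N))              # stand-in for ring.track output
rout_per_refpts = rout_flat.reshape(6, R, N)  # split back per refpts
```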
> @oscarxblanco moving all the particles to a single point allows to track them all together in a single ring.track() call, no iteration is needed after that. This is the ideal case for parallelization whether it is GPU/CPU python or C.
I agree. All the comments I made to @JeanLucPons above refer to the tracking of multiple reference points to a single reference point, in a function I called projectrefpts, which is still under debugging.
> A simple solution concerning your 2 cases is to track the same number of particles per refpts, it is then easy to split the results at the end of your tracking
Right, I think the splitting or grouping should be provided inside the function, because it prevents user mistakes during the reshaping.
It would be nice to have the same interface for both the CPU and GPU tracking functions. If you have the startrefpts vector, then you can call projectrefpts() and then track as before. Then I can also remove the modulo in the GPU tracking code. Adding the limitation that Simon suggests is nice, because it also helps to cover the GPU constraints. It will just return a survived-turn map with a +/- 1 turn offset (I don't think it is a big issue), and the lost_at_element indices stay the same as in the input ring, which is intuitive. This is my humble opinion, but I may have missed something.
> It would be nice to have the same interface for both the CPU and GPU tracking functions.
Dear @JeanLucPons, do you mean the whole momentum aperture function, or the projectrefpts function?
> If you have the startrefpts vector then you can call projectrefpts() and then track as before. Then I can also remove the modulo in the GPU tracking code.
It seems you refer to projectrefpts. By vector, do you mean to enforce a 1D numpy array type? I just tested, and it works with either a list or a 1D numpy array. Should I change something?
> Adding the limitation that Simon suggests is nice because it also helps to cover the GPU constraints.
I think you again refer to having a separate function to track multiple reference points to an end point, right? In that case I agree on the advantages, although I see no limitation; it is implemented in projectrefpts. Or perhaps you referred to @swhite2401's proposal to limit the function to track the same number of particles per reference point? If that is the case, it is already implemented.
> It will just return a survived turn map with a +/- 1 turn offset (don't think it is a big issue) and lost_at_element indices stay the same as in the input ring which is intuitive. This is my humble opinion but i may missed something.
Thank you @JeanLucPons. So, if I understand well, your request is already implemented! You could try it like:
refpts = ring.uint32_refpts(at.All)
outparts, lostmask = at.momap_alternative.projectrefpts(ring, refpts, group=True)
aliveparts = outparts[:, ~lostmask, 0, 0]
Dear @swhite2401,
I'm having trouble with python 3.7 and flake8. The code inside momap_alternative.py was passed through black and runs well on my PC with python 3.11.2. From the test output I think flake8 version 5 was installed. I downgraded flake8, but I cannot reproduce the error reported above. Do you have any suggestion on what I should test?
No idea... @lfarv, do you know what could be the issue?
@oscarxblanco: the problem seems to be with python 3.7, and I get the following errors:
"Python version 3.7 does not support equality signs in f-strings" in momap_alternative.py, lines 162 and 226.
Can you try after correcting that?
Great, that solved the build issue. Thank you @lfarv
@oscarxblanco:
By the way, in your new functions you declare ring as a list. It should rather be declared as Lattice, since you are using many Lattice methods (ring.uint32_refpts, ring.radiation_parameters, ring.find_orbit, …).
@lfarv , noted. It will be solved in the next push.
> It would be nice to have the same interface for both the CPU and GPU tracking functions.
>
> Dear @JeanLucPons, do you mean the whole momentum aperture function, or the projectrefpts function?
Dear @oscarxblanco,
yes, I am speaking about the projectrefpts function. IMHO it would be nice to have this feature available directly in the Lattice.track function, allowing parallel tracking of particles starting at different refpts, independently of the type of calculation you want to make. If startrefpts is not set, nothing changes.
Dear @JeanLucPons,
in ring.track the particle input is (6, N) for N particles, while in projectrefpts it is (6, N, R, 1) to project N particles per reference point from R reference points. These are incompatible.
I could simplify projectrefpts even more, to track only one particle from every reference point, so that the input would be (6, R); is that OK?
If I understood correctly, you want this code to be inside lattice_track, right?
I think at this point projectrefpts is no longer needed for momentum aperture, but could be put in track.py. Do you agree? This would provide multi-particle, multi-reference-point tracking to an end point on the lattice.
Dear @JeanLucPons,
a second possibility would be to have (6, N*R) particles and do modulo arithmetic when the 'startrefpts' argument is passed to lattice_track.
If you want to make CPU and GPU compatible, what is more convenient inside lattice_track: (6, R) and one particle per reference point, or (6, N*R) and doing modulos?
Yes, I think it could be a good idea to move projectrefpts to track.py.
I don't think that modulos are needed (I do this at the moment, but your solution is simpler).
The idea is: tracking using one pre-projectrefpts and one post-projectrefpts call:
startingElem = np.array([0, 100, 200, 300, 400, 500, 600, 700], dtype=np.uint32)
rout, *_ = ring.track(rin, nturns=nturns, startrefpts=startingElem)
Limitation: len(startingElem) (R) should divide the total number of particles in rin; this gives N.
Standard tracking (all particles start at the first element):
rout, *_ = ring.track(rin, nturns=nturns)
Potential issue: when several observation points are given, there may exist "holes" that should be filled, but how?
Hello @oscarxblanco, @JeanLucPons,
Maybe we need to take a little step back. The PR seems to introduce 2 new features, please correct me if I misunderstood something:
- a dichotomic search of the chaotic boundary. This method is faster, but cannot profit from parallelization and is known to overestimate the aperture, which is fine for me but has to be clearly documented.
- for the lifetime calculation, it proposes to pre-track particles to project them onto a single observation point. This is interesting to allow for LT calculation with a single call to the C tracking engine, and simplifies CPU/GPU calculations. This is nevertheless only useful for lifetime calculation; at least, I cannot think of other applications...

This leads to several questions:
- Should we integrate the dichotomic boundary search in the existing aperture module as another grid method?
- Should we integrate something that is only used for LT calculations in the general tracking function?

For info, I had in mind to completely refactor the aperture sub-package to optimize it for GPU. This includes the multi-refpts mode, but also multi-grid for a single refpts (tracking X-Y and X-DP DA in one shot, for example). This is a significant development, and unfortunately I have had no time to start it for the moment...
My proposal would then be to keep this PR independent, in a separate module for the moment as initially proposed, to allow merging it quickly, and then at a later stage fully integrate it in the existing modules when we refactor the aperture sub-package.
What do you think?
@swhite2401 ok, understood. I will move back to the state before modifying the tracking functions.
Apart from that, I have a comment:
> a dicotomic search of the chaotic boundary. This method is faster but cannot profit from parallelization and is known to overestimate the aperture which is fine for me but has to be clearly documented.
It is parallelized in this PR.
1) All particles start with a test energy et, which is a guessed value equal to half the bucket height.
2) Then, every reference point is tracked to the end of the ring.
3) Particles lost up to this point are removed from tracking.
4) Alive particles are filtered: they pass the filter if their coordinates differ by more than $\epsilon$ in the 6D space.
5) Filtered particles are grouped and tracked together in parallel.
6) Depending on the survival or death of the particles at the test energy et, the stable energy es and the unstable energy eu at every reference point are redefined.
7) We redefine et as the mean between the stable and unstable energies.
8) We loop from 2). The loop stops when the difference is smaller than some tolerance.
You are right about the overestimation of the algorithm.
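The loop in the steps above amounts to a vectorized bisection on the test energy. A minimal sketch, assuming a stand-in `track_survives` predicate in place of the real AT tracking (all names here are hypothetical, not the pyat API):

```python
import numpy as np

def bisect_aperture(track_survives, e_stable, e_unstable, tol=1e-3):
    """Bisection sketch of steps 1)-8); returns the boundary per refpts.

    track_survives: hypothetical predicate, maps an array of test energies
    (one per reference point) to a boolean survival array; it stands in
    for the projection + parallel tracking of steps 2)-5).
    """
    e_stable = np.asarray(e_stable, dtype=float)
    e_unstable = np.asarray(e_unstable, dtype=float)
    while np.max(np.abs(e_unstable - e_stable)) > tol:
        e_test = 0.5 * (e_stable + e_unstable)       # step 7): mean energy
        alive = track_survives(e_test)                # steps 2)-6)
        e_stable = np.where(alive, e_test, e_stable)  # survivors raise es
        e_unstable = np.where(alive, e_unstable, e_test)  # losses lower eu
    return 0.5 * (e_stable + e_unstable)

# Toy usage: pretend particles survive below an energy offset of 0.02
boundary = bisect_aperture(lambda e: e < 0.02, [0.0, 0.0], [0.05, 0.05], tol=1e-4)
```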
Sure, but since you need to iterate, it can't be fully parallelized, as opposed to the case where you just send a vector of particles that you know contains the boundary, which I think in the end will always be more efficient, provided you have a sufficient number of CPUs/GPUs.
Dear @swhite2401 and @JeanLucPons,
this is ready for review; who should be doing it?
On my side I have to note that the matlab and python functions have different names:
In AT matlab: MomAperture_Project2Start
In pyat: momaperture_project2start
Let me know if you prefer to change the names.
Here is a piece of code that could be used to run the test.
"""
This script tests the bipartition algorithm developed by Z. Marti
2024may08 oblanco
"""
import at
import numpy
import time

print(f'at.__version__ = {at.__version__}')

refpts = numpy.array(range(100))
ring = at.load_mat('THERING2024may16.mat')
nturns = 1000
spos = ring.get_s_pos(refpts)
oo, orbit = ring.find_orbit(refpts)

start_time = time.time()
dnp = at.momap_alternative.momaperture_project2start(
    ring,
    refpts=refpts,
    nturns=nturns,
    dptol=1e-3,
    use_mp=True,
    verbose=True,
)
print("--- %s seconds ---" % (time.time() - start_time))
boundary = dnp
And here is the matlab test code. I add one to positions to skip the Ring Parameters element, in order to calculate the momentum aperture at the same reference points in both matlab and python.
%%
positions = [1:100] + 1;
Nturns = 1000;
tic
[ETN, ETP] = MomAperture_Project2Start(THERING, positions, Nturns, 1e-3, [], [1e-5 1e-5], true);
toc
Ok thanks @oscarxblanco! I can review it, is this urgent?
Hello @oscarxblanco, @swhite2401,

> Should we integrate something that is only used for LT calculations in the general tracking function?
>
> What do you think?

For me the main objective is to have a compatible low-level tracking function for GPU/CPU and Matlab/Python. Having the projectrefpts-related stuff inside the acceptance module is not an issue for me, but I need to know, in order to remove the modulos in the GPU code.
> Ok thanks @oscarxblanco! I can review it, is this urgent?
No, it could be low priority.
For completeness, I leave here a plot comparing the momentum aperture result on the positive-energy side at 100 points of a ring.
The yellow and blue colors show the dead and alive particle zones after 1000 turns, when tracking a swarm of particles with a small energy step occupying the whole energy range at each reference point. The light blue curve is obtained with AT matlab MomAperture_allRing, modified to take into account the closed orbit. The red curve uses the new function, called for the moment MomAperture_Project2Start. As you see, different variations of the energy step produce different results, which could change the Touschek scattering results by some amount, typically a small one.
Here below is a figure showing a small difference between momaperture_project2start in pyat and MomAperture_Project2Start in AT matlab. The small differences between the two results seem to come from the low-level libraries used by python and matlab; see for example discussion #772. This does not pose a problem, because this algorithm is meant to give a faster but less precise result with respect to other algorithms with a finer energy step.
Dear @swhite2401, could you check the matlab version? It has been modified to track positive and negative energy thresholds in the same iteration, and the $\epsilon$ is now an option. Here is an example, with $\epsilon = 10^{-9}$. NOTE: I have not yet used any parallelization function in matlab. I believe this depends on a toolbox installation and having the license to use it. Should I add any test about it?
tic
[ETN, ETP] = MomAperture_Project2Start(THERING, ...
    'refpts', positions, ...
    'nturns', Nturns, ...
    'detole', 1e-3, ...
    'euguess', [], ...
    'troffset', [1e-5 1e-5], ...
    'verbose', true, ...
    'epsilon6D', 1e-9);
toc
Dear @swhite2401, I have modified the python and matlab versions of the bipartition algorithm that finds the momentum aperture, tracking both positive and negative test energies for multiple turns in a single call at every energy step.
The criterion to track only particles differing by some $\epsilon$ is now an option, and not the default behavior.
Wrt the parallelization in matlab, it is not implemented. It seems to require the Parallel Computing Toolbox, and at least a test with `tf = canUseParallelPool()`. Should I add it?
Wrt the parallelization in python, all messages in the doc have been removed. The calculation is done serially, but one could still set use_mp=True through keyword arguments that are passed to at.lattice_track.
I leave here a call example in python and in matlab. The matlab arguments are no longer positional.
start_time = time.time()
dnp = at.momap_alternative.momaperture_project2start(
    ring,
    refpts=refpts,
    nturns=nturns,
    dptol=1e-3,
    use_mp=True,
    epsilon6d=1e-9,
    verbose=True,
)
print("--- %s seconds ---" % (time.time() - start_time))
tic
[ETN, ETP] = MomAperture_Project2Start(THERING, ...
    'refpts', positions, ...
    'nturns', Nturns, ...
    'dptol', 1e-3, ...
    'euguess', [], ...
    'troffset', [1e-5 1e-5], ...
    'verbose', true, ...
    'epsilon6D', 1e-9);
toc
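The epsilon6D option discussed above can be illustrated with a small sketch. This is an O(n²) illustration of the filtering idea only (keep a particle only if it differs from every previously kept one by more than eps in all 6 coordinates); the function name and signature are assumptions, not the actual pyat/matlab implementation:

```python
import numpy as np

def filter_eps6d(coords, eps):
    """coords: (6, n) array; return a boolean mask of particles to track."""
    n = coords.shape[1]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        for j in range(i + 1, n):
            # j duplicates i within eps in all 6D coordinates: track i only
            if keep[j] and np.all(np.abs(coords[:, j] - coords[:, i]) < eps):
                keep[j] = False
    return keep

# Toy usage: particle 1 is within eps of particle 0, particle 2 is distinct
coords = np.zeros((6, 3))
coords[0, 1] = 1e-12
coords[:, 2] = 1.0
mask = filter_eps6d(coords, eps=1e-9)  # [True, False, True]
```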
@oscarxblanco: Sorry to intervene again… Still 2 details, I get 2 warnings:
PEP 8: E302 expected 2 blank lines, found 1 (line 18)
Local variable 'zin' value is not used (line 245)
> @oscarxblanco: Sorry to intervene again… Still 2 details, I get 2 warnings:
> PEP 8: E302 expected 2 blank lines, found 1: 18
> Local variable 'zin' value is not used: 245
Dear @lfarv, all comments are welcome. I still don't know why flake8 missed them. I made the changes, even though I don't see those two warnings.
@@ -15,6 +15,7 @@ __all__ = ["momaperture_project2start", "projectrefpts"]
 # Based on the MATLAB implementation by Z.Marti at ALBA
 # See https://github.com/atcollab/at/pull/773
+
 def momaperture_project2start(ring: Lattice, **kwargs: Dict[str, any]) -> numpy.ndarray:
     """
     :py:func:`momap_project2start` calculates the local momemtum aperture.
@@ -242,7 +243,6 @@ def projectrefpts(
     if nparticles == 1:
         use_mp = False
-    zin = numpy.zeros((6, nparticles, nrps, 1))
     if groupparts:
Dear @lfarv,
the docstring in python and matlab has been modified to say momentum, not energy. Also, the parameter euguess is now dpuguess. See the last commit.
All good for me
This is a work-in-progress proposal to calculate the momentum aperture of a ring where particles from all reference points are tracked at the same time, while the energy offset step is reduced at every iteration.
It is expected to be faster due to the possibility of tracking all particles at once, and the reduced number of iterations in the stable-energy boundary search.
It differs from the acceptance method, where every refpts is tracked through separate calls to the tracking functions.