atcollab / at

Accelerator Toolbox
Apache License 2.0

New interface to tracking functions #608

Closed swhite2401 closed 1 year ago

swhite2401 commented 1 year ago

This PR proposes to refactor the pyAT tracking functions. Presently we have:

All these functions are used in internal pyAT basic functionality such as orbit search or optics calculations. This has several drawbacks:

This PR proposes to refactor the tracking function as follows:

All existing functions remain valid and usable as before, but they are moved to the deprecated module: they will therefore not be updated with future developments.

Presently a minimal set of outputs is provided as an example (some may not be so useful, while others could be added). With the new implementation it is however possible to add outputs or post-processing steps without changing the interface or affecting other pyAT functions.
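
As an illustration only, a minimal sketch of the idea (the name track_function appears in this thread; the keyword and return structure below are hypothetical):

# Hypothetical sketch, not the actual pyAT implementation: extra outputs are
# collected in a dictionary, so a new quantity or post-processing step can be
# added later without changing the call signature seen by other pyAT functions.
import numpy as np

def track_function(ring, r_in, nturns=1, losses=False, **kwargs):
    r_out = np.copy(r_in)                  # stand-in for the C tracking engine
    outputs = {}
    if losses:
        outputs['loss'] = np.zeros(r_in.shape[1], dtype=bool)  # example extra output
    return r_out, outputs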


lfarv commented 1 year ago

@swhite2401 At first glance this looks nice. It will need a more careful investigation, but before that I have a few preliminary remarks:

swhite2401 commented 1 year ago

Thanks @lfarv for the feedback. Concerning tests, this was a bit of a dilemma: I do not want to integrate deprecated functions in the tests (see discussion in #601). For me it makes no sense, so the tests should be modified to call the 'active' functions, which are in any case called by the old ones...

Then I hesitated between the new interface and the internal_* functions; either one is fine for me. I intended to add a test checking that track_function and the internal_* functions give identical results, to make sure all functions are tested.

I agree about the release, this also gives time to review this one properly.

lfarv commented 1 year ago

I do not want to integrate deprecated functions in the test (see discussion in https://github.com/atcollab/at/pull/601)

It's different: we may decide that deprecated functions will not be upgraded any more, but still make sure that they work correctly. New code and access to new functionalities need a switch to the new interface, but existing code must still run ! The tests are the only way to guarantee this !

swhite2401 commented 1 year ago

The tests are the only way to guarantee this !

Ok, I agree, but then we should provide only minimal testing for old functions: just make sure that results are identical to the new ones.

Otherwise we have to duplicate all tests for both the old and new functions... this does not make much sense to me since one calls the other...

swhite2401 commented 1 year ago

I have parametrized all tests to include ALL functions; the overhead is not so large, so this is fine.
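
For illustration, such a parametrization might look like the sketch below (the tracking functions here are hypothetical stand-ins, not the actual pyAT entry points):

# Illustrative sketch only: the same test body runs once per tracking entry
# point, so deprecated and new functions stay covered without duplicating tests.
import numpy as np
import pytest

def deprecated_track(ring, r_in, nturns=1):      # hypothetical stand-in
    return np.copy(r_in)

def new_track(ring, r_in, nturns=1):             # hypothetical stand-in
    return np.copy(r_in)

@pytest.mark.parametrize('track', [deprecated_track, new_track])
def test_tracking(track):
    r_in = np.zeros((6, 1))
    r_in[0, 0] = 1.0e-6                          # small horizontal offset
    r_out = track(None, r_in, nturns=1)
    np.testing.assert_allclose(r_out, r_in)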

lfarv commented 1 year ago

With correct type hints in internal_lpass and internal_plpass, one should see immediately if there is a problem with the many modified files using them !
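
A sketch of what such a hint could look like (the signature details are illustrative, not the actual pyAT code):

# Illustrative only: with an explicit hint on r_in, a static checker (mypy,
# pyright) immediately flags callers in the modified files that pass a wrong
# argument type.
from typing import Optional
import numpy as np

def internal_lpass(lattice, r_in: np.ndarray, nturns: int = 1,
                   refpts: Optional[np.ndarray] = None, **kwargs) -> np.ndarray:
    ...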

lfarv commented 1 year ago

Coming now to the contents of the new interface: I see 2 nice improvements:

On the other hand, I am sceptical about the merge of Element and Lattice tracking, for the following reasons:

For me the best situation would be:

a lattice_track(lattice,...) function with a Lattice.track(...) method, and an element_track(element) function with an Element.track(...) method
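
In other words, something along these lines (a rough sketch, not the actual pyAT signatures):

# Rough sketch of the proposed split: one function and one bound method per
# object type, instead of a single merged entry point.
def lattice_track(lattice, r_in, nturns=1, **kwargs):
    """Track particles through a whole Lattice."""
    ...

def element_track(element, r_in, **kwargs):
    """Track particles through a single Element."""
    ...

# Exposed as methods so users can also write ring.track(...) or elem.track(...):
# Lattice.track = lattice_track
# Element.track = element_track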

Any other advice on that?

swhite2401 commented 1 year ago

I have a personal preference for a single tracking function, but I will not oppose keeping Element and Lattice tracking separate; it is also quite simple and will indeed not affect users (I don't think element_pass is used by many...). Any advice from someone other than @lfarv?

lfarv commented 1 year ago

Another question about the duplication of the input buffer. I take the case of tracking with many particles for many turns (collective effects for instance). To avoid generating a huge output buffer, a common solution is to track turn-by-turn, or by slices of a few turns:

r_in = np.array(...)  # Large buffer
for turn in range(10000):
    r_out = lattice_pass(ring, r_in, nturns=1, turn=turn,...)
    # process r_out

This way, r_out is only single turn data, and there is no buffer copy. The overhead for entering lattice_pass is small enough.

With the new interface, this will duplicate the input buffer on each turn, without any benefit, and may degrade performance.

Did you think of a way to inhibit the duplication of r_in ?

Or, if the idea is to keep track of the input data, wouldn't it be more meaningful to make a copy of r_in, as you do, but track the copy and leave r_in unchanged ? This is the way most functions work… So it could be the default behaviour, no-one would be surprised, but an option could allow tracking r_in itself for cases like the one I showed above.

swhite2401 commented 1 year ago

Very good point, myself I use:

r_in = np.array(...)  # Large buffer
for turn in range(10000):
    lattice_pass(ring, r_in, nturns=1, turn=turn,...)
    # process r_in

but it is true that the in-place modification regularly comes as a surprise to many users... we can certainly add a 'silent' option or something similar

lfarv commented 1 year ago

What about my other alternative:

we put a new keyword in_place: bool = False

This is for me the most understandable solution.
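
A minimal sketch of how that keyword could behave (illustrative only, not the actual pyAT implementation):

# Sketch only: with in_place=False (the default) the input buffer is copied and
# left untouched; with in_place=True the caller's buffer is tracked directly,
# avoiding the copy in turn-by-turn loops over large particle distributions.
import numpy as np

def track_function(ring, r_in, nturns=1, in_place=False, **kwargs):
    r = r_in if in_place else np.copy(r_in)
    # ... call the tracking engine on `r` ...
    return r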

swhite2401 commented 1 year ago

@lfarv, I have implemented your suggestions:

Any other suggestions?

lfarv commented 1 year ago

Apart from the docstrings mentioned above, this looks nice!

swhite2401 commented 1 year ago

All docstrings corrected as suggested

lfarv commented 1 year ago

I think that the default value for in_place should be False. This is what is usually expected from a function (no modification of its input arguments) and answers the concern in #568. It is the opposite of the behaviour of lattice_pass, sure, but since these are new functions there is no problem with that. And this is what I initially proposed. That leaves in_place=True available for particular cases where users should know what they are doing.

oscarxblanco commented 1 year ago

Dear @lfarv ,

I'll finish reviewing #624 and get back to this topic.

swhite2401 commented 1 year ago

Also @lmalina told me he is having a look

swhite2401 commented 1 year ago

No rush by the way!

oscarxblanco commented 1 year ago

@swhite2401 In pyat/at/physics/frequency_maps.py I use a parameter called ncpu to directly set the maximum number of CPUs used in patpass. https://github.com/atcollab/at/blob/e2ab70cc55917267250e566a66cae825e5b5e42f/pyat/at/physics/frequency_maps.py#L26

This line seems incompatible with the new tracking method. I get the following error:

  File "/home/sources/physmach/blanco-garcia/codes/pyenv/venv_pyat_from_sources/lib/python3.10/site-packages/at/physics/frequency_maps.py", line 178, in fmap_parallel_track
    patpass(ring, z0.T, nturns, pool_size=ncpu, losses=lossmap)
  File "/home/sources/physmach/blanco-garcia/codes/pyenv/venv_pyat_from_sources/lib/python3.10/site-packages/at/tracking/deprecated.py", line 281, in patpass
    return internal_plpass(lattice, r_in, nturns=nturns,
  File "/home/sources/physmach/blanco-garcia/codes/pyenv/venv_pyat_from_sources/lib/python3.10/site-packages/at/tracking/utils.py", line 86, in wrapper
    return func(lattice, r_in, *args, **kwargs)
  File "/home/sources/physmach/blanco-garcia/codes/pyenv/venv_pyat_from_sources/lib/python3.10/site-packages/at/tracking/track.py", line 84, in _plattice_pass
    return _pass(lattice, r_in, pool_size, start_method, nturns=nturns,
  File "/home/sources/physmach/blanco-garcia/codes/pyenv/venv_pyat_from_sources/lib/python3.10/site-packages/at/tracking/track.py", line 52, in _pass
    results = pool.starmap(passfunc, args)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
TypeError: 'use_mp' is an invalid keyword argument for this function

I wonder if it would be better to remove this parameter and let Python automatically use all parallel resources, or to change the call to something compatible with the new tracking method.

lfarv commented 1 year ago

The automatic resource allocation occurs when you do not specify pool_size. The bug you got is different and will be easily corrected.
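
For example (patpass and pool_size as already used in this thread; the values are illustrative):

# pool_size omitted: the number of worker processes is chosen automatically
r_out = patpass(ring, r_in, nturns=nturns)

# pool_size given: at most ncpu worker processes are used
r_out = patpass(ring, r_in, nturns=nturns, pool_size=ncpu)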

lfarv commented 1 year ago

@swhite2401: The bug reported above by @oscarxblanco should have been caught by the tests. But the parametrisation in test_patpass.py does not work !

swhite2401 commented 1 year ago

I think I have fixed everything, is this ok now?

lfarv commented 1 year ago

I think I have fixed everything, is this ok now?

I think so !

oscarxblanco commented 1 year ago

@swhite2401 I have run the same test to calculate a diffusion map and it ran without errors.

swhite2401 commented 1 year ago

I am ready to merge, should I wait for another approval?

lfarv commented 1 year ago

Go on, everything is OK !

lmalina commented 1 year ago

Following #568 , adding a test that track_function does not modify its input parameters would be nice.

swhite2401 commented 1 year ago

I have added a test of r_in for in_place=False and corrected the docstring as suggested.
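
For reference, a sketch of what such a check could look like (the fixture name and import path are assumptions, not the actual pyAT test):

# Illustrative sketch only: with in_place=False the input array must be left
# untouched by the tracking call.
import numpy as np
from at.tracking import track_function   # name from this thread; import path assumed

def test_in_place_false_keeps_input(demo_ring):   # demo_ring: hypothetical fixture
    r_in = np.zeros((6, 1))
    r_in[0, 0] = 1.0e-6                            # small horizontal offset
    r_ref = r_in.copy()
    track_function(demo_ring, r_in, nturns=1, in_place=False)
    np.testing.assert_equal(r_in, r_ref)           # r_in unchanged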

lfarv commented 1 year ago

@swhite2401: it seems that in find_elem_m66, the particle and energy keywords must be forwarded to internal_epass:

internal_epass(elem, in_mat, **kwargs)

swhite2401 commented 1 year ago

it seems that in find_elem_m66, the particle and energy keywords must be forwarded to internal_epass:

Done

swhite2401 commented 1 year ago

And last one I hope.

I hope as well! I'll leave this up for comments until the end of the week, just in case.

swhite2401 commented 1 year ago

@lfarv can you re-approve? I had to resolve conflicts with the master...