Longevity and the Analysis Pipeline

gwm17 commented 4 months ago

This is a longer discussion and not strictly an "issue", but it will live here anyway so that people can monitor the progress.

Spyral has reached a "plateau" in some ways. The original goal of being able to analyze AT-TPC data and transform the raw data into physical observables has been "completed". But, as we all know, the AT-TPC is very modular and complex and is being used in a lot of creative and novel ways, which will necessitate changes to Spyral.

As of right now it isn't clear how we intend to support that.

The original idea was to have users fork the main repository and then tweak and change the analysis to suit their needs. In general this works; see the e20009 and the 15C forks that are currently in use. However, this experience has shown that the fork approach can be really brittle. Spyral is still young and receives frequent updates, some of which require deep changes to the internal systems and can essentially break forks. This discussion basically asks how long Spyral, the main repository, is supposed to be supported/maintained/improved: near term or long term?

After thinking about the problem for a bit, one solution that kept coming back up was a modular analysis pipeline. This was considered from the beginning of Spyral's development, but rejected in favor of more rapid development and easier comparison to existing code bases. The pipeline concept is as follows: a Phase is now an abstract concept that each implementation inherits from. We then collect the implemented Phases into a Pipeline. The Phases are self-describing; their attributes are the configuration parameters which control the analysis process, and they have attributes/methods for querying workspace-related concepts. Each Phase has a run method which returns a PhaseResult, a payload pointing to the output of that Phase. The result of one Phase is passed to the next Phase in the Pipeline, and so on down the chain. Some examples of this kind of infrastructure can be found in packages like scikit-learn, GEANT4, etc.
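To make the idea concrete, here is a minimal sketch of the Phase / PhaseResult / Pipeline relationship described above. It is not the actual Spyral API: the field names on `PhaseResult`, the `output_path` helper, the two example phases, and their parameters are all hypothetical and only illustrate how results could be chained.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PhaseResult:
    """Payload pointing at the output of a phase (hypothetical layout)."""

    artifact_path: Path
    successful: bool
    run_number: int


class Phase(ABC):
    """A phase is self-describing: its attributes are its configuration."""

    def __init__(self, name: str):
        self.name = name

    def output_path(self, workspace: Path, run_number: int) -> Path:
        # Workspace query: where this phase would write its output for a run.
        return workspace / self.name / f"run_{run_number:04d}.h5"

    @abstractmethod
    def run(self, payload: PhaseResult, workspace: Path) -> PhaseResult:
        """Consume the previous result and return a new one."""


class PointcloudPhase(Phase):
    def __init__(self, threshold: float):
        super().__init__("Pointcloud")
        self.threshold = threshold  # hypothetical configuration parameter

    def run(self, payload: PhaseResult, workspace: Path) -> PhaseResult:
        # Real analysis would read payload.artifact_path and write to `out`.
        out = self.output_path(workspace, payload.run_number)
        return PhaseResult(out, True, payload.run_number)


class ClusterPhase(Phase):
    def __init__(self, min_points: int):
        super().__init__("Cluster")
        self.min_points = min_points  # hypothetical configuration parameter

    def run(self, payload: PhaseResult, workspace: Path) -> PhaseResult:
        out = self.output_path(workspace, payload.run_number)
        return PhaseResult(out, True, payload.run_number)


class Pipeline:
    """Runs phases in order, feeding each result to the next phase."""

    def __init__(self, phases: list[Phase], workspace: Path):
        self.phases = phases
        self.workspace = workspace

    def run(self, start: PhaseResult) -> PhaseResult:
        result = start
        for phase in self.phases:
            if not result.successful:
                break  # stop early if an upstream phase failed
            result = phase.run(result, self.workspace)
        return result


pipeline = Pipeline(
    [PointcloudPhase(threshold=50.0), ClusterPhase(min_points=10)],
    Path("workspace"),
)
final = pipeline.run(PhaseResult(Path("raw/run_0017.h5"), True, 17))
```

In this sketch the result carries a path rather than the data itself, which keeps the payload small; whether that is the right choice is one of the open questions in this discussion.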

Spyral becomes a package rather than a framework (hooray for pip install spyral!). Users then import spyral and set up their configuration in their own code! They can use our building blocks and implement their own as needed!

But here's the catch...

There are some problems:

Phase: abstract base class or Protocol?
PhaseResult: Bundle a literal object, or a path to a resulting object? Inherited implementations, or a blanket case?
Configuration: Self-describing is nice, but what about shared parameters? E.g. detector length, beam region, etc.
Assets: This is a big one. Take the interpolation mesh... how would this even work? Each phase gets an asset directory? How do we handle pre-generation of big assets like the mesh?
Workspace: So phases can do their workspace paths... but how do they know the parent path? Also assets again...
Notebooks: They would need their own repo with some basic impl.
Particle ID: Ugh. Needs to become a true asset, but it's coupled and external...
Parallelization: You would need to be able to deepcopy the pipeline without blowing up the memory usage
Convenience features: progress bars, message passing, jit-ing...
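Two of the questions above (abstract base class vs. Protocol, and shared parameters) can be illustrated together. The following is only a sketch under assumptions: `DetectorParameters`, `PhaseLike`, `EstimationPhase`, `SolverPhase`, and every field name are hypothetical and not part of Spyral. It shows a structural `typing.Protocol` that any object with a `run` method satisfies, and a frozen dataclass that several phases share so that values like the detector length are defined once.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class DetectorParameters:
    """Shared, read-only parameters (names and units are assumptions)."""

    detector_length: float
    beam_region_radius: float


@runtime_checkable
class PhaseLike(Protocol):
    """Structural interface: anything with a run(payload) method qualifies."""

    def run(self, payload: dict) -> dict: ...


class EstimationPhase:
    # No inheritance needed; the class satisfies PhaseLike by shape alone.
    def __init__(self, detector: DetectorParameters, min_points: int):
        self.detector = detector
        self.min_points = min_points  # phase-specific parameter

    def run(self, payload: dict) -> dict:
        return {**payload, "estimated_with_length": self.detector.detector_length}


class SolverPhase:
    def __init__(self, detector: DetectorParameters, max_iterations: int):
        self.detector = detector
        self.max_iterations = max_iterations  # phase-specific parameter

    def run(self, payload: dict) -> dict:
        return {**payload, "solved_with_length": self.detector.detector_length}


# One shared instance is handed to every phase that needs it.
detector = DetectorParameters(detector_length=1000.0, beam_region_radius=25.0)
phases = [EstimationPhase(detector, min_points=30), SolverPhase(detector, max_iterations=100)]

payload: dict = {"run_number": 17}
for phase in phases:
    payload = phase.run(payload)
```

The trade-off in this sketch: a Protocol lets users write phases without importing a base class, while an abstract base class can carry shared behavior such as workspace path handling. Freezing the shared parameters prevents one phase from silently changing a value another phase depends on.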

Last but not least, adoption. We already have users, and this would be a HUGE change. Science in my experience can be iffy on this sort of thing, even if it is an overall improvement.

As of now this is all in the planning stage. A branch is going to be made where some of this is tested and trialed. Periodically, we'll comment on this issue or ping it from commits, so that progress is visible here. Feel free to make comments/give your two cents on the idea or your experiences with Spyral as it exists now!

BTW, this doesn't mean that Spyral as it currently exists will stop being maintained or improved. It's more that we are exploring options to make Spyral stand the test of time, whatever that might look like!

gwm17 commented 4 months ago

First draft of this new method is done in the pipeline branch! It is still a work in progress, but the method outlined there functions and would replace the framework style of Spyral as it currently exists.

gwm17 commented 3 months ago

Nearly there now, just dotting the i's and crossing the t's

gwm17 commented 3 months ago

Woooo! Done as version 0.6.1!

ATTPC / Spyral

Longevity and the Analysis Pipeline #99