EMS-TU-Ilmenau / fastmat

A library to build up lazily evaluated expressions of linear transforms for efficient scientific computing.
https://fastmat.readthedocs.io
Apache License 2.0

Rework Algorithm structure #38

Closed ChristophWWagner closed 5 years ago

ChristophWWagner commented 5 years ago

Currently, all matrices in fastmat are objects, which brings great benefits in handling them, e.g. data storage, nesting instances and caching intermediate computation results. However, the algorithms are still implemented as functions, and for every algorithm there exists a dummy class that exposes the algorithm to the test system. Aside from this ugly inconsistency, it is highly desirable to refactor all algorithms into a proper class structure, as this would enable addressing some long-wanted items on our wish list:

Some thoughts on the implementation:

The Algorithm baseclass: Every algorithm will inherit from a baseclass which implements a general interface for common tasks (processing data, inspection, logging, callback handling, ...). This is quite similar to the Matrix baseclass concept; however, it makes sense not to force immutable objects here. An algorithm is defined by its implementation, which is represented by an individual class for every algorithm. That class specifies the implementation while following the baseclass' paradigms on interfaces and general behaviour. In order to actually use an algorithm, a set of parameters and resources is required. These must be given upon instantiation of the algorithm object. Then it is possible to process chunks of data by calling the .process(data) method of the actual algorithm. It is highly desirable that the algorithm instance is mutable, to allow flexible processing with changing parameters.
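A minimal pure-Python sketch of how this could look. Only the baseclass idea and `.process(data)` come from the proposal; the `_process` hook, the `Threshold` toy algorithm and its `level` parameter are made up for illustration and are not fastmat code.

```python
class Algorithm:
    """Hypothetical baseclass: common interface, mutable parameters."""

    def __init__(self, **parameters):
        # Parameters and resources are handed over at instantiation and
        # stored as plain attributes, so they stay changeable afterwards.
        for name, value in parameters.items():
            setattr(self, name, value)

    def process(self, data):
        """Common entry point; delegates to the concrete implementation."""
        return self._process(data)

    def _process(self, data):
        raise NotImplementedError("implemented by each algorithm class")


class Threshold(Algorithm):
    """Toy algorithm (not part of fastmat): zero out small entries."""

    def __init__(self, level=1.0):
        super().__init__(level=level)

    def _process(self, data):
        return [x if abs(x) >= self.level else 0 for x in data]


alg = Threshold(level=2.0)
first = alg.process([1, -3, 2])
alg.level = 3.0                    # mutable instance: change a parameter
second = alg.process([1, -3, 2])   # same instance, new behaviour
```

The point of the mutable instance is the last three lines: the same object processes a second chunk with a changed parameter, without being rebuilt.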

Parameters and variables: A parameter controls the behaviour of an algorithm, a variable describes its (internal) state. During execution many variables are needed, which in the current implementation are not accessible at all. That's bad! One way around that limitation is to require all variables to be attributes of the algorithm's instance. It is also promising to implement generic __setattr__() / __getattr__() handlers that store anything assigned to the algorithm instance, e.g. in an internal variable dictionary. Handling it this way allows achieving full performance by defining critical parameters or variables as typed members in a cython class header definition, while keeping full flexibility with the (slower) dictionary-based variable space.
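A rough pure-Python sketch of the two-tier attribute idea. Here `__slots__` only stands in for the typed members a cython class header would declare; the names `numIterations`, `_variables` and `residual` are assumptions made for this example.

```python
class Algorithm:
    # Stand-in for typed members declared in a cython class header:
    # these names get fixed storage, everything else goes to a dict.
    __slots__ = ("numIterations", "_variables")

    def __init__(self):
        object.__setattr__(self, "_variables", {})
        self.numIterations = 0

    def __setattr__(self, name, value):
        if name in Algorithm.__slots__:
            # Fast path: declared member.
            object.__setattr__(self, name, value)
        else:
            # Flexible path: store anything else in the variable dict.
            self._variables[name] = value

    def __getattr__(self, name):
        # Only invoked when the normal (slot) lookup has failed.
        # object.__getattribute__ avoids recursing into this handler.
        variables = object.__getattribute__(self, "_variables")
        try:
            return variables[name]
        except KeyError:
            raise AttributeError(name) from None


alg = Algorithm()
alg.numIterations = 5      # declared member
alg.residual = 0.25        # ad-hoc variable, lands in the dictionary
```

After this, `alg.residual` reads back `0.25` from the dictionary, while `numIterations` never touches it. An unknown name still raises `AttributeError`, so `hasattr()` keeps working as usual.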

States: A state can be defined as the complete set of variables and parameters in an Algorithm instance. In this sense, logging could be achieved by just copying the algorithm instance object at an arbitrary point in time (snapshotting). This could be encapsulated by a .snapshot() routine in the base class. An additional .trace flag could be used in conjunction with a (baseclass-)internal mechanism that automatically adds a state snapshot to a list, effectively generating a trace log. However, logging must actively be invoked via a callback from the algorithm implementation.
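One way the snapshot/trace mechanism could be sketched in plain Python. `.snapshot()` and `.trace` are taken from the proposal; the `history` list, the `log()` hook and the `RunningSum` toy algorithm are hypothetical names for this example. The snapshot deliberately leaves out the trace list itself, otherwise every snapshot would drag along all earlier ones.

```python
import copy


class Algorithm:
    def __init__(self, trace=False):
        self.trace = trace
        self.history = []          # hypothetical name for the trace log

    def snapshot(self):
        """Return an independent copy of the current state."""
        snap = type(self).__new__(type(self))
        for name, value in self.__dict__.items():
            if name != "history":  # do not copy the log into the copy
                setattr(snap, name, copy.deepcopy(value))
        snap.history = []
        return snap

    def log(self):
        """Invoked by the implementation wherever logging makes sense."""
        if self.trace:
            self.history.append(self.snapshot())


class RunningSum(Algorithm):
    """Toy algorithm (not part of fastmat)."""

    def __init__(self, trace=False):
        super().__init__(trace=trace)
        self.total = 0

    def process(self, data):
        for x in data:
            self.total += x
            self.log()             # implementation decides when to log
        return self.total


alg = RunningSum(trace=True)
result = alg.process([1, 2, 3])
totals = [snap.total for snap in alg.history]
```

With tracing enabled, `alg.history` holds one snapshot per step; with `trace=False` the `log()` calls are no-ops and nothing is copied.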

Callbacks: A callback is a class method that can be overloaded by the user to get active signaling from the algorithm implementation. In order to keep the interface lean and allow for static typing, a callback has precisely one argument, which is an Algorithm (baseclass) instance. When the algorithm implementation, which is usually the callback origin, issues a call, the algorithm instance is passed along so the callee can extract or modify information as required.
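A small sketch of the single-argument callback convention, under the assumption that a callback is stored as a replaceable attribute on the instance. The names `on_step`, `_emit`, `stop` and the `Halving` toy algorithm are invented for this example.

```python
class Algorithm:
    def __init__(self):
        self.on_step = None        # hypothetical callback slot, unset by default

    def _emit(self, callback):
        # Every callback receives exactly one argument: the instance.
        if callback is not None:
            callback(self)


class Halving(Algorithm):
    """Toy algorithm (not part of fastmat): halve a value repeatedly."""

    def __init__(self, value):
        super().__init__()
        self.value = value
        self.stop = False

    def process(self, steps):
        for _ in range(steps):
            self.value /= 2
            self._emit(self.on_step)   # the implementation is the origin
            if self.stop:
                break
        return self.value


seen = []


def watch(alg):
    seen.append(alg.value)         # extract information ...
    if alg.value <= 2:
        alg.stop = True            # ... or modify the instance


alg = Halving(16)
alg.on_step = watch
result = alg.process(10)
```

Because the callee gets the whole instance, it can both read state (`alg.value`) and steer the run (`alg.stop`) without the callback signature growing.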

Please discuss this proposal and add your ideas.

SebastianSemper commented 5 years ago

As a cherry on top, we should get rid of CG and maybe construct a fastmat interface that uses the new algorithm structure to expose the scipy algorithms to fastmat users. This could now be achieved by providing some fastmat wrappers (i.e. fm.Algorithm classes) around them, thus yielding a consistent interface for fastmat users.

ChristophWWagner commented 5 years ago

Please also check out the new alg-refactor branch, which already contains most of the proposal (but still WIP -- the tests still fail).

Does wrapping the scipy algs bring an advantage to users? As we cannot look inside them, there is no gain in terms of the new features (i.e. internal variable references, tracing), apart from encapsulation.

ChristophWWagner commented 5 years ago

signed, sealed, delivered, I'm yours.