aleixalcacer / archetypes

Scikit-learn compatible package for archetypal analysis.
https://archetypes.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Refactor numpy AA, ortho PGD, speed up 5x to 10x #40

Closed · wangzcl closed this 3 weeks ago

wangzcl commented 2 months ago

I made a lot of edits in my fork for my research project. I added orthogonal PGD, refactored the code to align the API and behavior with official sklearn modules like NMF, and sped up the PGD methods 5x to 10x. The fork works well on my machine. It is a very large commit and some of the changes may still be too radical, but I think they are at least worth discussing, and I am happy to do so.

  1. All parameters of AA must now be provided as keyword arguments except n_archetypes, following sklearn practice.
  2. Added AA fitted attributes:
    • n_archetypes_ (actual number of archetypes after fitting)
    • n_iter_ (actual number of iterations performed)
    • rss_, or reconstruction_error_, available after fitting
  3. All parameter type checks are moved to AA's class attribute _parameter_constraints. Parameters are now saved to attributes as-is and are automatically validated at fit time (with sklearn's @_fit_context) instead of at instance initialization; see the sketch after this list. For reproducibility, random_state is stored as-is and a new random generator is created at fitting.
  4. Functions are no longer allowed as the init kwarg; only strings are accepted, since passing a function as a kwarg is uncommon for sklearn estimators.
  5. Deleted AA._compute_archetypes and AA._loss. These computations are implemented more efficiently with in-place matrix calculations in fit_transform and other functions.
  6. Changed nnls to match AA: the default kwarg name is max_iter_optimizer and the default value is 100.
  7. Matched the docstring of AA to the actual code, e.g. the claim of sparse matrix support is removed.
  8. Matched the actual code to the docstring: a copy is created if the data array is not C-contiguous (also for performance).
  9. As in sklearn.decomposition.NMF, although every optimization method has to perform alternating optimization, we do not constrain A_optimize and B_optimize to be class methods. Instead, we do the alternating optimization manually in nnls_fit_transform and pgd_fit_transform. To implement a new optimization method xx (see the skeleton after this list):
    • Implement xx_fit_transform(X, A, B, archetypes, *, max_iter, tol, verbose, **kwargs), where A, B and archetypes are the initial $A_0$, $B_0$ and $B_0 X$. This function should return the optimized A, B and archetypes, the number of iterations i, a list of losses, and a bool convergence flag.
    • Implement xx_transform(X, archetypes, *, max_iter, tol, **kwargs) that returns a matrix A.
    • Register the method in _parameter_constraints and in the if/else branches of AA.transform and AA.fit_transform: if self.method == "xx": ... The benefits of doing so:
    • Avoids complicated inheritance in the AABase and AA classes
    • Separates optimization from initialization
    • Gives flexibility and customization across different optimization methods
    • Makes it easier to put the alternating minimization in a tight loop to improve performance (e.g. easier to do in-place calculations)
    • Introduces fewer unnecessary attributes on AA
  10. Verbose mode now prints the RSS before the first iteration step is performed.
  11. For PGD-like iterations, if there is no improvement in the loss, no descent step is performed.
  12. For the PGD-like transform, multiple iterations are performed so that A converges sufficiently to a local minimum.
  13. Memory allocation improvements: all matrix calculations (products, addition and subtraction) are performed in place, which saves 5%~30% of the time spent in these calculations on my machine. np.vdot(a, b) is used instead of np.sum(a * b); see the RSS sketch after this list.
  14. In PGD-like optimization, the objective function is evaluated directly as $\|X - ABX\|^2$, not as in the M&H paper, since I found that when n_samples is large the latter becomes slower than the former (with both sufficiently optimized).
  15. Implemented fast projection onto the unit simplex (unit_simplex_proj, Condat's 2016 algorithm) in Cython, as well as the normalization from the M&H paper (l1_normalize_proj); a plain NumPy sketch of the projection appears after this list. Both functions project the input array in place and return None. I renamed the M&H method to "pseudo_pgd", as their normalization is not an "authentic" projection ...
  16. Modified pyproject.toml to make it compatible with Cython. Running hatch build compiles the .pyx files to .c and .so and builds the wheel.
  17. Added pytest tests for the projection functions.
  18. Removed AABase. Base classes may be needed later if we implement variants of AA, but until then we are not really sure what should go into the base class and what into the subclass.
  19. Multiple inits for numpy.AA (an n_inits kwarg like sklearn's KMeans, returning the best run; see the last sketch below).
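
To make item 3 concrete, here is a minimal sketch of the declarative validation pattern, assuming sklearn's internal validation API (Interval, StrOptions, @_fit_context), which is private and may change between versions. The constraint values and init option names shown here are illustrative, not the exact ones in this PR:

```python
from numbers import Integral, Real

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, _fit_context
from sklearn.utils._param_validation import Interval, StrOptions


class AA(TransformerMixin, BaseEstimator):
    # Declarative constraints: sklearn checks them when fit() runs
    # (via @_fit_context), not at __init__, so get_params/set_params
    # stay trivial.
    _parameter_constraints: dict = {
        "n_archetypes": [Interval(Integral, 1, None, closed="left")],
        "max_iter": [Interval(Integral, 1, None, closed="left")],
        "tol": [Interval(Real, 0, None, closed="left")],
        "init": [StrOptions({"uniform", "furthest_sum"})],  # illustrative options
        "random_state": ["random_state"],
    }

    def __init__(self, n_archetypes, *, max_iter=300, tol=1e-4,
                 init="uniform", random_state=None):
        # Parameters are stored as-is: no validation or conversion here.
        self.n_archetypes = n_archetypes
        self.max_iter = max_iter
        self.tol = tol
        self.init = init
        self.random_state = random_state

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, X, y=None):
        # random_state stays untouched on the instance; a fresh generator
        # is created at every fit for reproducibility (assuming int/None seeds).
        rng = np.random.default_rng(self.random_state)
        ...
        return self
```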
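
The optimizer-plugin convention of item 9, as a skeleton. The method name xx, the update step and the stopping rule are placeholders; only the signature and return values follow the convention described above:

```python
import numpy as np


def xx_fit_transform(X, A, B, archetypes, *, max_iter, tol, verbose=False, **kwargs):
    """Alternating optimization for a hypothetical method "xx".

    A, B and archetypes come in as the initial A_0, B_0 and B_0 @ X.
    Returns (A, B, archetypes, n_iter, losses, converged).
    """
    loss = np.linalg.norm(X - A @ archetypes) ** 2
    losses = [loss]
    if verbose:
        print(f"initial RSS = {loss:.6e}")  # item 10: print before the first step
    converged = False
    for i in range(1, max_iter + 1):
        # placeholder: a real method updates B (then archetypes = B @ X)
        # and A in place here, in a tight loop
        ...
        loss = np.linalg.norm(X - A @ archetypes) ** 2
        losses.append(loss)
        if verbose:
            print(f"iteration {i}: RSS = {loss:.6e}")
        # placeholder stopping rule: relative improvement below tol
        if losses[-2] - losses[-1] < tol * losses[-2]:
            converged = True
            break
    return A, B, archetypes, i, losses, converged


# Dispatch in AA.fit_transform is then a plain if/else on self.method:
#     if self.method == "xx":
#         A, B, archetypes, n_iter, losses, converged = xx_fit_transform(...)
```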
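
Items 13 and 14 in one sketch. The buffer arguments (out_archetypes, out_recon) are illustrative names, not the PR's actual internals; the point is the in-place style and the direct RSS evaluation:

```python
import numpy as np


def rss_inplace(X, A, B, out_archetypes, out_recon):
    """RSS = ||X - A B X||^2 with preallocated buffers (sketch).

    out_archetypes has shape (n_archetypes, n_features) and out_recon the
    shape of X; both are overwritten, so repeated calls inside the
    optimization loop allocate nothing.
    """
    np.matmul(B, X, out=out_archetypes)          # archetypes = B @ X
    np.matmul(A, out_archetypes, out=out_recon)  # reconstruction = A @ (B @ X)
    np.subtract(X, out_recon, out=out_recon)     # residual, in place
    # np.vdot flattens both arguments and returns the same value as
    # np.sum(out_recon * out_recon), without materializing the square.
    return np.vdot(out_recon, out_recon)
```

The buffers are allocated once in fit_transform and reused on every iteration, which is where the 5%~30% saving comes from.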
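
For item 15, a plain NumPy sketch of the projection itself, using the classic O(n log n) sort-based formulation rather than the package's implementation; Condat's 2016 algorithm computes the same projection with better expected complexity, and unit_simplex_proj does it in Cython and in place:

```python
import numpy as np


def project_unit_simplex(v):
    """Euclidean projection of a 1-D array v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0                  # cumulative sums minus the target sum
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index with a positive gap
    theta = css[rho] / (rho + 1.0)            # shift that lands on the simplex
    return np.maximum(v - theta, 0.0)
```

Unlike unit_simplex_proj, this sketch returns a new array instead of modifying v in place and returning None.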
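
Finally, item 19 boils down to a loop like the following sketch. The n_inits name follows the list above, rss_ is the attribute added in item 2, and the import path and seeding scheme are assumptions:

```python
import numpy as np

from archetypes import AA  # assumed import path


def fit_best_of_n(X, n_archetypes, *, n_inits=5, random_state=None, **kwargs):
    """Fit AA n_inits times and keep the run with the lowest rss_ (sketch)."""
    rng = np.random.default_rng(random_state)
    best = None
    for _ in range(n_inits):
        # derive a distinct seed per run so results stay reproducible
        seed = int(rng.integers(np.iinfo(np.int32).max))
        model = AA(n_archetypes, random_state=seed, **kwargs).fit(X)
        if best is None or model.rss_ < best.rss_:
            best = model
    return best
```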
aleixalcacer commented 2 months ago

Congratulations, I see you've done a lot of work :)

I've been on vacation and haven't been able to look at anything. I'll review it thoroughly tomorrow. Is there anything specific you want me to check? Is it ready for a merge? Or do you want to make more changes (perhaps they could be done in another PR)?

wangzcl commented 2 months ago

> Congratulations, I see you've done a lot of work :)
>
> I've been on vacation and haven't been able to look at anything. I'll review it thoroughly tomorrow. Is there anything specific you want me to check? Is it ready for a merge? Or do you want to make more changes (perhaps they could be done in another PR)?

All the changes I've made are described in the comment above (sorry it's so long ...). If you have any thoughts or concerns, please let me know. It passes all the tests and infers the correct archetypes in my preliminary test, so I think it is ready for a merge (though there's no need to hurry). I do have a few more new commits, which we can discuss later.

Thanks!

aleixalcacer commented 2 months ago

> > Congratulations, I see you've done a lot of work :) I've been on vacation and haven't been able to look at anything. I'll review it thoroughly tomorrow. Is there anything specific you want me to check? Is it ready for a merge? Or do you want to make more changes (perhaps they could be done in another PR)?
>
> All the changes I've made are described in the comment above (sorry it's so long ...). If you have any thoughts or concerns, please let me know. It passes all the tests and infers the correct archetypes in my preliminary test, so I think it is ready for a merge (though there's no need to hurry). I do have a few more new commits, which we can discuss later.
>
> Thanks!

Certainly, everything looks perfect to me. Congratulations on all the work you've done. The only question I have is about Cython, in case it causes compilation issues on some platforms (though it shouldn't). But as long as it generates the wheels, there's no problem.

Things you should complete before merging:

In any case, you're doing a great job. I'm currently finishing my thesis and don't have much time, but if you need help with anything, let me know.

wangzcl commented 2 months ago

I'll check the GitHub workflow to see if there's anything I can do. Thank you for everything, and good luck with your thesis!

wangzcl commented 1 month ago

I think this should do the trick. Sorry for keeping you waiting.

aleixalcacer commented 3 weeks ago

> I think this should do the trick. Sorry for keeping you waiting.

Don't worry, I've been offline this week because I defended my thesis on Friday, September 27, and took a few days off. I'm going to merge :)