X-DataInitiative / tick

Module for statistical learning, with a particular emphasis on time-dependent modelling
https://x-datainitiative.github.io/tick/
BSD 3-Clause "New" or "Revised" License
490 stars 107 forks source link

Grand flattening and reorganization of `tick` #124

Closed stephanegaiffas closed 6 years ago

stephanegaiffas commented 7 years ago

Grand reorganization of tick

A lot of features have been added and now tick is much more than a library that fits point processes. The main issue is that the tick.inference module contains a lot of methods that have nothing to do together. I suggest that we reorganize the things in big families of methods, putting together inference, simulation and models together in such families.

This is a big change, that will pose problems with C++ includes. Several base class must be included in a tick.base module as soon as they are used in several modules. An example is ModelFirstOrder python class, or the Model C++ class, that are parent of almost all models in tick. Such classes will have to be in a tick.base.

> TODO: list of all the things that go in `tick.base` and `tick.*.base` modules.

Here is a first attempt of grand reorganization of tick. I've also renamed a lot of classes, so that inference and simulation classes names match...

> TODO: Discuss and give your opinion about this here

1. tick.linear_model

Inference, simulation and models all belong to a common namespace

2. tick.survival

We will put extra things here : Cox regression with the full likelihood, the SCCS model from Maryan, etc.

3. tick.point_process

4. tick.robust

Linear methods with a focus towards robust estimation and outliers detection

5. tick.prox

This module contains all the proximal operators available in tick.

6. tick.solver

This module contains all the solvers available in tick.

7. tick.plot

This remains unchanged

But we can move some of them elsewhere ? Since some plots are dedicated only to specific objects (such as Hawkes things...)

8. tick.preprocessing

This remains unchanged

9. tick.metrics

This remains unchanged

We'll add specific metrics for survival analysis here

10. tick.simulation

About this one : I don't know where to put these, now that all model simulation is moved in other modules. This could be base_simulation or tools, what do you think ?

11. tick.datasets

Remains unchanged

12. tick.model_base

Proposal for the reorganization of the lib directory

lib
├── CMakeLists.txt
├── Doxyfile
├── cpp
│   ├── array
│   │   ├── CMakeLists.txt
│   │   └── alloc.cpp
│   ├── array_test
│   │   ├── CMakeLists.txt
│   │   ├── array_test.cpp
│   │   ├── performance_test.cpp
│   │   ├── sbasearray_container.cpp
│   │   ├── typemap_test.cpp
│   │   └── varraycontainer.cpp
│   ├── base
│   │   ├── CMakeLists.txt
│   │   ├── exceptions_test.cpp
│   │   ├── interruption.cpp
│   │   ├── math
│   │   │   ├── normal_distribution.cpp
│   │   │   └── t2exp.cpp
│   │   ├── model
│   │   │   ├── CMakeLists.txt
│   │   │   ├── model_generalized_linear.cpp
│   │   │   ├── model_labels_features.cpp
│   │   │   └── model_lipschitz.cpp
│   │   └── time_func.cpp
│   ├── linear_model
│   │   ├── CMakeLists.txt
│   │   ├── linreg.cpp
│   │   ├── logreg.cpp
│   │   ├── model_hinge.cpp
│   │   ├── model_quadratic_hinge.cpp
│   │   ├── model_smoothed_hinge.cpp
│   │   └── poisreg.cpp
│   ├── hawkes
│   │   ├── CMakeLists.txt
│   │   ├── model
│   │   │   ├── base
│   │   │   │   ├── hawkes_fixed_kern_loglik.cpp
│   │   │   │   ├── hawkes_list.cpp
│   │   │   │   ├── hawkes_model.cpp
│   │   │   │   └── hawkes_single.cpp
│   │   │   └── variants
│   │   │       ├── hawkes_fixed_expkern_leastsq_list.cpp
│   │   │       ├── hawkes_fixed_expkern_loglik_list.cpp
│   │   │       ├── hawkes_fixed_kern_loglik_list.cpp
│   │   │       ├── hawkes_fixed_sumexpkern_leastsq_list.cpp
│   │   │       ├── hawkes_fixed_sumexpkern_loglik_list.cpp
│   │   │       └── hawkes_leastsq_list.cpp
│   │   │   ├── hawkes_utils.cpp
│   │   │   ├── hawkes_fixed_expkern_leastsq.cpp
│   │   │   ├── hawkes_fixed_expkern_loglik.cpp
│   │   │   ├── hawkes_fixed_sumexpkern_leastsq.cpp
│   │   │   ├── hawkes_fixed_sumexpkern_loglik.cpp
│   │   ├── inference
│   │   │   ├── hawkes_adm4.cpp
│   │   │   ├── hawkes_basis_kernels.cpp
│   │   │   ├── hawkes_conditional_law.cpp
│   │   │   ├── hawkes_em.cpp
│   │   │   ├── hawkes_sumgaussians.cpp
│   │   ├── simulation
│   │   │   ├── hawkes_kernels
│   │   │   │   ├── hawkes_kernel.cpp
│   │   │   │   ├── hawkes_kernel_exp.cpp
│   │   │   │   ├── hawkes_kernel_power_law.cpp
│   │   │   │   ├── hawkes_kernel_sum_exp.cpp
│   │   │   │   └── hawkes_kernel_time_func.cpp
│   │   │   ├── hawkes_baselines
│   │   │   │   ├── constant_baseline.cpp
│   │   │   │   └── timefunction_baseline.cpp
│   │   │   ├── hawkes.cpp
│   │   │   ├── inhomogeneous_poisson.cpp
│   │   │   ├── poisson.cpp
│   │   │   ├── pp.cpp
│   ├── preprocessing
│   │   ├── CMakeLists.txt
│   │   ├── longitudinal_features_lagger.cpp
│   │   └── sparse_longitudinal_features_product.cpp
│   ├── prox
│   │   ├── CMakeLists.txt
│   │   ├── prox.cpp
│   │   ├── prox_binarsity.cpp
│   │   ├── prox_elasticnet.cpp
│   │   ├── prox_equality.cpp
│   │   ├── prox_group_l1.cpp
│   │   ├── prox_l1.cpp
│   │   ├── prox_l1w.cpp
│   │   ├── prox_l2.cpp
│   │   ├── prox_l2sq.cpp
│   │   ├── prox_multi.cpp
│   │   ├── prox_positive.cpp
│   │   ├── prox_separable.cpp
│   │   ├── prox_slope.cpp
│   │   ├── prox_sorted_l1.cpp
│   │   ├── prox_tv.cpp
│   │   ├── prox_with_groups.cpp
│   │   └── prox_zero.cpp
│   ├── random
│   │   ├── CMakeLists.txt
│   │   ├── rand.cpp
│   │   └── test_rand.cpp
│   ├── robust
│   │   ├── model_absolute_regression.cpp
│   │   ├── model_epsilon_insensitive.cpp
│   │   ├── model_generalized_linear_with_intercepts.cpp
│   │   ├── model_huber.cpp
│   │   ├── model_linreg_with_intercepts.cpp
│   │   └── model_modified_huber.cpp
│   ├── solver
│   │   ├── CMakeLists.txt
│   │   ├── adagrad.cpp
│   │   ├── saga.cpp
│   │   ├── sdca.cpp
│   │   ├── sgd.cpp
│   │   ├── sto_solver.cpp
│   │   └── svrg.cpp
│   └── survival
│       ├── coxreg_partial_lik.cpp
│       └── sccs.cpp

cpp-test, include and swig follow cpp structure

stephanegaiffas commented 7 years ago

@Mbompr @MaryanMorel @vinther @Dekken Don't hesitate to comment about this here... it's kind of a big deal :) We can also put in the loop everyone from the team (Youcef, Marcello, etc...)

MaryanMorel commented 7 years ago

Isn't that weird to mix models and learners? It might be quite confusing for the end users.

Mbompr commented 7 years ago

We also need to have a module with all model abstract classes

12. tick.base_model

That would be linked to other model modules in order to avoir circular dependencies.

stephanegaiffas commented 7 years ago

Yes I agree, but what I want to do with this is to show more of what we can do with tick : you can use if you want many more loss for supervised learning than what is done in the current inference things. We can maybe put the models in a tick.*.model submodule, or something like that... And the fact that there are inference classes and models is very clean from the naming of the class, and in the documentation...

stephanegaiffas commented 7 years ago

@Mbompr : yes, you can edit the Issue text including all the base classes :) Same goes for learners...

MaryanMorel commented 7 years ago

So maybe have a tick.base.{model, learner, etc.}. We don't need to have a flat base as end users are not likely to fool around with base classes.

The difference between a model and a learner is clear to us, but I don't know if it'd be for John Does. Besides, what we call Learner is called model in scikit-learn... Maybe we should rename our models "loss" or something else

stephanegaiffas commented 7 years ago

I agree with this. We could use submodules in tick.base. Although I agree with the use of 'loss' terminology for "linear models", what about Hawkes and other models where there is no "loss" but only a global goodness of fit ?

PhilipDeegan commented 7 years ago

I've done some re-organising which can be viewed here: https://github.com/Dekken/tick/tree/re-org

I've separated the includes and sources, and changed all "#include " paths to be complete relative to the directory ./include

Now there is only a single include directory for all cpp files in all modules (excluding swig cpp creation)

This should make things easier to move around.

Edit: meant to hit "update comment" not "close issue" >_<

Mbompr commented 6 years ago

Concerning point_process, I think it is a bit messy to mix inference / simulation / model objects in the c++ parts. Those are very differents objects that don't interract with each other.

stephanegaiffas commented 6 years ago

Yes alright we can reorganize into subfolders, but I'm pretty sure that the problem comes also from naming of files. All model classes should start with model* , simulation classes with simulation*

PhilipDeegan commented 6 years ago

Are there are outstanding items we should consider before closing this?

stephanegaiffas commented 6 years ago

No it's alright, I'm closing it