WillianFuks / tfcausalimpact

Python Causal Impact Implementation Based on Google's R Package. Built using TensorFlow Probability.
Apache License 2.0

Question: Is it possible to improve speed by reusing a pre-trained model? #37


mc-karsa-tech commented 2 years ago

Hi, thank you for all your effort in this library.

We have many datasets for which we need to compute causal impact. Currently it is too slow to compute, whether on CPU or GPU. However, these datasets share the same first rows (the pre-intervention period); only the last few rows are unique to each dataset.

We would like to take the following steps to get better speed:

  1. Pre-train a model on the common rows and save it to a file.
  2. For each dataset, load the pre-trained model, update it with the remaining rows from that dataset, and compute the causal impact.

Basically, this is incremental training of an already trained model. Is it possible to achieve this, please?
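
For context, this is roughly what we do today. It is only a sketch: `datasets` stands for our list of dataframes, and the 100-row shared pre-period is an assumption for illustration.

```python
from causalimpact import CausalImpact

# Current approach: one full, slow fit per dataset, even though the
# pre-intervention block [0, 99] is identical across all of them.
# (`datasets` and the 100-row pre-period are placeholders for our data.)
for data in datasets:
    ci = CausalImpact(data, pre_period=[0, 99],
                      post_period=[100, len(data) - 1])
    print(ci.summary())
```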

WillianFuks commented 2 years ago

Hi @mc-karsa-tech ,

I like your idea. This is technically possible, but its implementation might not be that easy. The main issue is that a pre-trained model is essentially represented not only by the model itself but also by the posterior samples of each of its parameters, which so far can be produced by two algorithms (variational inference and Hamiltonian Monte Carlo).

So we'd need to find a way to accept not only a customized model but also the resulting posterior samples.

I'll keep this open, and if I find an easy way to implement it I'll try to allocate some time to add this feature (PRs with full unit tests are also welcome).

Thanks,

Will

mc-karsa-tech commented 2 years ago

Hi Will, thank you for your response. I confess I don't fully understand it, since I don't have sufficient knowledge of statistics, but I do understand that it is not easy to do. One thing I do know is that we use HMC (Hamiltonian Monte Carlo) instead of VI (variational inference) because HMC seems to be more stable when we repeat the evaluation on the same data; unlike HMC, VI gives us very different results on each run. Have a nice day!
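
For reference, this is how we select the sampler, via the `model_args` option from the README (`data`, `pre_period`, and `post_period` stand in for our actual inputs):

```python
from causalimpact import CausalImpact

# We pick HMC over variational inference because, for us, repeated runs
# on the same data give much more stable results with HMC.
ci = CausalImpact(data, pre_period, post_period,
                  model_args={'fit_method': 'hmc'})
```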

WillianFuks commented 2 years ago

Hi @mc-karsa-tech ,

I'd like to confirm one thing: when you say "incremental training", do you mean that the pre-trained model should be trained again on the few new rows? At first I thought the same pre-trained model would simply be reused and only the post-intervention rows would change.

Let me know which interpretation is correct :)

mc-karsa-tech commented 2 years ago

Hi Will,

it would help us if we were able to train on the pre-intervention rows once and reuse that trained model for different sets of post-intervention data, so that evaluating each new set of post-intervention rows would be fast. This way we would keep the speed-up for as long as the pre-intervention data does not change.

The second option (reusing only the first part of the pre-intervention period, appending new rows to it, and also changing the post-intervention rows) would give us an even better speed-up.

Have a nice day!

WillianFuks commented 2 years ago

Hi @mc-karsa-tech ,

I've been thinking about ways to cache the model for your use case, but unfortunately so far I haven't found anything useful. The main issue is how TFP implements the linear regression, as you can see here. The post-intervention data affects the training of the model because it enters the design matrix of the linear regression.

This prevents us from building any viable caching system.
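
To make that concrete, here is a minimal sketch (my illustration of the constraint, not tfcausalimpact's actual internals) of how a TFP structural time-series model is assembled: the regression component needs the design matrix for both periods up front, since TFP fits on the first timesteps and consumes the remaining rows when forecasting.

```python
import numpy as np
import tensorflow_probability as tfp

# Toy data: 100 pre-intervention rows plus 20 post-intervention rows.
X_full = np.random.randn(120, 2)   # covariates for BOTH periods
y_pre = np.random.randn(100)       # response, pre-period only

# The design matrix already includes the post-period rows at build time;
# the first 100 timesteps are used for fitting and the remaining 20 when
# forecasting. Swapping in different post-period covariates therefore
# yields a different model, which is what rules out caching the fit.
regression = tfp.sts.LinearRegression(design_matrix=X_full)
trend = tfp.sts.LocalLevel(observed_time_series=y_pre)
model = tfp.sts.Sum([trend, regression], observed_time_series=y_pre)
```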

The second option would not be possible either: if we change the training data even a little, the whole training has to run again. The only thing that might help in that case is seeding the priors with the fitted posteriors of the previous model, but performance-wise that wouldn't change anything.

So the bad news is that for now I don't see any viable solution for this problem, as far as caching goes.

Another option, though, is to run your jobs in parallel. Have you tried multiprocessing already? I suspect that for now it's the only technique at our disposal that will help here (note that multi-threading won't work, since the task is CPU-bound, not IO-bound).
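
Something along these lines, as a sketch (`datasets` is a placeholder for your own list of `(data, pre_period, post_period)` tuples):

```python
from multiprocessing import Pool

from causalimpact import CausalImpact

def run_one(job):
    """Fit one model per worker process and return its summary."""
    data, pre_period, post_period = job
    ci = CausalImpact(data, pre_period, post_period)
    return ci.summary()

if __name__ == '__main__':
    # `datasets`: your list of (data, pre_period, post_period) tuples.
    with Pool(processes=8) as pool:  # tune to your CPU core count
        summaries = pool.map(run_one, datasets)
```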

Let me know what you think.

Best,

Will

mc-karsa-tech commented 2 years ago

Hi Will,

thank you for your answer. We already run the causal impact computation in multiple processes, but that only gives a severalfold speed-up, which still seems not enough for us.

Do you think we could train a neural network (NN) to predict causal impact, and use this trained NN instead of the actual causal impact computation?

WillianFuks commented 2 years ago

I see. As for the NN, I think it's possible (maybe using LSTMs). I'm not sure it would help much performance-wise, and I'm also not sure you can recover confidence intervals with this technique.

Another thing that can be done is to use statsmodels' UnobservedComponents, which was the first algorithm used in pycausalimpact (that package has unfortunately been deleted by Dafiti, so it may be a bit out of date). Still, it's dozens of times faster than the TFP implementation (but it doesn't follow a Bayesian approach, so you can't really manipulate priors).
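
A minimal sketch of that approach (my own illustration, not pycausalimpact's exact code; it assumes a dataframe `df` with the response in `y`, covariates in `X1` and `X2`, and the first 100 rows being the pre-period):

```python
import pandas as pd
from statsmodels.tsa.statespace.structural import UnobservedComponents

# Fit a structural time-series model on the pre-period only, then forecast
# the post-period; the impact is the gap between observed and predicted.
pre, post = df.iloc[:100], df.iloc[100:]

model = UnobservedComponents(pre['y'], level='local level',
                             exog=pre[['X1', 'X2']])
fitted = model.fit(disp=False)

forecast = fitted.get_forecast(steps=len(post), exog=post[['X1', 'X2']])
point_effects = post['y'] - forecast.predicted_mean
print('estimated cumulative impact:', point_effects.sum())
```

Note that the fit happens once, on the pre-period only; if the pre-period really is identical across your datasets, each new post-period set would only need a cheap `get_forecast` call.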

A cool thing to do would be to update tfcausalimpact to also run on top of statsmodels, but that would also take a lot of time.

mc-karsa-tech commented 2 years ago

Thank you for the information. An implementation based on statsmodels seems to be a way for us to get fast causal impact results. Thank you for the link to the older pycausalimpact library.