Open-EO / FuseTS

Time series Fusion toolbox integrated with openEO
https://open-eo.github.io/FuseTS/

MOGPR performance analysis & improvements #75

Open jdries opened 1 year ago

jdries commented 1 year ago

The current MOGPR implementation can be quite heavy. Computing/training a new model per timeseries is the main cost, but basic inference is expensive as well.

The main work is done by the GPy library, which has a successor based on TensorFlow (GPFlow). Hence the most obvious thing to try first is porting to this TensorFlow-based library: https://gpflow.github.io/GPflow/develop/notebooks/advanced/multioutput.html

Current GPy model training: https://github.com/Open-EO/FuseTS/blob/main/src/fusets/mogpr.py#L364 And inference: https://github.com/Open-EO/FuseTS/blob/main/src/fusets/mogpr.py#L389

Note that there's also the idea of avoiding retraining the model per pixel, but that is a separate issue!

mlubej commented 1 year ago

Thanks, I will share this with our experts to see if we can push it onto our agenda and take a look. Will keep you posted.

mlubej commented 1 year ago

We looked at the code for MOGPR. We prepared a snippet for running and debugging the core functionality.

Our first impression was that the code could be written in a somewhat more Pythonic way. However, after running the profiler we noticed that the majority of the time is spent in training, confirming the findings from others: making the code more Pythonic wouldn't make a dent.

The code was run on the test dataset used in the notebook. Here are the results:

[profiler output screenshot]

Total time spent in the optimization (training) part is ~1000 s, which is 90% of the total time and ~0.16 s per pixel on average (quite slow). To run things as they are, one would need to get down to the order of 1 ms per pixel for this to be fast enough to be useful and convenient.
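To put the per-pixel numbers in perspective, a quick back-of-envelope calculation (the tile size is a hypothetical example, not from the profiling run):

```python
# Rough scaling of the profiled training cost to a larger job.
per_pixel_current = 0.16      # seconds per pixel, measured above
per_pixel_target = 0.001      # ~1 ms per pixel target
pixels = 512 * 512            # hypothetical tile size

hours_current = pixels * per_pixel_current / 3600
minutes_target = pixels * per_pixel_target / 60

print(f"current: {hours_current:.1f} h, target: {minutes_target:.1f} min")
# → current: 11.7 h, target: 4.4 min
```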

We had a quick look at the GPFlow library suggested by @jdries. It does look like a drop-in replacement for the GPy library, so @veseln and I will try to make this change and see if there is a speed improvement.

While we're on the topic, I think there is a bigger issue here. This method requires training the MOGPR model for each pixel, which is the main cause of the slowness of this approach. It also seems to me that such an approach is inherently suboptimal, or even wrong. One of the main issues (putting slowness aside) is likely poor performance when the pixel data is bad: having no NDVI because all observations are cloudy means that training for that particular pixel will fail or produce garbage, and the output will be useless, while there exist plenty of other pixels with similar phenology and plenty of data that could be used for training a useful model.

I don't believe the GPy authors did a bad job writing their code, and the library seems to be quite popular, so I don't think the core functionality is slow or sub-optimal. This is why I'm not so optimistic about the GPFlow library making much of a difference; it all points more in the direction of us using it wrong. I believe it would be better to take an inherently different approach to this problem, e.g. having a collection of seed timeseries on which to pre-train models and then applying those models to the timeseries at hand.

Example use case: one could have different seed timeseries belonging to different crops/meadows/..., and find the best fit for the timeseries at hand to decide which model to apply. This way we would have models pre-trained on a much smaller (but much better defined) subset of timeseries. During inference we would just need a way to select the most appropriate model and use it.
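The model-selection step described above could be sketched roughly like this (all names are hypothetical; the matching criterion here is a simple Euclidean distance on valid dates, which is an assumption, not something decided in this thread):

```python
import numpy as np

def select_seed_model(pixel_ts, seed_ts, models):
    """Pick the pre-trained model whose seed series is closest to pixel_ts.

    pixel_ts : (T,) array, may contain NaNs (e.g. cloudy observations)
    seed_ts  : (K, T) array of gap-free seed timeseries
    models   : list of K pre-trained model objects (hypothetical)
    """
    valid = ~np.isnan(pixel_ts)                 # compare only on valid dates
    dists = np.linalg.norm(seed_ts[:, valid] - pixel_ts[valid], axis=1)
    best = int(np.argmin(dists))
    return models[best], best

# toy usage: two seeds, the pixel series resembles the second one
seeds = np.array([[0.2, 0.3, 0.4, 0.3], [0.6, 0.8, 0.9, 0.7]])
pixel = np.array([0.58, np.nan, 0.88, 0.72])
model, idx = select_seed_model(pixel, seeds, ["model_crop", "model_meadow"])
print(idx)  # → 1
```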

This requires quite some change to the codebase, so I'm not sure how to proceed (@JanssenBrm @jdries @JochemV).

We will still try to make use of the GPFlow replacement. I'll report here with more concrete results.

msalinero commented 1 year ago

Hi @mlubej ! Thanks for the thorough study.

MOGPR models do not rely on samples, but on hyperparameters describing the input signals and the relation between them. Most of the MOGPR processing time is spent on the tuning/optimization of these hyperparameters (lengthscale, variance, B.W and B.kappa). For the most demanding setting that we have tried, with 3 input signals (NDVI, RVI_ASC, RVI_DESC) and 4 years of data, we have the following:

[timing screenshot]

It takes a running time of ~39 seconds (this processing is done locally on my personal computer). If we fix these hyperparameters to the values obtained before and run again, the processing goes down to ~4.3 s. This functionality can be tested in mogpr_1D by adding the trained GPy model as input. The constraint fixing is done in https://github.com/Open-EO/FuseTS/blob/main/src/fusets/mogpr.py#L367

The good thing about these hyperparameters is that they constrain the shape and trend of the entire signal based on the input. This brings me to the point where I suppose we were thinking in the same direction: if we have representative hyperparameters based on sample timeseries from different crops, land covers, etc., we can circumvent the expensive tuning step. In that direction, I've been working on retrieving and studying these hyperparameters from several crop parcels. The idea is similar to what is implemented in this paper: https://www.mdpi.com/2073-4395/10/5/618 In the past we even had promising results with global hyperparameters shared across several crops. Also, the models would be very light (2 doubles and 2 small matrices per signal).

When I have some results I´ll post them here.

mlubej commented 1 year ago

Most of the MOGPR processing time is spent on the tuning/optimization of these hyperparameters. It takes a running time of ~39 seconds (this processing is done locally on my personal computer). If we fix these hyperparameters to the values obtained before and run again, we get this processing down to ~4.3 s.

@msalinero the results I posted above were done for a single variable (is this then only interpolation?) and for a single year. Are your numbers derived for a single pixel timeseries? If yes, then I imagine even 4.3 s per pixel is still quite slow, but I agree that fixing the hyperparameters per specific phenology, as you suggest, is a good approach.

We also discussed the possibility of randomly sampling timeseries from the AOI and running unsupervised clustering in order to obtain some generic and representative timeseries for the most common areas. Of course, in the case of crops it might make even more sense to do supervised clustering. As you say, even doing this just to fix the hyperparameters might already prove beneficial. Would it be possible/applicable to train the full model for such representative timeseries, or would such an approach make sense only for the hyperparameter tuning?
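The unsupervised clustering step could be prototyped with a plain k-means over the sampled series (hand-rolled here to stay dependency-light; the function and its deterministic initialization are illustrative assumptions):

```python
import numpy as np

def kmeans_series(series, k, iters=20):
    """Tiny k-means over an (N, T) stack of timeseries.

    Returns (k, T) cluster-mean "seed" timeseries and per-series labels.
    """
    # deterministic init: spread initial centers across the sample
    idx = np.linspace(0, len(series) - 1, k).astype(int)
    centers = series[idx].astype(float)
    labels = np.zeros(len(series), dtype=int)
    for _ in range(iters):
        # assign each series to the nearest center, then update centers
        d = np.linalg.norm(series[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = series[labels == j].mean(axis=0)
    return centers, labels

# toy AOI sample: two clearly separated phenology groups
rng = np.random.default_rng(1)
grp1 = 0.3 + 0.02 * rng.standard_normal((20, 12))
grp2 = 0.7 + 0.02 * rng.standard_normal((20, 12))
seeds, labels = kmeans_series(np.vstack([grp1, grp2]), k=2)
```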

msalinero commented 1 year ago

My results are for one pixel or, better said, for the mean of the rye patch shown in https://github.com/Open-EO/FuseTS/blob/main/notebooks/AI4FOOD_OpenEO_MOGPR_S1S2.ipynb , which is operationally equivalent. Indeed, if you only use one variable you are doing GPR interpolation, losing the advantage of a supporting variable like e.g. RVI (descending) under gap conditions like clouds. Also, reducing the number of variables greatly reduces the processing time. For instance, we repeated the experiment using 2 variables (NDVI, RVI_DESC), given that the other signal didn't add more information: the processing time came down from ~3 s to ~390 ms when fixing the hyperparameters.

Would it be possible/applicable to train the full model for such representative timeseries, or would such an approach make sense only for the hyperparameter tuning?

From my point of view, this approach would only make sense for the hyperparameter tuning.

mlubej commented 1 year ago

@jdries @msalinero together with @veseln we tried the almost drop-in replacement of GPy with GPFlow. The two libraries are very similar and the usage logic is basically the same, although we followed the coregionalization example instead of the multi-output one mentioned above.

The code difference wasn't big, so there is little chance that we messed something up, but in the end the new code ran ~4x slower. I can't imagine why TensorFlow, which seems to just add more wrappers around the core functionality, would run any faster. Perhaps it can parallelize better using a GPU, but in our case we are interested in making the per-pixel process as fast as possible, since we're applying this UDF to the pixels in a vectorized manner. At the same time, I don't think we should assume users are familiar with GPU setups to the point that this should be the default.

Our suggestion would be to stick with the GPy implementation and rather change the approach to how we use it, as started in the discussion above between @msalinero and me.
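For future backend comparisons, a small per-pixel timing harness along these lines (the workload below is a stand-in; a real comparison would plug in the GPy and GPFlow fit/predict paths) keeps the numbers comparable:

```python
import time

def per_pixel_time(fit_predict, pixels, repeats=3):
    """Best-of-N average wall time per pixel for a fit/predict callable."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for ts in pixels:
            fit_predict(ts)
        best = min(best, (time.perf_counter() - start) / len(pixels))
    return best

# stand-in workload instead of a real per-pixel model fit
dummy_pixels = [list(range(50)) for _ in range(200)]
t_fast = per_pixel_time(sum, dummy_pixels)
```

Taking the best of several repeats reduces the influence of one-off interpreter and cache effects on the measurement.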

@jdries a question for you: Imagine the use case where one would like to fuse radar and optical data using MOGPR. In order to successfully tune the MOGPR hyperparameters the user would prepare a few key time-series, e.g. for each of the crops they are interested in. For each of these time-series the tuning would be run, and then the constrained process would be applied on the pixel- or parcel-level timeseries of the same crop.

The question here is how to imagine the implementation of such a process. I don't have a clear vision of what the process above should look like in the scope of openeo/fusets, at least in the sense of it being a single pipeline of user-defined functions. It seems to me there are steps in between where the pixel/object scope changes, and I find it hard to imagine how to go about this (possibly due to my lack of in-depth knowledge of the codebase). Please advise.

From the Sinergise point of view, the easiest way to figure out whether there is value in fused timeseries is an object-based approach, quantifying what value it brings to area monitoring, so this is the direction in which we would like to go, similar to the example described above. @JochemV @msalinero would such an approach also fit your use case, or do you have some other approach planned?

jdries commented 1 year ago

Nice overview! Too bad the newer implementation does not run any faster; it's certainly not the intention to require a GPU. Did you compare using intel-tensorflow, or the standard one? Avoiding the per-pixel training indeed seems like the best option. The paper (https://www.mdpi.com/2073-4395/10/5/618) also suggests that we could even go for global models. This would be preferable, as having to know the crop type up front is quite a big limitation. Maybe we can even test whether the same model applies to both forests and crops?

Maybe we can also have a feature that samples input pixels and computes the error of the per-pixel model versus a selected global model. This would allow a user to easily evaluate whether the global model is applicable to the dataset at hand.

Another idea: could we determine a set of cheaper timeseries 'features' that correlate strongly with the hyperparameters, allowing us to 'guess' the right model from cheap inputs rather than an expensive per-pixel training?
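The "cheap features" idea could start from simple summary statistics of the series; the particular feature set below is an assumption for illustration, not something proposed in the thread:

```python
import numpy as np

def cheap_features(ts):
    """Summary features that might correlate with GP hyperparameters."""
    ts = np.asarray(ts, dtype=float)
    amplitude = ts.max() - ts.min()            # seasonal amplitude
    peak_pos = int(np.argmax(ts))              # timing of the seasonal peak
    roughness = np.std(np.diff(ts))            # rough proxy for lengthscale
    lag1 = np.corrcoef(ts[:-1], ts[1:])[0, 1]  # autocorrelation at lag 1
    return amplitude, peak_pos, roughness, lag1

# toy smooth seasonal NDVI-like curve
t = np.linspace(0, 2 * np.pi, 36)
amp, peak, rough, lag1 = cheap_features(0.5 + 0.3 * np.sin(t))
```

A smooth seasonal curve yields a high lag-1 autocorrelation and low roughness; a noisy or cloud-riddled series would look very different on the same features.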

@msalinero are there sets of hyperparameters determined in earlier work that we can integrate in the library? And do we know what basic information is needed to apply a given global model? How input-specific are these global parameters?

msalinero commented 1 year ago

@msalinero are there sets of hyperparameters that have been determined before that we can integrate in the library? And do we know what basic information is needed to a given global model? How input specific are these global parameters?

@jdries Happy that you are asking these questions, because my recent efforts have been going in that direction. Following the logic of the paper from our colleague Santiago, I'm trying to get the mean of the MOGPR hyperparameters over several parcels. Then I'll try to use these values and see how they behave.

To my knowledge, there are no previously generated hyperparameters for each crop type. The ones used in the paper are for LAI and single-input GPR. Also, from what we have seen in previous studies, they depend greatly on the variable at hand. In other words, you will not get good results using LAI hyperparameters on an NDVI timeseries.

If we can go for a global crop model, as suggested in the paper, and get promising results, this would greatly simplify the task. Later, of course, we can play a bit more and see how this model behaves on other land covers, like forests, etc.

mlubej commented 1 year ago

In that direction, I've been working on retrieving and studying these hyperparameters from several crop parcels. The idea is similar to what is implemented in this paper: https://www.mdpi.com/2073-4395/10/5/618 In the past we even had promising results with global hyperparameters shared across several crops. Also, the models would be very light (2 doubles and 2 small matrices per signal).

Thank you @msalinero for this article. We will follow this procedure and use averaged hyperparameters for meadow areas. One difference is that in our use case we have 1 year of data, so for the training step we will have to focus on timeseries whose observations don't have large gaps, but this shouldn't be much of a problem.

Do you suggest going with the Matern 3/2 kernel, similarly to how it's implemented in https://github.com/Open-EO/FuseTS/blob/main/src/fusets/mogpr.py#L360 ?

msalinero commented 1 year ago

@mlubej Glad that the paper is of help! I prefer the Matern 3/2 kernel over the standard RBF kernel: the Matern kernel is less smooth and thus more sensitive to changes. For me it worked better most of the time.
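For reference, the two kernels being compared, as stationary functions of the distance r (a minimal numpy sketch with unit variance): the RBF covariance is flatter near r = 0 and drops off as a Gaussian, which yields infinitely smooth fits, while the Matern 3/2 covariance has a heavier tail and gives rougher, more change-sensitive fits.

```python
import numpy as np

def matern32(r, ls=1.0):
    """k(r) = (1 + sqrt(3) r/ls) * exp(-sqrt(3) r/ls)"""
    s = np.sqrt(3.0) * r / ls
    return (1.0 + s) * np.exp(-s)

def rbf(r, ls=1.0):
    """k(r) = exp(-r^2 / (2 ls^2))"""
    return np.exp(-0.5 * (r / ls) ** 2)

# both kernels equal 1 at r = 0; RBF stays higher nearby but decays faster
print(matern32(0.5), rbf(0.5))   # RBF larger here
print(matern32(3.0), rbf(3.0))   # Matern larger here
```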

About only having 1 year of data: maybe you can "concatenate" this same year several times and then induce gaps artificially.
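The concatenate-and-induce-gaps trick might look like this (the repeat count, gap fraction, and seed are arbitrary illustrative choices):

```python
import numpy as np

def tile_with_gaps(ts, n_years=3, gap_frac=0.3, seed=0):
    """Repeat a 1-year series n_years times, then mask random dates with NaN."""
    rng = np.random.default_rng(seed)
    long_ts = np.tile(np.asarray(ts, dtype=float), n_years)
    mask = rng.random(long_ts.size) < gap_frac
    long_ts[mask] = np.nan          # artificial gaps, e.g. simulated clouds
    return long_ts

# toy 1-year NDVI-like series, expanded to 3 "years" with artificial gaps
year = np.array([0.2, 0.4, 0.7, 0.8, 0.6, 0.3])
augmented = tile_with_gaps(year)
```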

JanssenBrm commented 11 months ago

As discussed during the enhancement review: we could already improve the usage experience by clearly documenting the different limitations of the MOGPR implementation within FuseTS.