brandtg / stl-java

A Java implementation of STL
Apache License 2.0

Issues comparing with R / Python / Fortran version #9

Open jcrotinger opened 8 years ago

jcrotinger commented 8 years ago

Hi @brandtg,

I've been looking for a Java STL package and was psyched to find that you were working on one!

My first comparison with the Java code was for some artificial data from the NAB dataset (e.g. https://github.com/numenta/NAB/blob/master/data/artificialWithAnomaly/art_daily_flatmiddle.csv). When I tried this with the Java implementation, the results didn't look quite right. In particular, there were issues with the trend at the ends, IIRC. I was trying to sort out possible config differences (I should repeat with the "periodic" option enabled, but generally I won't want this enabled) when I ran across the performance discussion and the hourly.txt data set that you've been using for testing. So I grabbed this data and did my own set of comparisons. Once I got the configs right, the Python and R implementations agree to within numerical noise. However, the stl-java version, while qualitatively similar, is still fairly different from the Python/R results, with the trend differing by roughly 10%. For example: [screenshot: trend comparison, 2016-04-12]

I'm definitely interested in using this package, but I want to figure out whether the configs just aren't right yet (given the funky inputs to the LoessInterpolator) or there's an issue either in the StlDecomposition class or the underlying LoessInterpolator. I'm happy to help dig in if you have any suggestions.

Oh, I'll try to repeat my test on the NAB example with "periodic" set, to see whether I still get the odd behavior I saw earlier. Will let you know.

Cheers - Jim

hntd187 commented 8 years ago

@jcrotinger @brandtg and I have been looking at the funky output for some time now, trying to figure out why it gets weird in certain places. I don't know if he has more insight, but I'm still slightly confused about why it does that at the end. I have reason to believe it's in the loess smoother. Honestly, I think the R implementation smooths in a different way than the Java one, which is what yields the strange result at the tail end, but Greg might have more insight than I do.

jcrotinger commented 8 years ago

The R implementation is just a very thin wrapper around the Fortran, as is the Python implementation. I was confused by this since the original paper talks about the S implementation, which is NOT the same as the Fortran (it uses K-D trees for performance in the Loess smoothing instead of skips, and it has several additional features, like post-STL seasonality smoothing, which I assumed was just part of the algorithm after reading the paper, but they are not part of the Fortran version). Here's the S doc. Let me rerun the NAB artificial anomaly example - it is much more clear cut.

jcrotinger commented 8 years ago

I've re-run my earlier test on this NAB dataset. Here is the result from my Python test: [screenshot: Python decomposition, 2016-04-13]. I'm using the default for the length of the trend smoother and it seems perhaps a little short, as the trend is picking up a bit of the seasonality, and there is a very small non-periodic blip at the beginning, but mostly it's pretty flat until the "anomaly". This run isn't using "robust" smoothing - I have no=0, ni=2 to match how I ran the Java. The Java was run with

      $ java -Dinner.loop=2 \
             -cp /Users/jim.crotinger/.m2/repository/org/apache/commons/commons-math3/3.5/commons-math3-3.5.jar:/Users/jim.crotinger/git/github/stl-java/target/stl-java-0.1.2-SNAPSHOT.jar \
             com.github.brandtg.stl.StlDecomposition 288 \
             <art_daily_flatmiddle.csv >art_daily_flatmiddle_java.csv

Here are the Java results (plotted in the same Python notebook): [screenshot: Java decomposition, 2016-04-13]. As you can see, the trend has an excursion on the order of the size of the seasonal component at the very beginning. Its excursion at the anomaly is qualitatively similar to the Python version but somewhat smoother, and the Python one recovers back to near-flat after the anomaly more quickly than the Java version.

Another issue that I just noticed is that the Java version is not capturing as much of the seasonality as the original version. Here is a comparison: [screenshot: seasonal component comparison]. As a result, the residual is retaining seasonal structure that it shouldn't. The structure is very close - in fact, if you scale the Java seasonal component up by 8-10%, it is very close to the Python result.

jcrotinger commented 8 years ago

Stepping through the code, I see that the "periodic" option is not implemented in the same fashion as R does it (and that I'm doing it in Python). The R wrapper sets ns = 10 * n + 1 where n is the total number of observations, and isdeg = 0, where isdeg is the degree of the polynomial fit used for the cycle subseries, and then calls the underlying Fortran; i.e. it computes the "moving average" of the cycle sub-series over a window that is larger than the actual data, resulting in a "fit" of the cycle sub-series by its average value. The R code follows up by ensuring exact periodicity via an explicit post-STL averaging of the cycle sub-series and replacing the individual values with this average. (My version in Python doesn't do this and gives the same results, which seems right given that the Loess with degree 0 and a very-large window should already be doing that average.)
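To make the R behavior concrete, here is a minimal sketch of that post-STL averaging step (my own illustrative helper, not code from either implementation; it assumes the series length is a multiple of the period):

```python
import numpy as np

def force_periodic(seasonal, period):
    """Replace every value in each cycle subseries with that
    subseries' mean, making the seasonal component exactly periodic.
    This mirrors the post-STL step described above for the R wrapper's
    "periodic" option; length must be a multiple of `period` here."""
    s = np.asarray(seasonal, dtype=float).reshape(-1, period)
    return np.tile(s.mean(axis=0), s.shape[0])
```

With ns = 10 * n + 1 and isdeg = 0 the inner loess fit of each cycle subseries should already be its average, so in the R flow this step is essentially a no-op safeguard; the key point is that the averaging behavior is imposed *inside* the iteration, not tacked on afterwards.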

It appears that StlDecomposition is computing the seasonality using the default parameters (bandwidth = 0.4, linear fit [the only option]) and then doing the explicit average after-the-fact. While the residual is adjusted so the results are consistent, it isn't clear (or even likely) that the trend will be unaffected. In the "hourly.txt" data, for instance, the seasonality would clearly have strong trends that would affect the trend component. The last-minute averaging of the seasonality comes too late for it to have the same impact on the trend/seasonality iteration that it has when it is imposed as in the R version.

(Replicating the R version is, unfortunately, going to require some work since the LoessInterpolator only supports linear interpolation.)

All this said, for the data above I would expect very little evolution in the seasonal component, particularly if the outer iteration is run. So I don't think that the post-STL averaging is the cause of the inconsistencies above.

jcrotinger commented 8 years ago

@brandtg @hntd187 I've re-run with minimal bells and whistles, and with the bandwidths chosen so that the bandwidthInPoints computed in the LoessInterpolator matched the values set in the calls to the underlying Fortran. Here is the Python result - I've made nl and nt explicit although these would be the default values: [screenshot: Python result, minimal configuration]. For the Java run, I had to modify main so that I could set more of the parameters; I think they're obvious. (outer.loop sets numberOfRobustnessIterations. I don't think the number of Loess robustness iterations needs to be exposed - I'm pretty sure we always want to do the weight adjustments at the STL level, based on the STL's residual, not at the Loess level based on whatever residual it sees in the current fit. I haven't played with that option, and it's certainly something the Fortran doesn't expose.) Here is the Java command line:

      $ java -Dtrend.bandwidth=0.1075 -Dseasonal.bandwidth=0.85 \
             -Douter.loop=1 -Dinner.loop=1 -Dstl.periodic=false \
             -cp $BLAH com.github.brandtg.stl.StlDecomposition 288 \
             art_daily_flatmiddle.csv art_daily_flatmiddle_java_noperiodic_norobust.csv

There are 14 periods in the data, but during the smoothing of the cyclic sub-series two extra periods are added, giving 16 in the smoothing, so I had to set the seasonal bandwidth to 0.85 in order to get bandwidthInPoints up to the 13 that I used in Python. (Note that the default is 7 in the R/Python and I don't think there is any attempt to ensure that the default bandwidth results in a minimum of 7 points, or that the number of points will be odd. I've added some local TODOs to fix these things.)
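If I'm reading Commons Math right (an assumption worth double-checking against the LoessInterpolator source), the bandwidth fraction is simply truncated to an integer point count, which is where the 0.85 comes from:

```python
n_periods = 14
n = n_periods + 2     # two padding periods are added before smoothing

def bandwidth_in_points(bandwidth, n):
    # Assumed to mirror Commons Math LoessInterpolator: (int)(bandwidth * n).
    return int(bandwidth * n)

print(bandwidth_in_points(0.85, n))  # 13, matching the value used in Python
print(bandwidth_in_points(0.4, n))   # 6: the stl-java default is even, and below 7
```

The second line illustrates the TODO above: nothing guarantees the default bandwidth yields at least 7 points, or an odd count.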

Here are the results: [screenshot: Java result, minimal configuration]. Note that the seasonality has much more variation than that found by the Python version. As a result, the residuals are really only similar in periods 5-7.

With one robustness iteration (-Douter.loop=2 in the Java case) the trend picks up much less of the anomaly and the seasonality is much more periodic-looking: [screenshot: Python, one robustness iteration]. Java is heading in a similar direction: [screenshot: Java, one robustness iteration]. But note that the trend is still off at the origin. This is even more noticeable with 5 robustness iterations, where the trend should lose all of the transient effects and be close to constant: [screenshot: five robustness iterations]. Note that both the trend and seasonal components appear to have incorrect drift in the first period. After that they look pretty good.
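For reference, the outer-loop weight adjustment being toggled here is, per the Cleveland et al. STL paper, a bisquare of the remainder; a minimal sketch:

```python
import numpy as np

def robustness_weights(remainder):
    """STL outer-loop robustness weights: bisquare of |remainder|
    scaled by 6 * median(|remainder|), as in the STL paper. Points
    with a large remainder (anomalies) get weight near zero in the
    next pass of the inner loop."""
    r = np.abs(np.asarray(remainder, dtype=float))
    h = 6.0 * np.median(r)
    u = np.minimum(r / h, 1.0)
    return (1.0 - u ** 2) ** 2
```

This is why the anomaly's influence on the trend shrinks with each robustness iteration: its points are progressively down-weighted in the loess fits.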

jcrotinger commented 8 years ago

@brandtg @hntd187 I think this is about as close to a minimal criminal as I can come up with. I created a simple two-day pattern:

    import pandas as pd

    simple = pd.DataFrame({'data': 20},
                          index=pd.date_range("2016-04-16 00:00:00",
                                              periods=2*12*24+1, freq="5min"))
    simple[(simple.index.hour >= 9) & (simple.index.hour < 18)] = 80

For the Python STL this gives: [screenshot: Python decomposition of the simple pattern]. For the Java, using

      $ java -Dtrend.bandwidth=0.751 -Dseasonal.bandwidth=0.85 \
             -Douter.loop=1 -Dinner.loop=1 -Dstl.periodic=false \
             -cp $BLAH com.github.brandtg.stl.StlDecomposition 288 \
             art_daily_flatmiddle.csv art_daily_flatmiddle_java_noperiodic_robust.csv

This gives: [screenshot: Java decomposition of the simple pattern]

jcrotinger commented 8 years ago

@brandtg @hntd187 The nice thing about minimal criminals is that it is easy to see what should be happening. In this case, the cyclic sub-series should always be exactly constant. However, the way the procedure is running, the "paddedSeries" that is passed to the LoessInterpolator is (for the first point) [0, 20, 20, 20, 0]. It then does a Loess smoothing of this data, using all five points. That is clearly not what should be happening. What should be happening is a Loess smoothing of [20, 20, 20] that is then extrapolated to the padding points. I think that instead of calling smooth on the padded data-set, this can be done by calling interpolate on the un-padded dataset and then using the PolynomialSplineFunction to extrapolate to the padding points.
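To make the difference concrete, here is a hedged numerical sketch, with plain least-squares lines standing in for the loess fits:

```python
import numpy as np

sub = np.array([20.0, 20.0, 20.0])              # one constant cycle subseries
padded = np.array([0.0, 20.0, 20.0, 20.0, 0.0])  # the paddedSeries stl-java builds

# What stl-java effectively does: smooth over the zero-padded window.
x_pad = np.arange(len(padded))
pad_fit = np.polyval(np.polyfit(x_pad, padded, 1), x_pad)
# The zeros drag the whole fit down well below 20.

# What should happen: fit only the real points, then extrapolate the
# resulting fit out to the two padding slots.
x = np.arange(1, 4)
line = np.polyfit(x, sub, 1)
extrap = np.polyval(line, [0, 4])               # stays at 20 at both ends
```

In the real code the extrapolation could come from the PolynomialSplineFunction that interpolate returns for the un-padded data, as suggested above.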

Digging into the Fortran, they only call the full Loess smoother with the actual sub-cycle points and then call a single-point Loess to clean up the endpoints. I don't think LoessInterpolator could be used this way but it might not be that hard to write the single-point "interpolator" for these extrapolations.
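For what it's worth, a single-point loess evaluation is small enough to sketch. This is a generic tricube-weighted linear fit evaluated at one point, an assumption about the shape of the Fortran routine rather than a port of it:

```python
import numpy as np

def loess_at(x, y, x0, q):
    """Evaluate a degree-1 loess fit at a single point x0, using the q
    nearest observations with tricube weights. Unlike Commons Math's
    LoessInterpolator, this works for x0 outside the data range, i.e.
    it extrapolates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.abs(x - x0)
    idx = np.argsort(d)[:q]                     # q nearest points
    xs, ys, ds = x[idx], y[idx], d[idx]
    h = ds.max()
    w = (1.0 - (ds / h) ** 3) ** 3 if h > 0 else np.ones_like(ds)
    # Weighted linear regression, evaluated at x0.
    sw = w.sum()
    xm = (w * xs).sum() / sw
    ym = (w * ys).sum() / sw
    denom = (w * (xs - xm) ** 2).sum()
    b = (w * (xs - xm) * (ys - ym)).sum() / denom if denom > 0 else 0.0
    return ym + b * (x0 - xm)
```

For the constant subseries above, loess_at over [20, 20, 20] returns 20 at the padding positions, which is exactly the behavior the current zero-padding destroys.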

brandtg commented 8 years ago

@jcrotinger Sorry I was out sick for a couple days, so didn't have a chance to get to this. But thank you for the awesome analysis! We have been wondering about this funky behavior, but haven't had too much time to dig into the exact cause.

Your reasoning makes sense. The zero padding at either end of the series is likely causing the weird behavior. Do you want to submit a PR with the fix that you describe? Do you think we can get away with interpolate, or should we write our own single-point interpolator for the endpoint extrapolations?

jcrotinger commented 8 years ago

@brandtg No worries. I'm taking a look at porting the single-point routine from the original RATFOR. :) I just about have the first cut done; I need to write some tests, then I'll see about integrating it into the stl-java code. If I can get this single-point version working, then ultimately my preference would be to drop the dependence on LoessInterpolator and implement something closer to the Fortran. As I mentioned over on the performance thread, without doing the strided smoothing this is always going to be slow compared to what the Fortran does. (I also want to think about how hard it would be to extend to quadratic, though that can't be recast as a linear operation on the data, so it would never be as simple.)

brandtg commented 8 years ago

@jcrotinger Sounds good to me. Removing the dependency on LoessInterpolator would be great.

hntd187 commented 8 years ago

And if we need a native impl wanna jump on that rust hype train? @brandtg choo choo

brandtg commented 8 years ago

@hntd187 Haha I suppose we could try writing something in rust, but if we can get pure Java that would be ideal

brandtg commented 7 years ago

Hey @hntd187 / @jcrotinger I put in a hack to do some kind of extrapolation when padding the cycle subseries, and this is what we get for that input (but with 10 inner loop iterations vs. 1):

[screenshot: with-ten - result with 10 inner loop iterations]

With 1 it looks a little funky still, but it seems like there's some progress if we can achieve reasonable output with the default config:

[screenshot: just-one - result with 1 inner loop iteration]

What do you think?

brandtg commented 7 years ago

Hmm... Actually this seems to make the tests fail and really throws off that output... going to dig into that one a bit more, but open to ideas on how we can do the extrapolation part

brandtg commented 7 years ago

Switched back to LoessInterpolator and just tried to handle the edge cases better. Given the limitation of LoessInterpolator (it can't extrapolate), this is effectively a hack - it just interpolates at the edge and uses that value, assuming it's similar enough.
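If I follow, the hack amounts to something like this sketch (a hypothetical helper for illustration, not the actual stl-java code): since the interpolator can't go past its knots, the padding slots just reuse the fitted value at the nearest real endpoint.

```python
def pad_with_edge_values(fitted):
    """Fill the two padding slots with the nearest fitted endpoint
    value instead of extrapolating. `fitted` is assumed to be the
    loess fit over the real (un-padded) cycle subseries."""
    return [fitted[0]] + list(fitted) + [fitted[-1]]
```

This is exact for a constant subseries, but it flattens any real trend in the subseries at the boundaries, which could explain residual structure near the ends.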

Here is a run with the following configuration:

    StlDecomposition stl = new StlDecomposition(288);
    stl.getConfig().setTrendComponentBandwidth(0.751);
    stl.getConfig().setSeasonalComponentBandwidth(0.85);
    stl.getConfig().setNumberOfInnerLoopPasses(10);
    stl.getConfig().setPeriodic(false);
    StlResult res = stl.decompose(times, measures);

[screenshot: with-loess-not-periodic - decomposition using LoessInterpolator with improved edge handling]

There still seems to be some pattern in the residuals, but they're more balanced and closer to zero than the previous version. Also the tests pass 😅

jcrotinger commented 7 years ago

Hi @brandtg. I'm afraid I gave up on this and "ported" the original Fortran to Java (I cleaned up the interface in the process, though I'd like to make more improvements - just not enough spare time). I want to give that back, but I will have to go through a fair bit of red tape at work in order to put it into the public domain, and right now I don't have time. (I also extended it to support second-order LOESS, a change I'd also like to put back into the original Fortran.) The experience was enlightening - the version of LOESS in the original Fortran is completely specialized for the use case of implementing the Fortran STL. It assumes evenly spaced data and is written to compute no more than necessary in order to interpolate the point it is smoothing. As a result, using a generic LOESS package will never perform well, IMO. In the meantime, if you are not constrained against using unmanaged code, it might be better to do a JNI interface to the original Fortran.

hntd187 commented 7 years ago

Yea, but a JNI interface to that would be a portability nightmare no doubt

jcrotinger commented 7 years ago

Agree - that's why I did a Java version (well, that and other reasons). I've pinged our legal department about steps to release my version. Will let you guys know. Might be a project I could do over the holidays.

jcrotinger commented 7 years ago

@brandtg and @hntd187, at long last my Java port is available on github. Sorry it took so long - a back burner project that had to go through corporate channels since it is based on work I did during my day job. The now-public repo is https://github.com/ServiceNow/stl-decomp-4j. Please take a look, give it a whirl, etc. Performance is about half the speed of the native Fortran, which I think isn't too bad for Java, though I'm not thrilled that it isn't better. Currently it does create some garbage and I'd like to change it to reuse the work arrays that it allocates during the low-pass phase, but the current version will have to do for now.

brandtg commented 7 years ago

@jcrotinger Thanks for the update! Will dig into it - probably will deprecate this project and point towards yours

jcrotinger commented 7 years ago

FYI, stl-decomp-4j is officially released, with version 1.0.0 at a central maven repo near you! I'd like to make a "stl server" variant that would cache its working arrays so that it doesn't generate garbage (which it currently does for the low-pass smoothing), but that will have to wait for a future pause in the action here at work. :)