opencl backend only supports a number of harmonics (k) in range(3,11)

diku-dk / bfast

GPU Implementation for BFAST

GNU General Public License v3.0


opencl backend only supports a number of harmonics (k) in range(3,11) #41

Open mirt001 opened 3 years ago

mirt001 commented 3 years ago

Probably nobody needs k > 10, but k < 3 is needed.

What surprised me even more is that the limitation only exists in the OpenCL backend. The only place where k is used in an unsafe way, imo, is on line 88 of the Futhark code, but I couldn't understand what ns is, or why it can't be what I assume would be 3 or 5.

Interestingly enough, the corresponding code in the "python" backend is the same, except that it doesn't differentiate between trend and no trend for the value of k2p2, and there's no limitation on k there.

I would also like to understand why the python and OpenCL implementations calculate sigma slightly differently; pointing me to some literature would perhaps suffice.

But the core issue is still getting the OpenCL implementation to work with k=1 or k=2. If this doesn't make mathematical or technical sense, it should probably be explained in the documentation.
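To make the reported restriction concrete, here is a minimal sketch of what a check against `range(3, 11)` accepts and rejects. The names `K_VALID` and `check_k` are hypothetical and are not taken from the bfast source; only the range itself comes from the issue title.

```python
# Hypothetical sketch of a backend-side check on k; not the bfast code.
K_VALID = range(3, 11)  # 3, 4, ..., 10 (the upper bound is exclusive)


def check_k(k, k_valid=K_VALID):
    """Return k unchanged if it is supported, otherwise raise ValueError."""
    if k not in k_valid:
        raise ValueError(f"k must be one of {list(k_valid)}, got {k}")
    return k


# A relaxed variant only needs a wider range (again, an assumption):
check_k(1, k_valid=range(1, 11))
```

With the default range, `check_k(1)` and `check_k(2)` raise, which matches the behaviour described in this issue.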

mirt001 commented 3 years ago

Moreover, I just tested the mirt001-testing-k-1-2 branch on my fork (installed with !pip install git+https://github.com/mirt001/bfast.git@mirt001-testing-k-1-2#egg=bfast), which comments out the lines that check whether k is in k_valid.

It works with seemingly correct results on Google Colab, and I assume nothing exploded on Google's side of things.

This is far from proper testing, but whatever prompted the k>2 limitation might no longer be in the code.

mortvest commented 3 years ago

This limitation has been there since before I joined the project, so I don't know exactly why it is there. I think it has something to do with the performance of the GPU version. 2k + 1 is the inner dimension of multiple vector/matrix operations in the code, and the GPU kernels are tuned with this assumption in mind. It should still run for k=1 and k=2, but the performance would probably be suboptimal. More testing needs to be done.
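As a rough illustration of where 2k + 1 comes from, the sketch below builds a harmonic design matrix with an intercept and one sin/cos pair per harmonic. It is an assumption about the general shape of such a model, not a copy of the bfast code; the function name `harmonic_design_matrix` and its parameters are made up for the example. With a trend term the width becomes 2k + 2, which would be consistent with the k2p2 name mentioned above.

```python
import math


def harmonic_design_matrix(times, k, freq, trend=True):
    """One row per time step: [1, (t), sin(wt), cos(wt), ..., sin(kwt), cos(kwt)]."""
    rows = []
    for t in times:
        row = [1.0]                       # intercept
        if trend:
            row.append(float(t))          # optional linear trend term
        angle = 2.0 * math.pi * t / freq
        for j in range(1, k + 1):         # one sin/cos pair per harmonic
            row.append(math.sin(j * angle))
            row.append(math.cos(j * angle))
        rows.append(row)
    return rows


for k in (1, 2, 3):
    X = harmonic_design_matrix(range(20), k, freq=365.0, trend=False)
    print(k, len(X[0]))  # column count is 2k + 1: 3, 5, 7
```

So k=1 and k=2 give inner dimensions of 3 and 5 (4 and 6 with a trend), which is smaller than anything in the currently allowed range.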

mirt001 commented 3 years ago

@mortvest I understand that performance might be suboptimal compared to k=3, but it is still better than the R bfast. Also, the python and OpenCL backends are at least comparable with k=1, and I believe OpenCL is still faster. At most, the performance decrease for k=1 and k=2 with OpenCL should be documented and the choice left to the user, especially since, imo, most users need specifically k=1 and k=2.

What do you have in mind for testing? I have to run bfastmonitor quite a few times these days. If I can watch out for something extra that would help with development, that would be great. I am currently running my fork that allows me to run with k=1 and k=2. No issues so far.

mortvest commented 3 years ago

Thanks for your input @mirt001, I didn't know that it was a popular setup. By testing, I meant measuring how large the performance decrease is and checking whether it actually produces the correct results. The latter is probably true. Regarding the former, we would probably need to retune some parameters for the GPU version. I'll look into it next week.
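A minimal sketch of such a test, assuming each backend can be wrapped in a callable that takes k and returns a flat sequence of numbers. The helper `compare_backends` and the two stand-in functions are hypothetical; they do not call the bfast API and would have to be replaced with real wrappers around the python and OpenCL backends.

```python
import time


def compare_backends(run_reference, run_candidate, ks, tol=1e-6):
    """For each k, time both callables and report the largest absolute difference."""
    report = {}
    for k in ks:
        t0 = time.perf_counter()
        ref = list(run_reference(k))
        t1 = time.perf_counter()
        cand = list(run_candidate(k))
        t2 = time.perf_counter()
        if len(ref) != len(cand):
            raise ValueError(f"output lengths differ for k={k}")
        max_diff = max((abs(a - b) for a, b in zip(ref, cand)), default=0.0)
        report[k] = {
            "ref_seconds": t1 - t0,
            "cand_seconds": t2 - t1,
            "max_abs_diff": max_diff,
            "match": max_diff <= tol,
        }
    return report


# Stand-ins for the two backends (placeholders, not real bfast calls).
def fake_python_backend(k):
    return [float(k * i) for i in range(5)]


def fake_opencl_backend(k):
    return [float(k * i) + 1e-9 for i in range(5)]


result = compare_backends(fake_python_backend, fake_opencl_backend, ks=(1, 2, 3))
for k, row in result.items():
    print(k, row["match"], row["max_abs_diff"])
```

Running this over k=1, 2 and 3 on the same input would show both whether the results agree within a tolerance and how the timings for the small k values compare with k=3.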