crf1dc_sm_beta_score optimization

bratao commented 8 years ago

Hello @WladimirSidorenko , thank you for this awesome project !

Do you have any plan of how to optimize the training speed ? I get good improvement with a higher order model, but the training times are prohibitive.

However, it do seems that it have a huge space for improvement. Comparing:

--type=1d and --type=semim -p feature.max_seg_len=1 -p feature.max_order=1

I can see that both generate the same result, but the training time in semim is 30x longer.

With a profiler, I can see that crf1dc_sm_beta_score is responsible for 80% of CPU usage.

I´m trying to optimize it, but do not have enough knowledge of the code yet, and do not known if I have skills to. =(

WladimirSidorenko commented 8 years ago

Hello @bratao ,

It is stated in the warnings section of the README file, that there has been no speed optimization of the new models yet. I'd definitely like to do it, but I surely won't have time for it in the next 1,5 years. The reason for such miserable speed is that I initially tried to implement the algorithm of Nguyen Viet Cuong for the higher-order models, since he made some promising claims in his paper. Unfortunately, because this algorithm was quite difficult to understand both from the paper and from the code and because I did not fully understand it from the very beginning, I chose wrong data structures for looking up the suffixes and prefixes of label sequences (AFAIR, I chose arrays though these better had to be tries). Furthermore, because the $\alpha$, $\beta$ vectors become utterly sparse with the increasing orders, I had to use sparse BLAS vectors instead of plain C arrays for computing them. By replacing these data structures you could significantly make up for the speed especially for the higher-orders. For lower orders, it could also lead to improvements but it probably would be still difficult to hit Okazaki's implementation, since it relies on CPU caching, which could be difficult to use with the Viet Cuong's algorithm.

Kind regards, Wladimir

bratao commented 8 years ago

Hi Wladimir, @WladimirSidorenko

I see. I tried Nguyen Viet Cuong code, but could not reproduce the results. I was very slow.

What do you think it is the lowest hanging fruit/ easiest way to improve the training time for a second order CRFs while keeping the accuracy in your code ?

1 - Revisit the suffixes and prefixes of label sequences to tries 2- In crf1dc_sm_beta_score there are 5 chained loops, do you think there is any step that could be memorized/cached ? 3 - Use a faster training algorithm, like perceptron 4 - Something else ??

WladimirSidorenko commented 8 years ago

Hi, Bruno,

From what I remember, perceptron would definitely be not that hard to implement, and it typically converges much better and faster, and uses two times less computations in each iteration than SGD and l-BFGS. If I were you, I would start with (1) and proceed to (3), because the current way of storing suffixes/prefixes is inherently bad, and relying on it would be problematic too, requiring re-writing the perceptron afterwards anyway, once the affix data structures change.

With (1), you will basically need to rewrite the function calls at some places and also change the (de-)serialization. Instead of a trie, you could also try RumAVL first, if I don't use it yet. AFAIR, I already had a branch for this where I started doing some changes in this direction. If I'll find this branch, I will make it free for you, though I left it in a somewhat broken not compilable state, it could probably save you some work.

bratao commented 8 years ago

@WladimirSidorenko , that would be awesome, I will to study how I would implement the perceptron while I wait to see if you find this RumAVL branch

WladimirSidorenko commented 8 years ago

Hi Bruno,

It's this branch here: https://github.com/WladimirSidorenko/CRFSuite/tree/sm_speedup You can check it out, but, as I said, the latest commit on this branch is not compilable.

WladimirSidorenko / CRFSuite

crf1dc_sm_beta_score optimization #5