WayneDW / DeepLight_Deep-Lightweight-Feature-Interactions

Accelerating Inference for Recommendation Systems (WSDM'21)
https://arxiv.org/abs/2002.06987

DeepLight paper question - does Table 1 have a typo? #7

Closed drei34 closed 1 year ago

drei34 commented 1 year ago

Hi,

Thanks for this code! I was looking at the paper: https://arxiv.org/abs/2002.06987 and I see that in Table 1 the shallow layer of DeepFM is O(nk) while that of DeepFwFM is O(n^2 k). I am wondering why this is. Aren't the formulas the same except that DeepFwFM has an extra R matrix term? Shouldn't both be O(n^2 k), since the terms are identical except that DeepFM has 1's where DeepFwFM has r_{ij}'s?

WayneDW commented 1 year ago

Thanks for your interest. FM is more computationally efficient if you check the computations carefully: its pairwise term sum_{i<j} <e_i, e_j> x_i x_j can be rewritten as 0.5 * (||sum_i x_i e_i||^2 - sum_i ||x_i e_i||^2), which only costs O(nk). FwFM's per-pair weights r_{ij} break this factorization, so it doesn't have the speed-up of the FM module, which is why its computation is slower than FM.
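A minimal numpy sketch of the FM speed-up (the identity from Rendle's FM formulation; field count `n`, embedding size `k`, and one active feature per field with x_i = 1 are illustrative assumptions):

```python
import numpy as np

n, k = 8, 4                      # n fields, embedding dimension k (toy sizes)
rng = np.random.default_rng(0)
E = rng.normal(size=(n, k))      # embedding vectors e_1 .. e_n

# Naive FM pairwise term: sum_{i<j} <e_i, e_j>, costing O(n^2 k)
naive = sum(E[i] @ E[j] for i in range(n) for j in range(i + 1, n))

# FM speed-up: 0.5 * (||sum_i e_i||^2 - sum_i ||e_i||^2), costing O(n k)
fast = 0.5 * ((E.sum(axis=0) ** 2).sum() - (E ** 2).sum())

assert np.isclose(naive, fast)
```

Both expressions give the same number, but the second touches each embedding only once, which is where the O(nk) entry for the FM-style shallow layer comes from.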

drei34 commented 1 year ago

Thanks! I am actually still unsure why, and I guess I have to press you a little more. The only difference between the DeepFM and FwFM formulas is the R, right? The shallow parts are where they differ, so it seems FwFM has another O(n) factor in complexity, but I'm unsure why. If you take x of dimension m, only n of its 0/1 elements are nonzero. An inner product between two k-dimensional vectors takes O(k) time. The linear part of FwFM then takes O(nk) time, while that of DeepFM takes O(n). The quadratic term in DeepFM is O(n^2 k), since there are n(n-1)/2 products x_i x_j equal to 1 and each <e_i, e_j> costs O(k). In FwFM this is exactly the same except for the extra R term, which takes O(1) per pair. What is wrong with my bold statement then?

drei34 commented 1 year ago

I.e. it seems that DeepFM is O(n) + O(n^2 k) while FwFM is O(nk) + O(n^2 k) ... So FwFM might be slower, but I am unsure where the O(nk) for DeepFM is coming from ...
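For contrast with the FM case, a small numpy sketch of the FwFM pairwise term (random `R` values are illustrative): even vectorized, it needs the full n-by-n Gram matrix of embeddings, so the O(n^2 k) cost cannot be avoided in general.

```python
import numpy as np

n, k = 8, 4                      # toy sizes: n fields, embedding dimension k
rng = np.random.default_rng(1)
E = rng.normal(size=(n, k))      # embedding vectors e_1 .. e_n
R = np.triu(rng.normal(size=(n, n)), 1)  # field-pair weights r_{ij}, i < j only

# FwFM pairwise term, naive loop: sum_{i<j} r_{ij} <e_i, e_j>, O(n^2 k)
loop = sum(R[i, j] * (E[i] @ E[j]) for i in range(n) for j in range(i + 1, n))

# Vectorized form: still O(n^2 k), since the Gram matrix E @ E.T alone
# already costs n^2 k -- the per-pair r_{ij} blocks the FM factorization.
gram = E @ E.T
vec = (R * gram).sum()

assert np.isclose(loop, vec)
```

This matches Table 1's distinction: with all r_{ij} = 1 the weights factor out and the O(nk) FM identity applies, but with arbitrary r_{ij} every pair must be computed explicitly.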