Open kkkkkk123-ops opened 2 months ago
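Hi! The subtraction in the interpolation in VRWKV6 gives a formula equivalent to the one in VRWKV, up to swapping $\mu$ and $1 - \mu$, which a learned $\mu$ can absorb.

VRWKV6:
$xx = \mathrm{shift}(x) - x$
$xxx = x + \mu \, xx = x + \mu \, (\mathrm{shift}(x) - x) = (1 - \mu) \, x + \mu \, \mathrm{shift}(x)$

VRWKV:
$xx = \mathrm{shift}(x)$
$xxx = \mu \, x + (1 - \mu) \, xx = \mu \, x + (1 - \mu) \, \mathrm{shift}(x)$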
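The equivalence can be checked numerically with a minimal sketch. The `shift` below is only a hypothetical stand-in (a one-step shift with zero padding), not the repository's actual shift function, and `mix_v6` / `mix_vrwkv` are illustrative names that mirror the two formulas rather than the real modules. The scalar `mu` stands in for the learned mixing parameter.

```python
import math

def shift(x):
    # Hypothetical stand-in for the shift: move each element one step right, pad with 0.
    return [0.0] + x[:-1]

def mix_v6(x, mu):
    # VRWKV6 form: xx = shift(x) - x, then xxx = x + mu * xx
    xx = [s - xi for s, xi in zip(shift(x), x)]
    return [xi + mu * d for xi, d in zip(x, xx)]

def mix_vrwkv(x, mu):
    # VRWKV form: xx = shift(x), then xxx = mu * x + (1 - mu) * xx
    xx = shift(x)
    return [mu * xi + (1.0 - mu) * s for xi, s in zip(x, xx)]

x = [0.5, -1.25, 2.0, 3.75]
mu = 0.3

a = mix_v6(x, mu)
b = mix_vrwkv(x, 1.0 - mu)  # same interpolation once mu is replaced by 1 - mu

same = all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(a, b))
print(same)
```

With the same `mu` passed to both functions the outputs differ in general; they coincide only after the `mu` to `1 - mu` swap, so the `- x` changes the parameterization but not the family of interpolations.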
In `jit_func` of class `VRWKV_SpatialMix_V6`, why do we need to subtract `x` after calling `shift_func`? There seems to be no `- x` in `_inner_forward` of class `VRWKV_ChannelMix` when `xx` is calculated.
Class VRWKV_SpatialMix_V6
def jit_func(self, x, patch_resolution):
Mix x with the previous timestep to produce xk, xv, xr
Class VRWKV_ChannelMix