An efficient PyTorch implementation of selective scan in one file that works on both CPU and GPU, with the corresponding mathematical derivation. It is probably the code closest to selective_scan_cuda in mamba.
This project is amazing!
I changed SelectiveScanOflex to SelectiveScanEasy for CPU inference, but the result contains NaN.
I found that it is caused by tmp_dtBus_div_rAts = (dtBus / rAts): rAts contains zero values. How can I fix this?
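To make the failure concrete, here is a minimal sketch of the division and one possible guard. It assumes rAts underflows to exactly zero in some positions. The helper name safe_div, the eps value, and the sample tensors are hypothetical and not part of the project.

```python
import torch


def safe_div(num: torch.Tensor, den: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Divide num by den, pushing near-zero denominators away from zero.

    Hypothetical helper: denominators with |den| < eps are replaced by
    +/- eps (exact zeros are treated as positive), so the result stays finite.
    """
    sign = torch.where(den < 0, -torch.ones_like(den), torch.ones_like(den))
    den_safe = torch.where(den.abs() < eps, sign * eps, den)
    return num / den_safe


# Made-up values that reproduce the symptom: x / 0 gives inf, 0 / 0 gives NaN.
dtBus = torch.tensor([1.0, 2.0, 0.0])
rAts = torch.tensor([0.5, 0.0, 0.0])

naive = dtBus / rAts              # contains inf and NaN
guarded = safe_div(dtBus, rAts)   # finite everywhere

print(naive)
print(guarded)
```

A guard like this only hides the symptom: wherever rAts was zero, the quotient becomes very large instead of NaN, so the final output may still be inaccurate. If the zeros come from numerical underflow, reducing how far the values decay before the division may be a better fix, but that depends on details of the project's code that are not shown here.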