harrisonvanderbyl closed this 4 weeks ago
Thanks for your suggestion! I actually tried using your rwkv-onnx as the base. The reasons I eventually didn't pick it were:
DynamicQuant
ONNX op uint8 quant is handled by tools in the QNN SDK, while >2 GB models don't have problems when using torch==2.2.0
You can try https://github.com/RWKV/rwkv-onnx to create the ONNX graph from scratch, without PyTorch, which gives more fine-grained control over operators.