Uint8 and >2gb models

MollySophia / rwkv-qualcomm

Inference rwkv5 or rwkv6 with Qualcomm AI Engine Direct SDK

Closed: harrisonvanderbyl closed this issue 4 weeks ago

harrisonvanderbyl commented 7 months ago

You could try https://github.com/RWKV/rwkv-onnx, which creates the ONNX graph from scratch without PyTorch and gives more fine-grained control over operators.

MollySophia commented 7 months ago

Thanks for your suggestion! Actually, I did try using your rwkv-onnx as the base. The reason I eventually didn't pick it was:

uint8 quantization is handled by the tools in the QNN SDK, and models larger than 2 GB don't cause problems when using torch==2.2.0.
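
For context on the ">2gb" part, here is a rough sketch of the arithmetic. It assumes the concern is protobuf's limit of about 2 GiB on a single serialized message, which is why large ONNX models normally need their weights stored as external data; the thread itself does not spell this out. The parameter count and the helper names are hypothetical and are not taken from rwkv-qualcomm or rwkv-onnx.

```python
# Assumption: the ">2gb" issue refers to protobuf's ~2 GiB limit on a single
# serialized message, which an ONNX file with embedded weights can hit.
PROTOBUF_LIMIT_BYTES = 2**31 - 1


def weights_size_bytes(num_params: int, bytes_per_param: int) -> int:
    """Approximate size of the raw weights, ignoring graph metadata."""
    return num_params * bytes_per_param


def exceeds_protobuf_limit(num_params: int, bytes_per_param: int) -> bool:
    """True if the weights alone would not fit in one protobuf message."""
    return weights_size_bytes(num_params, bytes_per_param) > PROTOBUF_LIMIT_BYTES


if __name__ == "__main__":
    # Hypothetical parameter count, chosen only for illustration.
    num_params = 1_600_000_000
    for label, bytes_per_param in (("fp32", 4), ("fp16", 2), ("uint8", 1)):
        size_gib = weights_size_bytes(num_params, bytes_per_param) / 2**30
        over = exceeds_protobuf_limit(num_params, bytes_per_param)
        print(f"{label}: {size_gib:.2f} GiB, over limit: {over}")
```

Under these assumptions, a model of this size only fits in a single file once the weights are stored at one byte each, which is why the two topics in the issue title tend to come up together.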