Closed hanhanpp closed 2 days ago
Correct. The gradient update mechanism is a modification of the AdamW optimizer, and it supports alternating updates between the weights and quantization-related parameters such as zeros and scales. The update for scales will be released in an upcoming version.
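To make "alternating update" concrete, here is a rough pure-Python sketch of the general idea. It is not the DiodeMix code: the toy loss, the even/odd schedule, and the names `adamw_step`, `new_state`, and `train` are all assumptions made up for illustration. The scale is held fixed here because the scales update is not released yet.

```python
import math

def new_state():
    # Per-parameter AdamW state: step count, first and second moment estimates.
    return {"t": 0, "m": 0.0, "v": 0.0}

def adamw_step(p, g, state, lr=1e-2, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0):
    """One standard AdamW update for a single scalar parameter."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    p = p * (1 - lr * weight_decay)                 # decoupled weight decay
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)

def train(steps=4000, target=3.0, scale=0.5):
    # Toy model (assumption): dequantized value = scale * (weight - zero).
    params = {"weight": 1.0, "zero": 0.0}
    states = {name: new_state() for name in params}
    updated = []
    for step in range(steps):
        # Hypothetical schedule: weight on even steps, zero on odd steps.
        name = "weight" if step % 2 == 0 else "zero"
        err = scale * (params["weight"] - params["zero"]) - target
        grad_weight = 2 * err * scale               # d(err^2)/d(weight)
        grad = grad_weight if name == "weight" else -grad_weight
        params[name] = adamw_step(params[name], grad, states[name])
        updated.append(name)
    return params, updated

if __name__ == "__main__":
    params, updated = train()
    print(updated[:4], params)
```

Each parameter group keeps its own moment estimates, so skipping a group on a given step leaves its state untouched until its next turn.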
DiodeMix provides different update strategies for parameters of different bit widths. However, I find that only the update method for 1-bit parameters is specified; others, e.g. 8-bit parameters, are updated the same way as in the Adam optimizer. Is that right?