ROCm / AMDMIGraphX

AMD's graph optimization engine.
https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/
MIT License

Weight Stripping #3207

Open hgaspar opened 2 months ago

hgaspar commented 2 months ago

Enable creating engines (currently MXR files, eventually perhaps dynamic objects) without embedding the weights in the engine.

Use cases:
(1) Support compilation for various batch sizes without duplicating the weights.
(2) Support multiple execution configurations with different quantization options (including mixed precision) without having to embed the weights in every generated engine.
(3) Multi-GPU execution may also benefit, especially when creating multiple multi-GPU execution configurations (partitions, execution schedules).

Technical considerations: How should literals be treated? The MXR files may need to contain the steps required to recreate the literals from the weights file, which may require a new type (finalized literals vs. future literals, or meta-literals).
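To make the finalized vs. future literal distinction concrete, here is a minimal sketch. The names finalized_literal, future_literal, literal_slot and materialize are hypothetical and are not MIGraphX classes. The sketch also assumes float32 data and a flat weights blob addressed by byte offset. A finalized literal carries its values. A future literal carries only the recipe (shape plus offset) needed to recreate them when the weights are loaded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical types for illustration only; these are not MIGraphX classes.

// Finalized literal: the tensor values are stored inside the engine file.
struct finalized_literal {
    std::vector<std::size_t> dims;
    std::vector<float> data;
};

// Future (meta-)literal: only the recipe for recreating the values
// (shape plus byte offset into a separate weights blob) is stored.
struct future_literal {
    std::vector<std::size_t> dims;
    std::size_t offset;
};

using literal_slot = std::variant<finalized_literal, future_literal>;

std::size_t element_count(const std::vector<std::size_t>& dims) {
    std::size_t n = 1;
    for (std::size_t d : dims) n *= d;
    return n;
}

// Resolve either kind of slot to a finalized literal, reading float32 values
// from the weights blob when the slot only holds a reference.
finalized_literal materialize(const literal_slot& slot, const std::vector<char>& weights) {
    if (const auto* lit = std::get_if<finalized_literal>(&slot)) return *lit;
    const auto& ref = std::get<future_literal>(slot);
    const std::size_t n = element_count(ref.dims);
    const std::size_t bytes = n * sizeof(float);
    if (ref.offset > weights.size() || bytes > weights.size() - ref.offset)
        throw std::out_of_range("weights blob too small for literal");
    finalized_literal out{ref.dims, std::vector<float>(n)};
    if (bytes > 0) std::memcpy(out.data.data(), weights.data() + ref.offset, bytes);
    return out;
}
```

Using a variant keeps both kinds of literal in the same slot, so code that consumes literals only has to call materialize once the weights file is available.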

eddieliao commented 2 months ago

Looking to work on this as an extension of weight streaming. Do we already have a specific format for the weights file, or is that something that still needs to be decided?

eddieliao commented 1 month ago

List of items for basic proof of concept:

eddieliao commented 1 month ago

Replaced the existing literals with a fetch_literals dummy instruction that carries no data. This greatly reduces the .mxr size, although I still need to investigate why reading the model back fails.
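The replacement of literals with data-less placeholders can be pictured on a toy instruction list. This is a minimal sketch that assumes a hypothetical toy_instruction type and a hypothetical strip_literals helper (neither is MIGraphX's instruction class or pass API). Each literal's payload is moved into a separate weights table. The instruction is then renamed to a fetch_literal placeholder that keeps only an index into that table, which is why the serialized program shrinks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR for illustration; not MIGraphX's real instruction class.
struct toy_instruction {
    std::string op;             // e.g. "@literal", "add", or "fetch_literal"
    std::vector<float> data;    // payload, non-empty only for literals
    std::size_t weight_id = 0;  // index into the weights table (placeholders only)
};

// Move every literal payload into the returned weights table and replace the
// literal with a data-less "fetch_literal" placeholder that records its index.
std::vector<std::vector<float>> strip_literals(std::vector<toy_instruction>& program) {
    std::vector<std::vector<float>> weights;
    for (auto& ins : program) {
        if (ins.op != "@literal") continue;
        ins.weight_id = weights.size();
        weights.push_back(std::move(ins.data));
        ins.data.clear();  // a moved-from vector is unspecified; clear() makes it empty
        ins.op = "fetch_literal";
    }
    return weights;
}

// Bytes of tensor data still embedded in the program.
std::size_t embedded_bytes(const std::vector<toy_instruction>& program) {
    std::size_t total = 0;
    for (const auto& ins : program) total += ins.data.size() * sizeof(float);
    return total;
}
```

Keeping the index on the placeholder is what later allows the weights to be re-attached in the same order they were stripped.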

eddieliao commented 1 month ago

Added the ability to write and save weights in the strip_weights pass. Still need to figure out how to pass the output location to the pass (to remove the hard-coded location).
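One possible way to avoid the hard-coded location is to make the output path an option that the caller sets on the pass. Below is a minimal sketch of that idea. The strip_weights_options struct and the write_weights function are hypothetical. The on-disk layout is also an illustrative assumption, since the weights file format is still undecided: a u64 tensor count, then for each tensor a u64 element count followed by raw float32 values in native endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical pass options: the caller supplies the output location
// instead of the pass hard-coding it.
struct strip_weights_options {
    std::string output_path;
};

// Write tensors as: u64 tensor count, then per tensor a u64 element count
// followed by raw float32 values (native endianness; illustrative format only).
void write_weights(const strip_weights_options& opts,
                   const std::vector<std::vector<float>>& tensors) {
    if (opts.output_path.empty())
        throw std::invalid_argument("strip_weights: output_path not set");
    std::ofstream out(opts.output_path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + opts.output_path);
    const std::uint64_t count = tensors.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (const auto& t : tensors) {
        const std::uint64_t n = t.size();
        out.write(reinterpret_cast<const char*>(&n), sizeof(n));
        if (n > 0)
            out.write(reinterpret_cast<const char*>(t.data()),
                      static_cast<std::streamsize>(n * sizeof(float)));
    }
    if (!out) throw std::runtime_error("write failed: " + opts.output_path);
}
```

Failing loudly when output_path is empty makes a missing option obvious, rather than silently writing to some default location.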

simberg-amd commented 2 weeks ago

Fixed an issue with writing weights, and added a test that reads the weights from file and adds them back to the MXR file.
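Reading the weights back is the mirror image of writing them. Here is a minimal sketch of a reader, assuming the same kind of hypothetical layout used for illustration: a u64 tensor count, then for each tensor a u64 element count followed by raw float32 values in native endianness. The read_weights name is hypothetical. Reading from a std::istream rather than a file path lets a test feed it an in-memory buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <utility>
#include <vector>

// Read tensors in a hypothetical layout: u64 tensor count, then per tensor a
// u64 element count followed by raw float32 values (native endianness).
std::vector<std::vector<float>> read_weights(std::istream& in) {
    auto read_u64 = [&in]() {
        std::uint64_t v = 0;
        if (!in.read(reinterpret_cast<char*>(&v), sizeof(v)))
            throw std::runtime_error("truncated weights stream");
        return v;
    };
    const std::uint64_t count = read_u64();
    std::vector<std::vector<float>> tensors;
    for (std::uint64_t i = 0; i < count; ++i) {
        std::vector<float> t(read_u64());  // element count of this tensor
        if (!t.empty() &&
            !in.read(reinterpret_cast<char*>(t.data()),
                     static_cast<std::streamsize>(t.size() * sizeof(float))))
            throw std::runtime_error("truncated tensor data");
        tensors.push_back(std::move(t));
    }
    return tensors;
}
```

A production reader would also validate the counts before allocating, so that a corrupt file cannot trigger an oversized allocation. This sketch omits that check.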

simberg-amd commented 2 weeks ago

Finished the items above; next, I'm going to look into different quantization options.