deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0
198 stars, 65 forks

NeuronX compiler: specify data type #2378

Open CoolFish88 opened 1 month ago

CoolFish88 commented 1 month ago

Description

Currently, the configuration options for the Transformers-NeuronX engine in LMI do not include a way to specify the data type used for compilation. It would be useful to add this parameter to the set.

Will this change the current API? How?

Yes. A new parameter needs to be added and propagated to the Neuron compiler.

Who will benefit from this enhancement?

Everyone.

References

tosterberg commented 1 month ago

Thanks @CoolFish88. Dtype is already available as a parameter for Neuron model compilation and runtime: set option.dtype=bf16 in serving.properties, or set the environment variable OPTION_DTYPE=bf16. The documentation does not make this clear, because it skips over the common options and only outlines the advanced ones. I will update the documentation accordingly. In the meantime, you can see the available options here: https://github.com/deepjavalibrary/djl-serving/blob/master/engines/python/setup/djl_python/properties_manager/tnx_properties.py#L32-L35. This list will expand as the Neuron frameworks add support.
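The two equivalent ways of setting the dtype mentioned above can be illustrated with a minimal Python sketch. Only `option.dtype=bf16` and `OPTION_DTYPE=bf16` come from the comment. The helper names (`env_to_option_key`, `render_serving_properties`, `options_from_env`) are hypothetical and not part of `djl_python`. The `OPTION_<NAME>` to `option.<name>` mapping is an assumption about DJL Serving's naming convention (some keys may use different casing), and the `option.model_id` value is a placeholder.

```python
from typing import Dict, Mapping

OPTION_PREFIX = "OPTION_"


def env_to_option_key(env_name: str) -> str:
    """Map an OPTION_* environment variable name to a serving.properties key.

    Assumed convention: OPTION_<NAME> -> option.<name lowercased>,
    e.g. OPTION_DTYPE -> option.dtype.
    """
    if not env_name.startswith(OPTION_PREFIX):
        raise ValueError(f"not an OPTION_ variable: {env_name}")
    return "option." + env_name[len(OPTION_PREFIX):].lower()


def options_from_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Collect OPTION_* variables from an environment mapping as option.* keys."""
    return {
        env_to_option_key(name): value
        for name, value in env.items()
        if name.startswith(OPTION_PREFIX)
    }


def render_serving_properties(options: Mapping[str, str]) -> str:
    """Render key=value pairs in serving.properties format, one per line."""
    return "".join(f"{key}={value}\n" for key, value in options.items())


if __name__ == "__main__":
    # serving.properties form; option.model_id is a hypothetical placeholder.
    props = {
        "option.model_id": "s3://my-bucket/my-model/",
        "option.dtype": "bf16",
    }
    print(render_serving_properties(props), end="")

    # Environment-variable form of the same dtype setting; non-OPTION_ vars are ignored.
    print(options_from_env({"OPTION_DTYPE": "bf16", "HOME": "/root"}))
```

Either form should result in the same `option.dtype` value reaching the Transformers-NeuronX properties; consult the linked `tnx_properties.py` for the dtype values it actually accepts.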