Ki6an / fastT5

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
Apache License 2.0

Upgrade ONNX runtime #63

Open dandiep opened 1 year ago

dandiep commented 1 year ago

Is it possible to upgrade the ONNX Runtime dependency to the latest version? The old version has some bugs (e.g. it doesn't work on AWS Lambda arm64).

ka00ri commented 1 year ago

I ran into the same issue while installing on an M1 device. To install, I upgraded the version in setup.py, and to run it, I had to remove the activation_type parameter from the quantize_dynamic() call:

        quantize_dynamic(
            model_input=model_name,
            model_output=output_model_name,
            per_channel=True,
            reduce_range=True, # should be the same as per_channel
            # activation_type=QuantType.QUInt8,
            weight_type=QuantType.QInt8,  # per docs, signed is faster on most CPUs
            optimize_model=False,
        )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
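
If the same calling code has to run against both older and newer ONNX Runtime releases, one option is to drop any keyword argument that the installed function does not accept instead of commenting it out by hand. The snippet below is only a sketch of that idea: `supported_kwargs` and `fake_quantize_dynamic` are hypothetical names made up for illustration, and the stand-in's signature is an assumption, not the real `quantize_dynamic` signature of any particular release.

```python
import inspect


def supported_kwargs(func, kwargs):
    """Return only the entries of `kwargs` that `func`'s signature accepts."""
    params = inspect.signature(func).parameters
    # A function that declares **kwargs accepts any keyword, so keep everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


# Hypothetical stand-in for a quantize function that has no `activation_type`
# parameter. It is NOT the real onnxruntime API; it just echoes its arguments.
def fake_quantize_dynamic(model_input, model_output, per_channel=False,
                          reduce_range=False, weight_type=None,
                          optimize_model=True):
    return {
        "model_input": model_input,
        "model_output": model_output,
        "per_channel": per_channel,
        "reduce_range": reduce_range,
        "weight_type": weight_type,
        "optimize_model": optimize_model,
    }


# The arguments from the snippet above, including the one that had to go.
# Plain strings stand in for the QuantType enum values.
wanted = {
    "model_input": "model.onnx",
    "model_output": "model-quantized.onnx",
    "per_channel": True,
    "reduce_range": True,
    "activation_type": "QUInt8",
    "weight_type": "QInt8",
    "optimize_model": False,
}

call_kwargs = supported_kwargs(fake_quantize_dynamic, wanted)
result = fake_quantize_dynamic(**call_kwargs)
```

With the real library, the same helper would be applied to the imported `quantize_dynamic` in place of the stand-in. Note that silently dropping an argument changes behavior without warning, so logging the dropped names may be worth adding.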