NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

How to decrease inference time of LiLT? #284

Open piegu opened 1 year ago

piegu commented 1 year ago

Hi,

I'm using Hugging Face libraries in order to run LiLT. How can I decrease inference time? Which code should I use?

I've already tried BetterTransformer (Optimum) and ONNX, but neither of them accepts the LiLT model.

Thank you.

Note: I asked this question here, too: https://github.com/jpWang/LiLT/issues/42
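Since BetterTransformer and ONNX export aren't available for LiLT, one model-agnostic CPU speedup worth trying is dynamic int8 quantization, which works on any `torch.nn.Module` without architecture-specific support. A minimal sketch on a stand-in module (for LiLT you would pass the loaded model instead; the stand-in shapes are arbitrary):

```python
import torch
import torch.nn as nn

# Stand-in for the real model; replace with the loaded LiltModel.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).eval()

# Replace nn.Linear layers with dynamically quantized int8 versions.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Run inference without autograd overhead.
with torch.inference_mode():
    out = quantized(torch.randn(1, 64))
print(out.shape)  # torch.Size([1, 8])
```

Dynamic quantization mainly helps on CPU; expect a modest speedup and smaller memory footprint, possibly with a small accuracy drop that should be validated on your data.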

piegu commented 1 year ago

Issue opened in the Optimum library: https://github.com/huggingface/optimum/issues/1024

bkocis commented 1 year ago

Have you considered making a smaller model? What is your model size?
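A quick way to answer the model-size question is to count parameters directly. A sketch on a stand-in module (for the real model you would load it with `from_pretrained` and count the same way; the layer shapes below just mimic an XLM-RoBERTa-sized embedding table):

```python
import torch.nn as nn

# Stand-in module; for LiLT, load the model and count the same way.
model = nn.Sequential(nn.Embedding(250_002, 768), nn.Linear(768, 768))

n_params = sum(p.numel() for p in model.parameters())
size_mb = n_params * 4 / 1024**2  # fp32: 4 bytes per parameter
print(f"{n_params / 1e6:.1f}M parameters, ~{size_mb:.0f} MB in fp32")
```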

NielsRogge commented 1 year ago

One thing you can try, especially if you're using a multilingual model like https://huggingface.co/nielsr/lilt-xlm-roberta-base, is removing the token embeddings of tokens from languages you don't need.

See this blog post for more info: https://medium.com/@coding-otter/reduce-your-transformers-model-size-by-removing-unwanted-tokens-and-word-embeddings-eec08166d2f9
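The pruning idea above can be sketched on a plain embedding matrix: keep only the rows for token ids that actually occur in your target-language corpus and remap ids accordingly. The attribute path into the real model and the kept ids below are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Stand-in for the model's word-embedding table; in the real model this
# would be something like model.lilt.embeddings.word_embeddings
# (attribute path is an assumption).
vocab_size, hidden = 1000, 16
emb = nn.Embedding(vocab_size, hidden)

# Hypothetical set of token ids observed in the target-language corpus.
kept_ids = torch.tensor(sorted({0, 1, 2, 5, 42, 999}))

# Build the smaller embedding by copying only the kept rows.
new_emb = nn.Embedding(len(kept_ids), hidden)
new_emb.weight.data = emb.weight.data[kept_ids].clone()

# Mapping from old token ids to new ids, needed to remap the tokenizer.
old_to_new = {old_id.item(): new_id for new_id, old_id in enumerate(kept_ids)}
```

The tokenizer's vocabulary has to be shrunk with the same mapping, otherwise it will emit ids the pruned embedding no longer has; the blog post linked above walks through that step.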