aws / sagemaker-huggingface-inference-toolkit

Apache License 2.0

[Feature Request] Support Japanese language #18

Open AtsunoriFujita opened 3 years ago

AtsunoriFujita commented 3 years ago

In some cases, Japanese tokenizers require dedicated libraries (e.g. fugashi, ipadic). Currently, these libraries are not included in the inference container. Would it be possible to include them, or to provide an option for which transformers extras get installed?
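To confirm whether a given environment has these libraries, something like the following could be run inside it. This is only a minimal sketch: the helper name missing_packages is made up for illustration, and it checks top-level module names only.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when no importable module with that name exists.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # fugashi and ipadic are the two libraries named in this issue.
    print(missing_packages(["fugashi", "ipadic"]))
```

An empty list means both modules are importable; any name printed is missing from the environment.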

For example, if the Dockerfile could be changed as follows, that would handle it: transformers[sentencepiece] -> transformers[ja]

Currently, if we deploy from S3, we can work around it with requirements.txt and an empty inference.py, but if we deploy from HF Hub, we don't have a workaround.
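For reference, the S3 workaround can be sketched roughly as below. It assumes the toolkit's convention of a code/ directory inside model.tar.gz holding requirements.txt and inference.py. The function name build_model_archive and the unpinned package list are illustrative only, not part of the toolkit.

```python
import tarfile
from pathlib import Path

# Libraries named in this issue; no versions are given there, so none are pinned.
JA_REQUIREMENTS = ["fugashi", "ipadic"]


def build_model_archive(model_dir: str, output_path: str) -> str:
    """Bundle model files plus a code/ directory into a gzipped tar archive."""
    model_path = Path(model_dir)
    code_dir = model_path / "code"
    code_dir.mkdir(parents=True, exist_ok=True)

    # Extra dependencies to be installed when the container loads the model.
    (code_dir / "requirements.txt").write_text("\n".join(JA_REQUIREMENTS) + "\n")
    # Create inference.py if it is missing; left empty so no handler is overridden.
    (code_dir / "inference.py").touch()

    out = Path(output_path).resolve()
    with tarfile.open(out, "w:gz") as tar:
        for item in sorted(model_path.iterdir()):
            if item.resolve() == out:
                continue  # do not pack the archive into itself
            # Store entries at the archive root rather than under model_dir/.
            tar.add(item, arcname=item.name)
    return str(out)
```

The resulting archive would then be uploaded to S3 and referenced when creating the model. None of this helps when deploying directly from the HF Hub, since there is no archive to add code/ to.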

Thanks!

philschmid commented 3 years ago

@AtsunoriFujita thank you for the feature request. We are going to look into it.