AceCentre / AACSpeakHelper

Copies the pasteboard, translates it to a defined language, reads it aloud, and replaces the pasteboard with the translated text.
https://docs.acecentre.org.uk/products/v/aac-speak-helper-tool/
MIT License

Investigate offline translation #8

Closed: willwade closed this 1 year ago

willwade commented 1 year ago

See MarianMT

Install PyTorch, then pip install sentencepiece and transformers (the Hugging Face library that provides MarianMT). Then you can run:

from transformers import MarianMTModel, MarianTokenizer
from typing import Sequence

class Translator:
    def __init__(self, source_lang: str, dest_lang: str) -> None:
        # Helsinki-NLP publishes one MarianMT model per language pair.
        self.model_name = f'Helsinki-NLP/opus-mt-{source_lang}-{dest_lang}'
        # The first call downloads the weights; later calls reuse the local cache.
        self.model = MarianMTModel.from_pretrained(self.model_name)
        self.tokenizer = MarianTokenizer.from_pretrained(self.model_name)

    def translate(self, texts: Sequence[str]) -> Sequence[str]:
        # Tokenize the whole batch, padding to the longest sentence.
        tokens = self.tokenizer(list(texts), return_tensors="pt", padding=True)
        # Generate the translated token ids, then decode them back to text.
        translate_tokens = self.model.generate(**tokens)
        return [self.tokenizer.decode(t, skip_special_tokens=True) for t in translate_tokens]

marian_ru_en = Translator('ru', 'en')
marian_ru_en.translate(['что слишком сознавать — это болезнь, настоящая, полная болезнь.'])
# Returns: ['That being too conscious is a disease, a real, complete disease.']
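
For the offline part specifically, one possible approach (a rough sketch, not tested against this project) is to download each language pair once while online, save it to a local folder, and later load it with `local_files_only=True` so transformers does not try to reach the network. The `models` folder and the helper names below are assumptions made up for illustration; only `from_pretrained`, `save_pretrained` and `local_files_only` come from transformers.

```python
from pathlib import Path

# Hypothetical folder for saved models; any writable location would do.
MODEL_ROOT = Path("models")


def model_name(source_lang: str, dest_lang: str) -> str:
    """Hub identifier, same naming scheme as the Translator class above."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{dest_lang}"


def local_model_dir(source_lang: str, dest_lang: str, root: Path = MODEL_ROOT) -> Path:
    """Folder where one language pair is stored on disk (hypothetical layout)."""
    return root / f"opus-mt-{source_lang}-{dest_lang}"


def download_once(source_lang: str, dest_lang: str, root: Path = MODEL_ROOT) -> Path:
    """Run this while online: fetch the model and tokenizer, then save both locally."""
    # Imported lazily so the path helpers work even without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    name = model_name(source_lang, dest_lang)
    target = local_model_dir(source_lang, dest_lang, root)
    target.mkdir(parents=True, exist_ok=True)
    MarianMTModel.from_pretrained(name).save_pretrained(target)
    MarianTokenizer.from_pretrained(name).save_pretrained(target)
    return target


def load_offline(source_lang: str, dest_lang: str, root: Path = MODEL_ROOT):
    """Load a previously saved pair from disk without contacting the hub."""
    target = local_model_dir(source_lang, dest_lang, root)
    if not target.is_dir():
        raise FileNotFoundError(f"No saved model at {target}; run download_once first")

    from transformers import MarianMTModel, MarianTokenizer

    # local_files_only=True makes transformers fail instead of downloading.
    model = MarianMTModel.from_pretrained(target, local_files_only=True)
    tokenizer = MarianTokenizer.from_pretrained(target, local_files_only=True)
    return model, tokenizer
```

Usage would be `download_once('ru', 'en')` once with a connection, then `model, tokenizer = load_offline('ru', 'en')` afterwards. Each language pair is a separate download, so the pairs would need to be chosen up front.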

NB:

willwade commented 1 year ago

This looks promising too: https://github.com/facebookresearch/seamless_communication and it has an on-device model of around 230 MB: https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/on_device_README.md

willwade commented 1 year ago

Closing this for now; too tricky.