CopaceticMeatbag opened this issue (since closed):

I'm trying to use the facebook m2m100 1.2B model, but Python can't download it before the connection times out, which crashes the download. Is it possible to pass a file path to a model I've already downloaded? It looks like this is possible in dl-translate, but I'm not sure how I'd access that `model_or_path` feature via the subsai options.
Hi @CopaceticMeatbag,
I think you can just pass the model path instead of the model name in the `translate` method.
Have you tried this? Did it not work?
Thanks @abdeladim-s - I get an error: `Unable to infer the model_family from "C:/Users/Administrator/m2m100_1.2B/". Try explicitly setting the value of model_family to "mbart50" or "m2m100"`
The code I'm using is below. I tried several variations on the path string (backslashes vs. forward slashes, raw strings, no trailing slash, etc.):
```python
import pysubs2
from subsai import Tools

# Raw string so the backslash in the Windows path is not treated as an escape
subtitles_file = r'V:\TestAutoSubs.srt'
subs = pysubs2.load(subtitles_file)

translation_model = 'C:/Users/Administrator/m2m100_1.2B/'
source_language = 'English'
target_language = 'Spanish'
fmt = 'srt'  # renamed from `format` to avoid shadowing the builtin

translated_file = f"{subtitles_file}-{source_language}-{target_language}.{fmt}"
translated_subs = Tools.translate(subs, source_language=source_language,
                                  target_language=target_language, model=translation_model)
translated_subs.save(translated_file)
print(f"translated file saved to {translated_file}")
```
I think you will need to pass a PyTorch model, i.e. something that ends with `.pth`! Where did you download the model? From the code above, the `translation_model` var seems to be a folder!
Ah, I thought we had to point to the whole directory! I downloaded from here: https://huggingface.co/facebook/m2m100_1.2B/tree/main
The script tries to download the `pytorch_model.bin` file (which is where it crashes). I also tried making `translation_model` point to `C:/Users/Administrator/m2m100_1.2B/pytorch_model.bin`, with the same results.
From what I can tell looking at dl-translate, it accepts a path to a directory containing the model files, but I think the integration into subsai bypasses this and uses the `translate` function directly. I'll have a bit more of a crack at figuring this out today!
edit: I'm looking at this snippet from the dl-translate readme:

> By default, `dlt.TranslationModel` will download the model from the huggingface repo for mbart50 or m2m100 and cache it. It's possible to load the model from a path or a model with a similar format, but you will need to specify the `model_family`:

```python
mt = dlt.TranslationModel("/path/to/model/directory/", model_family="mbart50")
mt = dlt.TranslationModel("facebook/m2m100_1.2B", model_family="m2m100")
```
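Adapting that snippet to the local folder above would presumably be (untested):

```python
import dl_translate as dlt

# A local directory needs an explicit model_family, per the readme quote above.
mt = dlt.TranslationModel('C:/Users/Administrator/m2m100_1.2B/', model_family='m2m100')
```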
Yeah, basically it should work with the `pytorch_model.bin` file.
I have checked the dl-translate source code as well, and from what I can see you just need the model name or the path to a pretrained model. In subsai I just take the `model` var and pass it directly to that `TranslationModel` class, so it should be the same!
There is also a `create_translation_model` which returns an instance of that class; you can use it in the `translate` method as well.
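A sketch of that route, assuming `Tools.translate` accepts the ready-made instance in place of a name string (as the comment above suggests):

```python
import dl_translate as dlt
from subsai import Tools

# Build the model once from the local directory, stating the family explicitly,
# then hand the instance to translate instead of a model name.
mt = dlt.TranslationModel('C:/Users/Administrator/m2m100_1.2B/', model_family='m2m100')
translated_subs = Tools.translate(subs, source_language='English',
                                  target_language='Spanish', model=mt)
```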
Oh, so the `model_family` is not inferred automatically? Please try the dl-translate snippet and let me know if it works.
@CopaceticMeatbag I have added the `model_family` to the `translate` method; reinstall from the latest commit and give it a try. Hope it will work now!
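For reference, the fixed call would presumably look something like this; the exact keyword name is an assumption, so check the `Tools.translate` signature after reinstalling:

```python
# Hypothetical keyword based on the comment above; verify against the
# actual Tools.translate signature in the latest commit.
translated_subs = Tools.translate(
    subs,
    source_language='English',
    target_language='Spanish',
    model='C:/Users/Administrator/m2m100_1.2B/',
    model_family='m2m100',
)
```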
You've only gone and bloody done it!! Nice work, all works perfectly :) thank you very much!