facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Specify model path for offline loading #50

Closed: jgwinnup closed this issue 10 months ago

jgwinnup commented 10 months ago

Is it possible to specify a path for the model checkpoint to facilitate offline loading?

cndn commented 10 months ago

Hey @jgwinnup - good call. For now, you can override the checkpoint path in the asset cards (e.g. https://github.com/facebookresearch/seamless_communication/blob/main/src/seamless_communication/assets/cards/seamlessM4T_large.yaml#L10) by using a "file://" prefix to indicate a local path.

Also, downloaded files are cached, so you should only need to download the checkpoint once anyway.
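A minimal sketch of that override, assuming you edit the card file (seamlessM4T_large.yaml in the link above) as plain text rather than through fairseq2. The helper names override_checkpoint and patch_card_file are hypothetical; the pattern only relies on a line beginning with checkpoint:, the field named in the error message quoted below. The URI is written exactly as given, so the choice of file:// spelling is left to the caller.

```python
import re
from pathlib import Path

# Matches an (optionally indented) 'checkpoint:' line. Group 1 keeps the
# indentation and key; the old value after it is discarded.
_CHECKPOINT_LINE = re.compile(r"^([ \t]*checkpoint:)[ \t]*.*$", re.MULTILINE)


def override_checkpoint(card_text: str, new_uri: str) -> str:
    """Return card_text with every checkpoint value replaced by new_uri."""
    updated, count = _CHECKPOINT_LINE.subn(
        lambda m: f"{m.group(1)} '{new_uri}'", card_text
    )
    if count == 0:
        raise ValueError("asset card has no 'checkpoint:' field")
    return updated


def patch_card_file(card_path: Path, new_uri: str) -> None:
    """Rewrite the checkpoint field of an asset card file in place."""
    card_path.write_text(override_checkpoint(card_path.read_text(), new_uri))
```

For example, patch_card_file(Path("seamlessM4T_large.yaml"), "file://foo/bar/model.pth") rewrites only the checkpoint line and leaves the rest of the card untouched.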

jgwinnup commented 10 months ago

Thanks for the pointer - when I specify a file:// URI I get the following error:

fairseq2.assets.card.AssetCardError: The value of the field 'checkpoint' of the asset card 'seamlessM4T_large' must be a valid URI, but is 'file:///foo/bar/seamless-m4t-large/multitask_unity_large.pt' instead.

I can work around this by starting Python's http.server and serving the file locally, but it would be nice to have proper file:// URI support.
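The http.server workaround can also run inside the same Python process. This is a sketch, assuming the checkpoint directory from the error message above; serve_directory and the default port are illustrative choices. The command-line equivalent is python -m http.server 8000 --bind 127.0.0.1 --directory /foo/bar/seamless-m4t-large.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve files from `directory` on 127.0.0.1 in a background thread.

    Pass port=0 to let the OS pick a free port; read it back from
    server.server_address[1].
    """
    # Bind the handler to the checkpoint directory instead of the CWD.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread so the process can exit even if shutdown() is never called.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

While the server runs, the card's checkpoint field would point at http://127.0.0.1:8000/multitask_unity_large.pt; call server.shutdown() and server.server_close() once the model has loaded.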

cndn commented 10 months ago

Hey @jgwinnup, I double-checked the logic here and it should work: https://github.com/facebookresearch/fairseq2/blob/main/src/fairseq2/assets/download_manager.py#L132. Could you print out pathname at that line and check whether it's what you expect?
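To make the requested check concrete, here is a hypothetical resolver that treats file:// URIs as local paths and prints the decoded pathname. It is not fairseq2's download_manager.py code; the function name and the choice to ignore the host (netloc) part are assumptions for illustration.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse
from urllib.request import url2pathname


def resolve_local_checkpoint(uri: str) -> Optional[Path]:
    """Return a local Path for a file:// URI, or None for other schemes.

    Hypothetical helper for debugging; not fairseq2's implementation.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return None  # a real loader would download http(s) URIs instead
    # Decode percent-escapes (e.g. %20) into an OS path. The netloc
    # (host) component is deliberately ignored in this sketch.
    pathname = url2pathname(parsed.path)
    print(f"pathname = {pathname!r}")
    return Path(pathname)
```

Calling it with the URI from the error message above prints the path it would open, which is a quick way to spot a stray or missing slash.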

efwfe commented 10 months ago

The cache path is ~/.cache. After the translator has downloaded all the files, copy them to the same location on the offline machine and it can load them from there. Hope this helps.
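The copy step can be scripted with the standard library. This is a minimal sketch, assuming the online and offline machines share a transfer location such as a mounted drive; copy_cache, DEFAULT_CACHE, and the /mnt/transfer path in the usage note are hypothetical.

```python
import shutil
from pathlib import Path

# Assumption: downloads land somewhere under ~/.cache, as described above.
# The exact subdirectory is not named, so narrow this if you know it.
DEFAULT_CACHE = Path.home() / ".cache"


def copy_cache(src: Path, dst: Path) -> Path:
    """Copy the cache tree at src into dst, merging with existing contents.

    Files that already exist at the same relative path are overwritten.
    """
    # dirs_exist_ok (Python 3.8+) allows merging into an existing directory.
    return Path(shutil.copytree(src, dst, dirs_exist_ok=True))
```

On the connected machine, run copy_cache(DEFAULT_CACHE, Path("/mnt/transfer/cache")); on the offline machine, run copy_cache(Path("/mnt/transfer/cache"), DEFAULT_CACHE). Copying all of ~/.cache can be large, since other tools store data there too.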

jgwinnup commented 10 months ago

Hi - I filed a similar issue on fairseq2 (facebookresearch/fairseq2#6). It looks like the problem was that I was specifying an absolute path with a leading slash (e.g. checkpoint: 'file:///foo/bar/model.pth'). I was asked to use just two slashes instead (checkpoint: 'file://foo/bar/model.pth'), and that worked.
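Python's standard URL parser shows why the slash count matters: with two slashes, the first path segment is read as the host (netloc) rather than as part of the path. This only illustrates urllib.parse behavior; whether a given loader accepts either form depends on how it reassembles the netloc and path, which is not shown here.

```python
from urllib.parse import urlparse


def split_file_uri(uri: str) -> tuple[str, str]:
    """Return (netloc, path) as Python's standard URL parser sees them."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path


for uri in ("file:///foo/bar/model.pth", "file://foo/bar/model.pth"):
    netloc, path = split_file_uri(uri)
    print(f"{uri!r}: netloc={netloc!r}, path={path!r}")
# 'file:///foo/bar/model.pth': netloc='', path='/foo/bar/model.pth'
# 'file://foo/bar/model.pth': netloc='foo', path='/bar/model.pth'
```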