cisnlp / simalign

Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
MIT License
345 stars, 47 forks

Any workarounds for pytorch being too large? #32

Closed: creolio closed this issue 1 year ago

creolio commented 2 years ago

I've integrated simalign into my Heroku app, but as soon as I did, I got the "Compiled slug size: 1G is too large (max is 500M)." error.

Reviewing the files, I can see that PyTorch, which simalign depends on, alone accounts for 600MB of that. I also tried DigitalOcean, but I seem to be running into similar issues.

Any ideas for working around this and reducing the memory that simalign needs because of PyTorch?

pdufter commented 2 years ago

Hi @creolio, SimAlign requires the weights of pretrained models. From the description, it seems that the majority of the 600MB might be the downloaded models? If so, an alternative would be to use smaller multilingual models (e.g., distilled versions).
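
One way to check whether the 600MB comes from the installed PyTorch package or from downloaded model weights is to measure both directories separately. The snippet below is a minimal sketch using only the standard library; `dir_size_bytes` and `format_mb` are hypothetical helper names introduced here, not part of simalign or PyTorch.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Return the total size in bytes of all regular files under root.

    A path that does not exist yields 0, because os.walk produces nothing.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that a linked file is not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def format_mb(num_bytes: int) -> str:
    """Format a byte count as megabytes with one decimal place."""
    return f"{num_bytes / (1024 * 1024):.1f} MB"
```

For example, `format_mb(dir_size_bytes(os.path.dirname(torch.__file__)))` would report the size of the installed package after `import torch`. The model cache could be measured the same way by pointing the helper at the cache directory, which is assumed here to be something like `os.path.expanduser("~/.cache/huggingface")`; the actual location depends on the installed library version and environment settings. If the cache turns out to be the larger part, a smaller model is the more promising fix.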