facebookresearch / InferSent

InferSent sentence embeddings

Parallel Model on 40 million sentences #59

Closed by limiao2 6 years ago

limiao2 commented 6 years ago

I have a large corpus of roughly 40 million sentences. Is there any way to run this model in parallel? Would it be better to divide the corpus into chunks and then concatenate all the resulting matrices? Thanks!

aconneau commented 6 years ago

Hi, if you have multiple GPUs, I think the solution you propose is indeed the best one. Just split your corpus of 40M sentences into smaller chunks and run InferSent on each chunk separately. Sorting the data by sentence length, so that each batch needs as little padding as possible, and using large batches will also help. Best, Alexis
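
For readers who want something concrete, here is a minimal sketch of the chunk, sort, and concatenate approach described above. It is not taken from the InferSent code base: `encode_fn`, `iter_chunks`, `encode_sorted`, and `encode_corpus` are hypothetical names, and `encode_fn` stands in for whatever call returns one vector per sentence. Sentence length is approximated by a whitespace token count, which is also an assumption.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]
# Hypothetical encoder interface: takes a batch of sentences,
# returns one embedding vector per sentence, in the same order.
EncodeFn = Callable[[List[str]], List[Vector]]


def iter_chunks(sentences: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most chunk_size sentences."""
    for start in range(0, len(sentences), chunk_size):
        yield list(sentences[start:start + chunk_size])


def encode_sorted(sentences: List[str], encode_fn: EncodeFn,
                  batch_size: int) -> List[Vector]:
    """Encode in length-sorted batches, then restore the input order."""
    # Sort indices by token count so each batch groups sentences of
    # similar length, which keeps padding inside a batch small.
    order = sorted(range(len(sentences)),
                   key=lambda i: len(sentences[i].split()))
    out: List[Vector] = [[] for _ in sentences]
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        vectors = encode_fn([sentences[i] for i in idx])
        # Write each vector back to the position of its source sentence.
        for i, vec in zip(idx, vectors):
            out[i] = vec
    return out


def encode_corpus(sentences: Sequence[str], encode_fn: EncodeFn,
                  chunk_size: int, batch_size: int) -> List[Vector]:
    """Encode chunk by chunk and concatenate the results in corpus order."""
    embeddings: List[Vector] = []
    for chunk in iter_chunks(sentences, chunk_size):
        embeddings.extend(encode_sorted(chunk, encode_fn, batch_size))
    return embeddings
```

The loop in `encode_corpus` runs the chunks one after another only to keep the sketch short. With several GPUs, each chunk would presumably go to its own process or job, with the per-chunk results written to disk and concatenated afterwards in chunk order.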