facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

How to extract_features with XLMR using multiple GPUs #2020

Closed YeWenting closed 4 years ago

YeWenting commented 4 years ago

What is your question?

How can I run extract_features with XLMR using multiple GPUs? If I use the following code with nn.DataParallel(), only a single GPU is utilized.

Code

import torch
import torch.nn as nn
from tqdm import tqdm

xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
xlmr.eval()
xlmr = xlmr.cuda()
xlmr = nn.DataParallel(xlmr)

with open("XLMR_embedding.txt", "w") as out_f:
    for data in tqdm(loader):  # loader: the user's own DataLoader
        for key in data:
            data[key] = data[key].cuda()
        last_layer_embedding = xlmr.module.extract_features(data["encoding"])

lematt1991 commented 4 years ago

CC @ngoyal2707 , is this supported?

myleott commented 4 years ago

This is not really supported. The easiest way would be to use Python multiprocessing and load a separate model instance on each GPU.
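A minimal sketch of that multiprocessing approach, assuming the input is a list of raw sentences; the names `chunk`, `worker`, and `main` are hypothetical helpers, not part of fairseq:

```python
import multiprocessing as mp

def chunk(items, n):
    # Deal items round-robin into n shards, one shard per GPU worker.
    return [items[i::n] for i in range(n)]

def worker(rank, sentences, out_queue):
    # Each worker loads its own model copy onto GPU `rank`.
    import torch  # imported in the child so the parent stays CUDA-free
    xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
    xlmr.eval()
    xlmr.cuda(rank)
    feats = []
    with torch.no_grad():
        for sent in sentences:
            tokens = xlmr.encode(sent)
            feats.append(xlmr.extract_features(tokens).cpu())
    out_queue.put((rank, feats))

def main(sentences, num_gpus):
    ctx = mp.get_context('spawn')  # CUDA requires the spawn start method
    q = ctx.Queue()
    shards = chunk(sentences, num_gpus)
    procs = [ctx.Process(target=worker, args=(r, shards[r], q))
             for r in range(num_gpus)]
    for p in procs:
        p.start()
    results = dict(q.get() for _ in procs)  # {rank: list of feature tensors}
    for p in procs:
        p.join()
    return results
```

Each process owns one GPU and one model instance, so there is no cross-device scatter/gather to coordinate; results come back through a queue keyed by rank.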

The reason nn.DataParallel won’t work is that (1) you’d need to feed a batch of inputs, and (2) nn.DataParallel only works if you call forward, but here you’re calling extract_features. You can see how it’s implemented here: https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/data_parallel.py#L141
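To illustrate point (2): nn.DataParallel only intercepts forward, so one could wrap the model in a module whose forward delegates to extract_features. `FeatureExtractor` is a hypothetical name, and this sketch does not address point (1), batching:

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Hypothetical wrapper: routes nn.DataParallel's forward call
    to the wrapped model's extract_features method."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, tokens):
        return self.model.extract_features(tokens)

# Usage sketch (assumes `xlmr` is already loaded and on GPU):
# parallel = nn.DataParallel(FeatureExtractor(xlmr))
# features = parallel(batch_of_tokens)  # forward is now what gets scattered
```

DataParallel scatters the input batch along dim 0 to each replica's forward, which is why calling extract_features directly on the wrapped module bypasses the parallelism entirely.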