flowersteam / lamorel

Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).

A syntax error in __call_model #19

Open · Clayfigure opened this issue 1 year ago

Clayfigure commented 1 year ago

In the __call_model function in lamorel/caller.py, you set object_gather_list=None. However, this is not allowed by torch.distributed. (screenshot of the error attached)

ClementRomac commented 1 year ago

Hi,

Did you get an error? What is the version of Pytorch you are using?

Clayfigure commented 1 year ago

I am using 1.9.0+cu111, and the error message says: argument "gather_list" must be specified on destination rank. Also, I am confused: what is the point of gathering all the information into None on the LLM master process?

ClementRomac commented 1 year ago

As per PyTorch 1.9.0's documentation (https://pytorch.org/docs/1.9.0/distributed.html), the torch.distributed.gather_object method still takes an object_gather_list argument, so I don't understand why you are getting this error.

Concerning the None, the object_gather_list argument specifies the variable into which, on the destination process, all the obj values passed by the other processes are gathered. So a process that only sends an obj has no need to specify an object_gather_list. Conversely, the destination process (here self._llm_master_process) does not specify any obj but does provide an object_gather_list (as it is receiving objects, not sending one). You can find the destination process' code here.
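
To make the intent concrete, here is a minimal sketch of that pattern. It is not lamorel's actual code: gather_to_master and master_rank are illustrative names, and it assumes the default process group has already been initialized with torch.distributed.init_process_group.

```python
import torch.distributed as dist

def gather_to_master(obj, master_rank=0):
    """Illustrative helper: gather each rank's obj onto the destination rank."""
    world_size = dist.get_world_size()
    if dist.get_rank() == master_rank:
        # The destination rank pre-allocates one slot per process and
        # receives every process' obj into it (including its own).
        gathered = [None] * world_size
        dist.gather_object(obj, object_gather_list=gathered, dst=master_rank)
        return gathered
    else:
        # Sending ranks only contribute their obj; they pass
        # object_gather_list=None because they receive nothing back.
        dist.gather_object(obj, object_gather_list=None, dst=master_rank)
        return None
```

In this pattern only the destination rank allocates the receive buffer, which is why the non-destination processes can (and should) pass object_gather_list=None.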