alibaba / FederatedScope

An easy-to-use federated learning platform
https://www.federatedscope.io
Apache License 2.0
1.26k stars 206 forks

Offsite tuning code with multigpu setting throws error #749

Closed: KKNakkav2 closed this issue 5 months ago

KKNakkav2 commented 7 months ago

Dear @rayrayraykk,

I'm trying to run the federated offsite-tuning code in a multi-GPU setting by setting the parameter federate.process_num.

When I set federate.process_num to a value >= 2 on our server with 4 GPUs, I hit an error involving the OffsiteTuningClient class in client.py. The traceback is as follows:

  File "/home/krishna/2024/FederatedScope/federatedscope/core/parallel/parallel_runner.py", line 100, in run
    runner.setup()
  File "/home/krishna/2024/FederatedScope/federatedscope/core/parallel/parallel_runner.py", line 372, in setup
    client.model.to(self.device)
  File "/home/krishna/2024/FederatedScope/federatedscope/core/workers/base_worker.py", line 51, in model
    return self._model

It looks like self._model is deleted at https://github.com/alibaba/FederatedScope/blob/7f086944c57f85c7594bde44d4f6b981f0de6845/federatedscope/llm/offsite_tuning/client.py#L37 within the OffsiteTuningClient class, which would explain why the later client.model access in parallel_runner.py fails.

Could you please advise how to overcome this issue? Any pointers would also help me fix it myself. Thanks a lot.

KKNakkav2 commented 7 months ago

Could the authors please tell me whether this behavior is expected, or whether it is caused by a wrong setting in the configuration file? Thank you.

rayrayraykk commented 7 months ago

In the current version, multi-GPU training is not supported with offsite-tuning. Thank you!

KKNakkav2 commented 7 months ago

Thank you for letting me know.
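For anyone else who lands here: until multi-GPU training is supported with offsite-tuning, a small guard before launching could fail fast with a clear message instead of surfacing the traceback above. This is only a sketch. The function name and its `offsite_tuning_enabled` argument are hypothetical and not part of FederatedScope, and I'm assuming that keeping federate.process_num at 1 avoids the parallel runner path.

```python
def check_offsite_tuning_parallelism(process_num, offsite_tuning_enabled):
    """Hypothetical pre-flight check, not part of FederatedScope.

    Raises ValueError when offsite-tuning is combined with more than one
    process; otherwise returns process_num unchanged.
    """
    if offsite_tuning_enabled and process_num > 1:
        raise ValueError(
            "offsite-tuning does not currently support multi-GPU training; "
            f"got federate.process_num={process_num}, expected 1"
        )
    return process_num


if __name__ == "__main__":
    # A single process with offsite-tuning passes the check.
    print(check_offsite_tuning_parallelism(1, offsite_tuning_enabled=True))
```

Calling it with `process_num=2` and `offsite_tuning_enabled=True` raises the `ValueError`, while multi-process runs without offsite-tuning pass through untouched.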