Closed chenkai-666 closed 2 years ago
It's really weird... In `trainer.py`, we have these lines of code:

```python
self.model = nn.parallel.DistributedDataParallel(
    self.model,
    device_ids=[self.args.local_rank],
    output_device=self.args.local_rank,
    find_unused_parameters=self.args.ddp_find_unused_parameters,
)
```
So `self.model` should be an `nn.parallel.DistributedDataParallel` instance rather than an `nn.Module` instance; however, in your log, it is an `nn.Module` instance. I haven't encountered this bug myself; maybe it is related to CPU/GPU settings.
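A defensive way to cope with both cases is to unwrap the model only when it actually is DDP-wrapped, e.g. via `getattr(model, "module", model)`, since DDP exposes the wrapped model as its `.module` attribute. A minimal sketch of the pattern, using hypothetical stand-in classes instead of real torch objects so it runs without a distributed setup:

```python
class Net:
    """Stand-in for an nn.Module (hypothetical, for illustration)."""
    def forward(self, x):
        return x * 2


class DDPWrapper:
    """Stand-in for nn.parallel.DistributedDataParallel.

    Like real DDP, it exposes the wrapped model as `.module`.
    """
    def __init__(self, module):
        self.module = module


def unwrap(model):
    # Return the underlying module whether or not the model is wrapped:
    # if `model` has a `.module` attribute (DDP case), use it; otherwise
    # `model` is already the bare module.
    return getattr(model, "module", model)


plain = Net()
wrapped = DDPWrapper(plain)

assert unwrap(plain) is plain    # bare module passes through unchanged
assert unwrap(wrapped) is plain  # wrapped module is unwrapped to the same object
```

With this pattern, code that reads `self.model.module` keeps working whether the trainer wrapped the model in DDP or not.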
A quick solution: in `trainer.py` there are two mentions of `self.model.module`, and you can change them to `None`, because they are not used anyway. Also, as I mentioned in #6, a version based on the diffusers library will be available soon, which will be easier to use. Hope this is helpful!
Thank you for your excellent work. I have encountered some problems. Can you tell me how to correct them?