sanjayss34 opened 10 months ago
I am facing the same problem.

I found this is due to multi-GPU training. Change
https://github.com/OpenMotionLab/MotionGPT/blob/fac297260a0e7138ce04f8b41c2e7b24e1f09a9a/mGPT/metrics/t2m.py#L76
to

```python
self.add_state("recmotion_embeddings", default=[], dist_reduce_fx="cat")
```

and
https://github.com/OpenMotionLab/MotionGPT/blob/fac297260a0e7138ce04f8b41c2e7b24e1f09a9a/mGPT/metrics/t2m.py#L134-L135
to

```python
all_genmotions = self.recmotion_embeddings.cpu()[shuffle_idx, :]
```

Do the same for `text_embeddings` and `gtmotion_embeddings`, and the problem is solved.
Hello, I had the same problem. Could you point out which other lines need similar changes? I still have the same problem after making the modification above.
@lixiang927047

```python
# Cached batches
self.add_state("text_embeddings", default=[], dist_reduce_fx="cat")
self.add_state("recmotion_embeddings", default=[], dist_reduce_fx="cat")
self.add_state("gtmotion_embeddings", default=[], dist_reduce_fx="cat")
```

and

```python
if type(self.recmotion_embeddings) == list:
    all_genmotions = torch.cat(self.recmotion_embeddings,
                               axis=0).cpu()[shuffle_idx, :]
else:
    all_genmotions = self.recmotion_embeddings.cpu()[shuffle_idx, :]

if type(self.gtmotion_embeddings) == list:
    all_gtmotions = torch.cat(self.gtmotion_embeddings,
                              axis=0).cpu()[shuffle_idx, :]
else:
    all_gtmotions = self.gtmotion_embeddings.cpu()[shuffle_idx, :]

# Compute text related metrics
if self.text:
    if type(self.text_embeddings) == list:
        all_texts = torch.cat(self.text_embeddings,
                              axis=0).cpu()[shuffle_idx, :]
    else:
        all_texts = self.text_embeddings.cpu()[shuffle_idx, :]
```
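The repeated list-vs-tensor branching can be factored into one small helper. This is a sketch, not code from the repo; `concat_state` is a hypothetical name. The idea it illustrates: a torchmetrics state declared with `default=[]` and `dist_reduce_fx="cat"` is a list of tensors in single-GPU runs, but arrives as one already-concatenated tensor after multi-GPU synchronization, so both cases must be handled.

```python
import torch


def concat_state(state, shuffle_idx):
    # Hypothetical helper: accept either a list of cached embedding
    # batches (single-GPU) or a pre-concatenated tensor (multi-GPU sync)
    # and return the shuffled rows on CPU.
    if isinstance(state, list):
        state = torch.cat(state, dim=0)
    return state.cpu()[shuffle_idx, :]
```

With this, each of the three branches above collapses to a one-liner such as `all_genmotions = concat_state(self.recmotion_embeddings, shuffle_idx)`.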
Could the authors, or anyone who has successfully reproduced the training of MotionGPT, share your train loss and R_TOP_3 (or R_TOP_1/R_TOP_2) curves so I can check whether my training is progressing as it should? After about 20 epochs, R_TOP_{1,2,3} are basically flat (R_TOP_3 < 0.1), even though the training loss is still decreasing.
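As a sanity check while waiting for reference curves, R_TOP_k can be computed in isolation on a batch of matched embeddings. The sketch below is an assumption about the metric's shape, not the repo's evaluator: it assumes row i of the text embeddings corresponds to row i of the motion embeddings and uses Euclidean distance for ranking.

```python
import torch


def r_precision_top_k(text_emb: torch.Tensor,
                      motion_emb: torch.Tensor,
                      k: int = 3) -> float:
    # Hedged sketch of R_TOP_k: row i of text_emb is assumed to describe
    # row i of motion_emb. For each caption, rank all motions by Euclidean
    # distance and count how often the matching motion lands in the top k.
    dists = torch.cdist(text_emb, motion_emb)           # (N, N) distances
    topk = dists.topk(k, dim=1, largest=False).indices  # k nearest motions
    targets = torch.arange(text_emb.size(0)).unsqueeze(1)
    return (topk == targets).any(dim=1).float().mean().item()
```

Feeding identical text and motion embeddings should give a score of 1.0; a value stuck near chance level (k/N) suggests the cached embeddings are being shuffled or gathered inconsistently across GPUs.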