sanjayss34 opened 10 months ago
I am facing the same problem.

I found this is due to multi-GPU training. Change
https://github.com/OpenMotionLab/MotionGPT/blob/fac297260a0e7138ce04f8b41c2e7b24e1f09a9a/mGPT/metrics/t2m.py#L76
to

```python
self.add_state("recmotion_embeddings", default=[], dist_reduce_fx="cat")
```

and
https://github.com/OpenMotionLab/MotionGPT/blob/fac297260a0e7138ce04f8b41c2e7b24e1f09a9a/mGPT/metrics/t2m.py#L134-L135
to

```python
all_genmotions = self.recmotion_embeddings.cpu()[shuffle_idx, :]
```

Do the same for `text_embeddings` and `gtmotion_embeddings`, and the problem is solved.
Hello, I had the same problem. Could you point out which other lines need similar changes? I still have the same problem after making the modification above.
@lixiang927047

```python
# Cached batches
self.add_state("text_embeddings", default=[], dist_reduce_fx="cat")
self.add_state("recmotion_embeddings", default=[], dist_reduce_fx="cat")
self.add_state("gtmotion_embeddings", default=[], dist_reduce_fx="cat")
```

and

```python
if type(self.recmotion_embeddings) == list:
    all_genmotions = torch.cat(self.recmotion_embeddings,
                               axis=0).cpu()[shuffle_idx, :]
else:
    all_genmotions = self.recmotion_embeddings.cpu()[shuffle_idx, :]

if type(self.gtmotion_embeddings) == list:
    all_gtmotions = torch.cat(self.gtmotion_embeddings,
                              axis=0).cpu()[shuffle_idx, :]
else:
    all_gtmotions = self.gtmotion_embeddings.cpu()[shuffle_idx, :]

# Compute text related metrics
if self.text:
    if type(self.text_embeddings) == list:
        all_texts = torch.cat(self.text_embeddings,
                              axis=0).cpu()[shuffle_idx, :]
    else:
        all_texts = self.text_embeddings.cpu()[shuffle_idx, :]
```
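The repeated list-vs-tensor branching can be factored into one small helper. This is a sketch, not code from the repo; `concat_state` is a hypothetical name. The idea it illustrates: a torchmetrics state declared with `default=[]` and `dist_reduce_fx="cat"` is a list of tensors in single-GPU runs, but arrives as one already-concatenated tensor after multi-GPU synchronization, so both cases must be handled.

```python
import torch


def concat_state(state, shuffle_idx):
    # Hypothetical helper: accept either a list of cached embedding
    # batches (single-GPU) or a pre-concatenated tensor (multi-GPU sync)
    # and return the shuffled rows on CPU.
    if isinstance(state, list):
        state = torch.cat(state, dim=0)
    return state.cpu()[shuffle_idx, :]
```

With this, each of the three branches above collapses to a one-liner such as `all_genmotions = concat_state(self.recmotion_embeddings, shuffle_idx)`.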
Could the authors, or anyone who has successfully reproduced the training of MotionGPT, share your train loss and R_TOP_3 (or R_TOP_1/R_TOP_2) curves so I can check whether my training is progressing as it should? After about 20 epochs, R_TOP_{1,2,3} are basically flat (R_TOP_3 < 0.1), even though the training loss is still decreasing.
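As a sanity check while waiting for reference curves, R_TOP_k can be computed in isolation on a batch of matched embeddings. The sketch below is an assumption about the metric's shape, not the repo's evaluator: it assumes row i of the text embeddings corresponds to row i of the motion embeddings and uses Euclidean distance for ranking.

```python
import torch


def r_precision_top_k(text_emb: torch.Tensor,
                      motion_emb: torch.Tensor,
                      k: int = 3) -> float:
    # Hedged sketch of R_TOP_k: row i of text_emb is assumed to describe
    # row i of motion_emb. For each caption, rank all motions by Euclidean
    # distance and count how often the matching motion lands in the top k.
    dists = torch.cdist(text_emb, motion_emb)           # (N, N) distances
    topk = dists.topk(k, dim=1, largest=False).indices  # k nearest motions
    targets = torch.arange(text_emb.size(0)).unsqueeze(1)
    return (topk == targets).any(dim=1).float().mean().item()
```

Feeding identical text and motion embeddings should give a score of 1.0; a value stuck near chance level (k/N) suggests the cached embeddings are being shuffled or gathered inconsistently across GPUs.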