EricGuo5513 / momask-codes

Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"
https://ericguo5513.github.io/momask/
MIT License

"array must not contain infs or NaNs" when training/evaluating vqvae #29

Open Keneyr opened 3 months ago

Keneyr commented 3 months ago
  1. I tried to run `train_vq.py`, but got the error `array must not contain infs or NaNs`. The call stack is:

    # vq_trainer.py
    best_fid, best_div, best_top1, best_top2, best_top3, best_matching, writer = evaluation_vqvae(
        self.opt.model_dir, eval_val_loader, self.vq_model, self.logger, epoch, best_fid=1000,
        best_div=100, best_top1=0,
        best_top2=0, best_top3=0, best_matching=100,
        eval_wrapper=eval_wrapper, save=False)

    # eval_t2m.py
    diversity_real = calculate_diversity(motion_annotation_np, 300 if nb_sample > 300 else 100)

    # metrics.py
    dist = linalg.norm(activation[first_indices] - activation[second_indices], axis=1)

It seems that `activation[first_indices]` contains `nan` elements.

I used `numpy.nan_to_num()` to work around the error, but will this affect the training results?

2. I tried to run `eval_t2m_vq.py` following the `README`, using the evaluation model downloaded from Google (provided by the repo author), and got the same error. The call stack is:
```python
# eval_t2m_vq.py
best_fid, best_div, Rprecision, best_matching, l1_dist = \
    eval_t2m.evaluation_vqvae_plus_mpjpe(
        eval_val_loader, net, i, eval_wrapper=eval_wrapper, num_joint=args.nb_joints)

# eval_t2m.py
diversity = calculate_diversity(motion_pred_np, 300 if nb_sample > 300 else 100)

# metrics.py
dist = linalg.norm(activation[first_indices] - activation[second_indices], axis=1)
```

What should I do? Thank you!

imzeroan commented 3 months ago

A closed issue already covers this. Find the HumanML3D data file that contains NaN values and delete it. Only `*007975.npy` contains NaN.
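
For anyone who wants to confirm which files are affected before deleting anything, a check along these lines may help. It is only a sketch: `find_non_finite_files` is a hypothetical helper (not part of this repo), and it assumes the motion features are stored as plain numeric `.npy` arrays in a single directory.

```python
from pathlib import Path

import numpy as np


def find_non_finite_files(data_dir):
    """Return the names of .npy files in data_dir that contain NaN or inf.

    Hypothetical helper for illustration; assumes every file is a plain
    numeric array that np.load can read without pickling.
    """
    bad_files = []
    for path in sorted(Path(data_dir).glob("*.npy")):
        arr = np.load(path)
        # np.isfinite is False for NaN, +inf and -inf.
        if not np.isfinite(arr).all():
            bad_files.append(path.name)
    return bad_files
```

Calling `find_non_finite_files(...)` on the directory that holds the motion `.npy` files should list any file with NaN or inf values. The directory path depends on your local dataset layout.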

Keneyr commented 3 months ago

Yeah, thank you :). It seems 007975.npy is the cause, but I am still confused: why does this file fail?
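
On the earlier `numpy.nan_to_num()` question, the toy example below illustrates the difference between zero-filling and dropping affected samples. It is a sketch on made-up data, not the repo's code: `drop_non_finite_rows` is a hypothetical helper, and the array only stands in for the `activation` matrix from the call stack.

```python
import numpy as np


def drop_non_finite_rows(activation):
    """Keep only the rows (samples) whose values are all finite.

    Hypothetical helper for illustration only.
    """
    mask = np.isfinite(activation).all(axis=1)
    return activation[mask]


# Toy stand-in for the activation matrix: the second sample contains a NaN.
activation = np.array([[1.0, 2.0],
                       [np.nan, 0.5],
                       [3.0, 4.0]])

# nan_to_num keeps the sample but replaces the NaN with 0.0, so the row
# still enters the metric with an altered value.
filled = np.nan_to_num(activation)

# Dropping removes the affected sample entirely.
kept = drop_non_finite_rows(activation)
```

Zero-filling silences the error but leaves a modified sample in the metric computation, whereas removing the bad file (as suggested above) or dropping the row excludes it.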