CDLCHOI opened 2 months ago
Hi, here is the log from before the code cleanup. I am re-training to verify it again.
2023-10-12 10:11:15,494 INFO {
"batch_size": 512,
"block_size": 51,
"clip_dim": 512,
"code_dim": 32,
"dataname": "t2m",
"decay_option": "all",
"depth": 3,
"dilation_growth_rate": 3,
"down_t": 2,
"drop_out_rate": 0.1,
"embed_dim_gpt": 1024,
"eval_iter": 5000,
"exp_name": "HML3D_45_crsAtt1lyr_40breset",
"ff_rate": 4,
"fps": [
20
],
"gamma": 0.05,
"if_maxtest": false,
"lr": 0.0001,
"lr_scheduler": [
37500
],
"mu": 0.99,
"n_head_gpt": 16,
"nb_code": 8192,
"num_layers": 9,
"optimizer": "adamw",
"out_dir": "/home/epinyoan/git/MaskText2Motion/T2M-BD/output/t2m/2023-10-12-10-11-15_HML3D_45_crsAtt1lyr_40breset/",
"output_emb_width": 512,
"pkeep": 0.5,
"print_iter": 200,
"quantbeta": 1.0,
"quantizer": "ema_reset",
"resume_pth": "/home/epinyoan/git/MaskText2Motion/T2M-BD/output/vq/2023-07-19-04-17-17_12_VQVAE_20batchResetNRandom_8192_32/net_last.pth",
"resume_trans": null,
"seed": 123,
"seq_len": 64,
"stride_t": 2,
"total_iter": 75000,
"vq_act": "relu",
"vq_dir": "/home/epinyoan/git/MaskText2Motion/T2M-BD/output/vq/2023-07-19-04-17-17_12_VQVAE_20batchResetNRandom_8192_32",
"vq_name": "2023-07-19-04-17-17_12_VQVAE_20batchResetNRandom_8192_32",
"warm_up_iter": 1000,
"weight_decay": 1e-06,
"width": 512
}
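For reference, the learning-rate schedule these settings imply can be sketched as follows (a minimal sketch, assuming the usual linear warm-up followed by MultiStepLR-style decay; `lr_at` is a hypothetical helper, not a function from this codebase):

```python
def lr_at(iteration, lr=1e-4, warm_up_iter=1000, milestones=(37500,), gamma=0.05):
    """Learning rate at a given iteration: linear warm-up to `lr` over
    warm_up_iter steps, then multiply by `gamma` at each milestone."""
    if iteration < warm_up_iter:
        return lr * iteration / warm_up_iter  # linear warm-up from 0
    passed = sum(1 for m in milestones if iteration >= m)
    return lr * gamma ** passed  # step decay after each passed milestone
```

With the config above, the rate ramps to 1e-4 over the first 1,000 iterations and drops to 5e-6 at iteration 37,500.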
2023-10-12 12:17:31,567 INFO --> Eva. Iter 5000 :,
FID. 0.7736 ,
Diversity Real. 9.4222,
Diversity. 9.4605,
R_precision_real. [0.51396277 0.70146277 0.78856383],
R_precision. [0.48271277 0.67021277 0.76462766],
matching_score_real. 2.903815122360879,
matching_score_pred. 3.2019664936877312,
multimodality. 0.0000
2023-10-12 12:17:31,568 INFO --> --> FID Improved from 1000.00000 to 0.77361 !!!
2023-10-12 12:17:31,568 INFO --> --> matching_score Improved from 100.00000 to 3.20197 !!!
2023-10-12 12:17:31,569 INFO --> --> Diversity Improved from 100.00000 to 9.46046 !!!
2023-10-12 12:17:31,569 INFO --> --> Top1 Improved from 0.0000 to 0.4827 !!!
2023-10-12 12:17:31,569 INFO --> --> Top2 Improved from 0.0000 to 0.6702 !!!
2023-10-12 12:17:31,569 INFO --> --> Top3 Improved from 0.0000 to 0.7646 !!!
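For context, the R_precision and matching_score numbers above come from ranking motion embeddings against text embeddings. A minimal sketch (assuming Euclidean-distance ranking; `r_precision` and `matching_score` are hypothetical helpers, not the exact evaluation code):

```python
import numpy as np

def r_precision(text_emb, motion_emb, top_k=3):
    """Fraction of samples whose paired motion is among the k nearest
    motions to their text embedding, for k = 1..top_k."""
    # pairwise Euclidean distances: dist[i, j] = ||text_i - motion_j||
    dist = np.linalg.norm(text_emb[:, None] - motion_emb[None, :], axis=-1)
    order = np.argsort(dist, axis=1)  # nearest motion indices first
    hits = order[:, :top_k] == np.arange(len(text_emb))[:, None]
    # cumulative hit rate gives [Top-1, Top-2, Top-3]
    return hits.cumsum(axis=1).clip(max=1).mean(axis=0)

def matching_score(text_emb, motion_emb):
    """Mean distance between each text embedding and its paired motion."""
    return np.linalg.norm(text_emb - motion_emb, axis=-1).mean()
```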
2023-10-12 14:23:16,358 INFO --> Eva. Iter 10000 :,
FID. 0.2061 ,
Diversity Real. 9.2730,
Diversity. 9.6214,
R_precision_real. [0.52859043 0.71476064 0.80851064],
R_precision. [0.49069149 0.72074468 0.81515957],
matching_score_real. 2.8495587592429303,
matching_score_pred. 2.8929001118274447,
multimodality. 0.0000
2023-10-12 14:23:16,358 INFO --> --> FID Improved from 0.77361 to 0.20613 !!!
2023-10-12 14:23:16,358 INFO --> --> matching_score Improved from 3.20197 to 2.89290 !!!
2023-10-12 14:23:16,359 INFO --> --> Top1 Improved from 0.4827 to 0.4907 !!!
2023-10-12 14:23:16,359 INFO --> --> Top2 Improved from 0.6702 to 0.7207 !!!
2023-10-12 14:23:16,359 INFO --> --> Top3 Improved from 0.7646 to 0.8152 !!!
2023-10-12 16:28:48,441 INFO --> Eva. Iter 15000 :,
FID. 0.1951 ,
Diversity Real. 9.5439,
Diversity. 9.8876,
R_precision_real. [0.51396277 0.70212766 0.81117021],
R_precision. [0.50797872 0.71409574 0.79720745],
matching_score_real. 2.8992625551020845,
matching_score_pred. 2.8141087217533842,
multimodality. 0.0000
2023-10-12 16:28:48,442 INFO --> --> FID Improved from 0.20613 to 0.19505 !!!
2023-10-12 16:28:48,443 INFO --> --> matching_score Improved from 2.89290 to 2.81411 !!!
2023-10-12 16:28:48,443 INFO --> --> Top1 Improved from 0.4907 to 0.5080 !!!
2023-10-12 18:34:06,733 INFO --> Eva. Iter 20000 :,
FID. 0.1696 ,
Diversity Real. 9.7836,
Diversity. 9.7076,
R_precision_real. [0.52726064 0.71476064 0.79920213],
R_precision. [0.5206117 0.71276596 0.81316489],
matching_score_real. 2.8873865046399705,
matching_score_pred. 2.8751501681956837,
multimodality. 0.0000
2023-10-12 18:34:06,734 INFO --> --> FID Improved from 0.19505 to 0.16957 !!!
2023-10-12 18:34:06,734 INFO --> --> Diversity Improved from 9.46046 to 9.70763 !!!
2023-10-12 18:34:06,734 INFO --> --> Top1 Improved from 0.5080 to 0.5206 !!!
2023-10-12 20:39:25,504 INFO --> Eva. Iter 25000 :,
FID. 0.1105 ,
Diversity Real. 9.6543,
Diversity. 9.7581,
R_precision_real. [0.50930851 0.69946809 0.78523936],
R_precision. [0.51795213 0.70545213 0.79654255],
matching_score_real. 2.9209979138475783,
matching_score_pred. 2.8628295431745814,
multimodality. 0.0000
2023-10-12 20:39:25,505 INFO --> --> FID Improved from 0.16957 to 0.11046 !!!
2023-10-12 22:44:55,688 INFO --> Eva. Iter 30000 :,
FID. 0.1699 ,
Diversity Real. 9.3835,
Diversity. 9.3433,
R_precision_real. [0.5099734 0.70013298 0.79853723],
R_precision. [0.52393617 0.70146277 0.80851064],
matching_score_real. 2.86837160333674,
matching_score_pred. 2.8496022833154555,
multimodality. 0.0000
2023-10-12 22:44:55,689 INFO --> --> Diversity Improved from 9.70763 to 9.34333 !!!
2023-10-12 22:44:55,689 INFO --> --> Top1 Improved from 0.5206 to 0.5239 !!!
2023-10-13 00:50:39,065 INFO --> Eva. Iter 35000 :,
FID. 0.1504 ,
Diversity Real. 9.4453,
Diversity. 9.8359,
R_precision_real. [0.5099734 0.70079787 0.80984043],
R_precision. [0.50132979 0.69946809 0.7918883 ],
matching_score_real. 2.9201048739412996,
matching_score_pred. 2.9048643112182617,
multimodality. 0.0000
2023-10-13 02:56:09,232 INFO --> Eva. Iter 40000 :,
FID. 0.1146 ,
Diversity Real. 9.6758,
Diversity. 9.6304,
R_precision_real. [0.52460106 0.70611702 0.79654255],
R_precision. [0.54787234 0.71010638 0.80851064],
matching_score_real. 2.9239063770213027,
matching_score_pred. 2.7912252811675375,
multimodality. 0.0000
2023-10-13 02:56:09,233 INFO --> --> matching_score Improved from 2.81411 to 2.79123 !!!
2023-10-13 02:56:09,233 INFO --> --> Diversity Improved from 9.34333 to 9.63041 !!!
2023-10-13 02:56:09,233 INFO --> --> Top1 Improved from 0.5239 to 0.5479 !!!
2023-10-13 05:01:32,842 INFO --> Eva. Iter 45000 :,
FID. 0.1257 ,
Diversity Real. 9.6170,
Diversity. 9.7833,
R_precision_real. [0.51462766 0.69082447 0.79787234],
R_precision. [0.53125 0.73271277 0.82978723],
matching_score_real. 2.914106272636576,
matching_score_pred. 2.757748162492793,
multimodality. 0.0000
2023-10-13 05:01:32,843 INFO --> --> matching_score Improved from 2.79123 to 2.75775 !!!
2023-10-13 05:01:32,843 INFO --> --> Top2 Improved from 0.7207 to 0.7327 !!!
2023-10-13 05:01:32,843 INFO --> --> Top3 Improved from 0.8152 to 0.8298 !!!
2023-10-13 07:06:57,419 INFO --> Eva. Iter 50000 :,
FID. 0.1295 ,
Diversity Real. 9.4078,
Diversity. 9.3965,
R_precision_real. [0.51595745 0.70013298 0.79454787],
R_precision. [0.52526596 0.7287234 0.81981383],
matching_score_real. 2.8886835473649044,
matching_score_pred. 2.776045175308877,
multimodality. 0.0000
2023-10-13 07:06:57,420 INFO --> --> Diversity Improved from 9.63041 to 9.39650 !!!
2023-10-13 09:12:30,226 INFO --> Eva. Iter 55000 :,
FID. 0.1354 ,
Diversity Real. 9.4323,
Diversity. 9.9881,
R_precision_real. [0.50797872 0.70545213 0.79454787],
R_precision. [0.52726064 0.71941489 0.81183511],
matching_score_real. 2.9032556199012918,
matching_score_pred. 2.848728185004376,
multimodality. 0.0000
2023-10-13 11:18:25,270 INFO --> Eva. Iter 60000 :,
FID. 0.1618 ,
Diversity Real. 9.4185,
Diversity. 9.8101,
R_precision_real. [0.51728723 0.71143617 0.80518617],
R_precision. [0.53856383 0.73404255 0.81981383],
matching_score_real. 2.9068394011639533,
matching_score_pred. 2.769266184340132,
multimodality. 0.0000
2023-10-13 11:18:25,270 INFO --> --> Top2 Improved from 0.7327 to 0.7340 !!!
2023-10-13 13:24:16,032 INFO --> Eva. Iter 65000 :,
FID. 0.1228 ,
Diversity Real. 9.4212,
Diversity. 9.4895,
R_precision_real. [0.51928191 0.6974734 0.79321809],
R_precision. [0.52726064 0.73138298 0.82579787],
matching_score_real. 2.9037675553179803,
matching_score_pred. 2.7725830686853286,
multimodality. 0.0000
2023-10-13 15:29:53,293 INFO --> Eva. Iter 70000 :,
FID. 0.0969 ,
Diversity Real. 9.9374,
Diversity. 9.5377,
R_precision_real. [0.53457447 0.71343085 0.80385638],
R_precision. [0.52526596 0.73271277 0.82845745],
matching_score_real. 2.8475664473594504,
matching_score_pred. 2.8103004364257163,
multimodality. 0.0000
2023-10-13 15:29:53,294 INFO --> --> FID Improved from 0.11046 to 0.09693 !!!
2023-10-13 15:29:53,294 INFO --> --> Diversity Improved from 9.39650 to 9.53765 !!!
2023-10-13 18:34:33,146 INFO --> Eva. Iter 75000 :,
FID. 0.0914 ,
Diversity Real. 9.6473,
Diversity. 9.7987,
R_precision_real. [0.50560345 0.70150862 0.79935345],
R_precision. [0.51198276 0.70967672 0.80681034],
matching_score_real. 2.9873770861790097,
matching_score_pred. 2.909082286440093,
multimodality. 1.1684
2023-10-13 18:34:33,147 INFO --> --> FID Improved from 0.09693 to 0.09137 !!!
2023-10-13 18:34:34,787 INFO Train. Iter 75000 : FID. 0.09137, Diversity. 9.5377, TOP1. 0.5479, TOP2. 0.7340, TOP3. 0.8298
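As an aside, the "Improved from 1000.00000" line at the first evaluation just reflects sentinel initial values (best FID starts at 1000, best matching_score at 100). A minimal sketch of that bookkeeping, using a hypothetical `update_best` helper:

```python
# Sentinel initial values, so the first evaluation always "improves".
best = {"FID": 1000.0, "matching_score": 100.0, "Top1": 0.0}

def update_best(name, value, lower_is_better=True):
    """Log and record a metric when it beats the best value so far."""
    improved = (value < best[name]) if lower_is_better else (value > best[name])
    if improved:
        print(f"--> --> {name} Improved from {best[name]:.5f} to {value:.5f} !!!")
        best[name] = value
    return improved
```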
Thanks. I loaded your trans.pth and fine-tuned with lr=1e-4; at the beginning of training, loss ≈ 2.2 and acc ≈ 60%.
But I think my model is not overtrained. I trained from scratch again with your code from https://github.com/exitudio/MMM/issues/6:
python3 train_t2m_trans.py \
    --exp-name validation_train \
    --batch-size 128 \
    --vq-name pretrain \
    --out-dir output/test \
    --total-iter 300000 \
    --lr-scheduler 150000 \
    --dataname t2m \
    --eval-iter 20000
And this is part of my log (the code didn't eval):
2024-04-19 09:52:15,403 INFO Train. Iter 10 : Loss. 8.96595, acc: 0.58339
2024-04-19 10:13:31,970 INFO Train. Iter 4000 : Loss. 5.02665, acc: 10.64897
2024-04-19 10:34:56,329 INFO Train. Iter 8000 : Loss. 3.43890, acc: 25.89595
2024-04-19 10:56:53,287 INFO Train. Iter 12000 : Loss. 2.06721, acc: 43.09161
2024-04-19 11:46:31,417 INFO Train. Iter 20000 : Loss. 1.05492, acc: 66.60687
2024-04-19 12:07:59,340 INFO Train. Iter 24000 : Loss. 1.05645, acc: 68.19553
2024-04-19 12:40:16,859 INFO Train. Iter 30000 : Loss. 0.82608, acc: 77.09264
2024-04-19 15:22:51,580 INFO Train. Iter 32290 : Loss. 0.75970, acc: 69.74814
And this is part of the log from fine-tuning with "resume_trans": "pretrain/trans.pth":
"resume_pth": "./output/vq/vq_name/net_last.pth",
"resume_trans": "pretrain/trans.pth",
"root_dist_loss": false,
"root_loss_no_vel_rot": false,
"save_iter": 2000,
"seed": 123,
"seq_len": 64,
"stride_t": 2,
"temporal_complete": 0.0,
"text": null,
"total_iter": 30000,
"traj_supervise": false,
"vq_act": "relu",
"vq_dir": "./output/vq/vq_name",
"vq_name": "output/vq/vq_name",
"warm_up_iter": 1000,
"weight_decay": 1e-06,
"width": 512,
"xyz_type": "all"
}
2024-04-18 12:05:16,756 INFO Train. Iter 20 : Loss. 2.88080
2024-04-18 12:05:20,199 INFO Train. Iter 40 : Loss. 2.79312
2024-04-18 12:05:23,651 INFO Train. Iter 60 : Loss. 2.79212
2024-04-18 12:08:05,232 INFO Train. Iter 1000 : Loss. 2.09924
2024-04-18 12:10:58,100 INFO Train. Iter 2000 : Loss. 1.39754
2024-04-18 12:16:39,970 INFO Train. Iter 3980 : Loss. 1.06009
2024-04-18 12:16:43,554 INFO Train. Iter 4000 : Loss. 1.02272
2024-04-18 12:22:30,369 INFO Train. Iter 6000 : Loss. 0.94934
2024-04-18 12:28:35,686 INFO Train. Iter 8100 : Loss. 0.77994
2024-04-18 12:35:36,789 INFO Train. Iter 10380 : Loss. 0.64359
2024-04-18 12:35:40,204 INFO Train. Iter 10400 : Loss. 0.63709
2024-04-18 12:43:06,749 INFO Train. Iter 13000 : Loss. 0.47693
2024-04-18 12:45:58,573 INFO Train. Iter 14000 : Loss. 0.44967
I'm confused about these two results. Do you know the reason? Or could you fine-tune trans.pth for a little while with lr=1e-4?
Sorry to bother you again.
Can you try training longer? It seems like you're using a batch size of 128 with 30,000 iterations.
I use a batch size of 512 with 75,000 iterations. I also tried a batch size of 128 with more iterations, 300,000 (to keep the total amount of data the same). These two settings give similar results.
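The equivalence between the two settings is just a matter of total samples processed:

```python
# Both settings process the same total number of training samples.
samples_512 = 512 * 75_000    # batch size 512, 75k iterations
samples_128 = 128 * 300_000   # batch size 128, 300k iterations
assert samples_512 == samples_128 == 38_400_000
```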
If you have any further questions, please don't hesitate to ask. I'll do my best to assist you.
Thank you so much. I will try again.
Can you share the training log of t2m_trans?
I found it difficult to train t2m_trans.