poof420 opened this issue 1 year ago
Hello, here is another colab demo which can generate the SMPL file: https://colab.research.google.com/drive/1DGSHYtiWy8zDdyiQSgldq8VkxMIAu4Ql?usp=sharing
It takes some time to install the environment, though. I will keep working on the rendering part of the colab demo.
Okay, I will try it. Thank you. Yes, if possible it would be great to have this work more easily in the official demo.
Ah, I tried the colab demo you shared, but it seems to stall out in the training process and fails with this error:
CalledProcessError: the command run by the cell (activate the `VQTrans` conda env, then pipe the rest of the cell into `python`) returned non-zero exit status 1. The command was:

```
source activate VQTrans

python
import sys
sys.argv = ['GPT_eval_multi.py']
import options.option_transformer as option_trans
args = option_trans.get_args_parser()

args.dataname = 't2m'
args.resume_pth = 'pretrained/VQVAE/net_last.pth'
args.resume_trans = 'pretrained/VQTransformer_corruption05/net_best_fid.pth'
args.down_t = 2
args.depth = 3
args.block_size = 51
import clip
import torch
import numpy as np
import models.vqvae as vqvae
import models.t2m_trans as trans
import warnings
warnings.filterwarnings('ignore')

## load clip model and datasets
clip_model, clip_preprocess = clip.load("ViT-B/32", device=torch.device('cuda'), jit=False, download_root='./')  # Must set jit=False for training
clip_model.eval()
for p in clip_model.parameters():
    p.requires_grad = False

net = vqvae.HumanVQVAE(args,  ## use args to define different parameters in different quantizers
                       args.nb_code,
                       args.code_dim,
                       args.output_emb_width,
                       args.down_t,
                       args.stride_t,
                       args.width,
                       args.depth,
                       args.dilation_growth_rate)

trans_encoder = trans.Text2Motion_Transformer(num_vq=args.nb_code,
                                              embed_dim=1024,
                                              clip_dim=args.clip_dim,
                                              block_size=args.block_size,
                                              num_layers=9,
                                              n_head=16,
                                              drop_out_rate=args.drop_out_rate,
                                              fc_rate=args.ff_rate)

print('loading checkpoint from {}'.format(args.resume_pth))
ckpt = torch.load(args.resume_pth, map_location='cpu')
net.load_state_dict(ckpt['net'], strict=True)
net.eval()
net.cuda()

print('loading transformer checkpoint from {}'.format(args.resume_trans))
ckpt = torch.load(args.resume_trans, map_location='cpu')
trans_encoder.load_state_dict(ckpt['trans'], strict=True)
trans_encoder.eval()
trans_encoder.cuda()

mean = torch.from_numpy(np.load('./checkpoints/t2m/VQVAEV3_CB1024_CMT_H1024_NRES3/meta/mean.npy')).cuda()
std = torch.from_numpy(np.load('./checkpoints/t2m/VQVAEV3_CB1024_CMT_H1024_NRES3/meta/std.npy')).cuda()

# change the text here
clip_text = ["a person runs in a circle and flails their arms"]

text = clip.tokenize(clip_text, truncate=True).cuda()
feat_clip_text = clip_model.encode_text(text).float()
index_motion = trans_encoder.sample(feat_clip_text[0:1], False)
pred_pose = net.forward_decoder(index_motion)

from utils.motion_process import recover_from_ric
pred_xyz = recover_from_ric((pred_pose*std+mean).float(), 22)
xyz = pred_xyz.reshape(1, -1, 22, 3)

np.save('motion.npy', xyz.detach().cpu().numpy())

import visualization.plot_3d_global as plot_3d
pose_vis = plot_3d.draw_to_batch(xyz.detach().cpu().numpy(), clip_text, ['example.gif'])
```
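Side note: the `CalledProcessError` only reports the exit status and hides the actual Python traceback, so I can't tell which step is failing. A minimal sketch for surfacing the real error, assuming the Python portion of the cell is saved as `run_demo.py` in the repo root (that filename is just my placeholder):

```python
import subprocess

# Run the demo inside the VQTrans conda env and capture its output,
# so the underlying traceback is visible instead of just "exit status 1".
result = subprocess.run(
    ["bash", "-lc", "source activate VQTrans && python run_demo.py"],
    capture_output=True,
    text=True,
)

print("stdout:\n", result.stdout)
print("stderr:\n", result.stderr)  # the actual Python traceback shows up here
```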
I've double-checked the script and it works for me. You can try it again, or try the Hugging Face Space demo: https://huggingface.co/spaces/vumichien/generate_human_motion
It sounds like you all had the same question. I was also looking at several novel ways to generate motion-based FBX files, from a natural-language approach, video recognition, etc. This methodology would be a huge time saver. I looked at the Colab, but I'm not super familiar with it. The Hugging Face Space is a bit friendlier to understand, but the mesh file is not an exportable output.
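For what it's worth, the colab script above only saves raw joint positions to `motion.npy` (shape `(1, T, 22, 3)`), not a rigged mesh, so getting to FBX would still mean importing those joints into a tool like Blender and retargeting them. A minimal sketch for loading and flattening the saved joints as a starting point (the CSV filename and layout are just illustrative choices):

```python
import numpy as np

# Load the joint positions saved by the colab script above.
# Shape is (1, num_frames, 22, 3): 22 body joints with xyz coordinates per frame.
motion = np.load("motion.npy")
_, num_frames, num_joints, _ = motion.shape
print(f"{num_frames} frames, {num_joints} joints")

# Dump one row per (frame, joint) so the trajectory can be imported elsewhere,
# e.g. as keyframes in Blender before exporting to FBX.
flat = motion[0].reshape(num_frames * num_joints, 3)
np.savetxt("motion_joints.csv", flat, delimiter=",", header="x,y,z", comments="")
```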
Is it really possible to generate the SMPL file? It doesn't seem to work at all with the code provided. Thank you!