apple / ml-hugs

Official repository of HUGS: Human Gaussian Splats (CVPR 2024)
https://machinelearning.apple.com/research/hugs

Question about dataset with strong human prior #21

Open LarkLeeOnePiece opened 2 weeks ago

LarkLeeOnePiece commented 2 weeks ago

Hello, thanks for this fantastic work. I found something I would like to discuss with you. The framework fails on a dataset of one human in an indoor room with little camera translation, i.e. the human is always at the center of the image, as in videos where the camera rotates around the person. I think the reason is that for this kind of data, even though we use masks to ignore the human features, we still get a strong human point cloud. During Gaussian splatting these points tend to be optimized first, which leaves only small, weak gradients flowing to the SMPL points. Do you have any ideas for this problem?

MikeAiJF commented 2 weeks ago

image
Hello, are you running into this problem?

LarkLeeOnePiece commented 2 weeks ago

It's probably not the same issue; mine looks like this: 014990. During optimization, the static human point cloud gets optimized, but the point-cloud features of the canonical-space SMPL human model do not. What is causing your problem?

MikeAiJF commented 2 weeks ago

> It's probably not the same issue; mine looks like this: 014990. During optimization, the static human point cloud gets optimized, but the point-cloud features of the canonical-space SMPL human model do not. What is causing your problem?

I'm not sure what caused mine either. It may be that my preprocessing is not great, so the point-cloud positions derived from the camera parameters are slightly off, producing this result. When you shot your data, was the camera moving or fixed?

LarkLeeOnePiece commented 2 weeks ago

The camera was moving.

LarkLeeOnePiece commented 2 weeks ago

You could first try reconstructing the human alone to see how the human reconstruction comes out.

MikeAiJF commented 2 weeks ago

> You could first try reconstructing the human alone to see how the human reconstruction comes out.

The sparse reconstruction produced two folders, 0 and 1, so I already knew the result was bad.

MikeAiJF commented 2 weeks ago

> You could first try reconstructing the human alone to see how the human reconstruction comes out.

Do you have a better preprocessing method?

LarkLeeOnePiece commented 2 weeks ago

My preprocessing is the same as NeuMan's; COLMAP's reconstructed cameras and point cloud should be reasonably good, right?

MikeAiJF commented 2 weeks ago

> My preprocessing is the same as NeuMan's; COLMAP's reconstructed cameras and point cloud should be reasonably good, right?

Data you processed with the NeuMan pipeline can't be used for HUGS training, right? Because it lacks the 4d-humans folder.

LarkLeeOnePiece commented 2 weeks ago

Inside the 4d-humans folder, only smpl_optimized_aligned_scale actually matters; it contains the SMPL parameters plus the alignment parameters, and all of those alignment parameters can be obtained from NeuMan's preprocessing. The 4d-humans folder just means 4D-Humans was used to extract the SMPL parameters instead of ROMP, but which method you use to extract the SMPL parameters does not affect whether the framework runs.

MikeAiJF commented 2 weeks ago

> Inside the 4d-humans folder, only smpl_optimized_aligned_scale actually matters; it contains the SMPL parameters plus the alignment parameters, and all of those alignment parameters can be obtained from NeuMan's preprocessing. The 4d-humans folder just means 4D-Humans was used to extract the SMPL parameters instead of ROMP, but which method you use to extract the SMPL parameters does not affect whether the framework runs.

Did you write the alignment method yourself? Could I borrow it for reference? Also, have you tried the author's setup of putting multiple people into one scene?

LarkLeeOnePiece commented 2 weeks ago

The alignment uses NeuMan's alignment matrices; I didn't write a script. The rough idea is that the humans HUGS handles are in canonical space, and I can get the alignment parameters (rotation, translation, scale) from NeuMan, then apply them to the canonical-space SMPL model. I haven't tried putting the models into one scene yet.
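For reference, applying a similarity transform (rotation, translation, scale) to canonical-space SMPL vertices, as described above, can be sketched like this. The function and variable names are illustrative, not NeuMan's or HUGS's actual code:

```python
import numpy as np

def align_canonical_smpl(verts, R, t, scale):
    """Map canonical-space SMPL vertices into the scene frame
    with a similarity transform: scale * R @ v + t per vertex."""
    verts = np.asarray(verts, dtype=np.float64)          # (N, 3)
    return scale * (verts @ np.asarray(R).T) + np.asarray(t)

# Example: 90-degree rotation about z, unit translation in x, scale 2.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])
aligned = align_canonical_smpl([[1.0, 0.0, 0.0]], R, t, 2.0)
# [1,0,0] rotates to [0,1,0], scales to [0,2,0], translates to [1,2,0]
```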

MikeAiJF commented 2 weeks ago

> The alignment uses NeuMan's alignment matrices; I didn't write a script. The rough idea is that the humans HUGS handles are in canonical space, and I can get the alignment parameters (rotation, translation, scale) from NeuMan, then apply them to the canonical-space SMPL model. I haven't tried putting the models into one scene yet.

Bro, let me ask you a question. I get "No valid patch found! Returning empty patches." followed by:

```
Traceback (most recent call last):
  File "main.py", line 108, in <module>
    main(cfg)
  File "main.py", line 69, in main
    trainer.train()
  File "/project_02/ml-hugs/hugs/trainer/gs_trainer.py", line 277, in train
    loss, loss_dict, loss_extras = self.loss_fn(
  File "/home/kmks-server-02/miniconda3/envs/hugs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/project02/ml-hugs/hugs/losses/loss.py", line 117, in forward
    _, pred_patches, gt_patches = self.patch_sampler.sample(mask, pred_img, gt_image)
ValueError: not enough values to unpack (expected 3, got 2)
```

What could be causing this?
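For what it's worth, "expected 3, got 2" usually means the called function returns a shorter tuple on some code path, which would match the "no valid patch found, returning empty patches" message printed just before the crash. A toy reproduction of that failure mode, with a defensive guard; the sampler below is a stand-in, not HUGS's real `patch_sampler`:

```python
def sample(mask_has_fg, pred, gt):
    """Stand-in sampler: returns 3 values normally, but only 2 when
    no valid patch is found -- the length mismatch behind the error."""
    if not mask_has_fg:
        return [], []                  # degenerate path: only 2 values
    return [0], [pred], [gt]           # normal path: 3 values

def safe_sample(mask_has_fg, pred, gt):
    out = sample(mask_has_fg, pred, gt)
    if len(out) != 3:                  # guard against the short tuple
        return None, [], []
    return out

# Unpacking safe_sample never raises, even with an empty mask.
_, pred_patches, gt_patches = safe_sample(False, "p", "g")
```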

LarkLeeOnePiece commented 2 weeks ago

I haven't run into this problem, but could you first check whether the dimensions of your input data are correct?

MikeAiJF commented 2 weeks ago

The data I processed earlier didn't hit this problem either, but today's data does, with the same preprocessing method. I don't understand how it can go wrong.

MikeAiJF commented 2 weeks ago

> I haven't run into this problem, but could you first check whether the dimensions of your input data are correct?

```
Length of xs: 38087 Contents of xs: [  0   0   0 ... 416 416 416]
Length of xs: 38087 Contents of xs: [  0   0   0 ... 416 416 416]
Length of xs: 34914 Contents of xs: [  0   0   0 ... 445 445 445]
...
Training:   0%| | 20/14999 [00:03<46:14, 5.40it/s, #hp=110.2K, #sp=1.5K, h_sh_d=0, s_sh_d=0, l_l1=0.1717, l_ssim=0.0393, l_lpips_patch=0.2295, l_l1_human=1.0359, l_ssim_human=0.0034, l_lpips_p...
...
Training:   0%| | 50/14999 [00:05<24:23, 10.22it/s, #hp=110.2K, #sp=1.5K, h_sh_d=0, s_sh_d=0, l_l1=0.0873, l_ssim=0.0301, l_lpips_patch=0.4170, l_l1_human=0.9265, l_ssim_human=0.0033, l_lpips_p...
Length of xs: 40188 Contents of xs: [  0   0   0 ... 420 420 420]
Length of xs: 40188 Contents of xs: [  0   0   0 ... 420 420 420]
Length of xs: 0 Contents of xs: []
```

Every earlier frame has data, but at the very end it suddenly drops to 0. I don't know why.
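A side note on the log above: it ends with `Length of xs: 0`, i.e. one frame's mask apparently has no foreground pixels at all, which would also explain the empty-patches crash earlier in the thread. A quick sanity check over the masks could catch such frames before training; the in-memory mask list here stands in for however the masks are actually loaded from disk:

```python
import numpy as np

def find_empty_masks(masks):
    """Return indices of masks with no foreground pixels.
    `masks` is a list of (H, W) arrays; in practice these would be
    loaded from the preprocessed mask images."""
    empty = []
    for i, m in enumerate(masks):
        ys, xs = np.nonzero(np.asarray(m) > 0)   # same xs as in the log
        if len(xs) == 0:
            empty.append(i)
    return empty

# Frame 1 has an all-zero mask and would be flagged.
masks = [np.ones((4, 4)), np.zeros((4, 4)), np.ones((4, 4))]
bad = find_empty_masks(masks)
```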

LarkLeeOnePiece commented 2 weeks ago

I'm not sure what the problem is either. Try turning off the patch loss first and see whether the rest of the pipeline works.

MikeAiJF commented 1 week ago

> I'm not sure what the problem is either. Try turning off the patch loss first and see whether the rest of the pipeline works.

I just checked the sizes: it looks like my images are not all the same size. Could that be the problem? My earlier images and the author's images were all the same size.

LarkLeeOnePiece commented 1 week ago

When I process my own dataset, its image sizes also differ from the author's dataset. By the way, a question for you, bro: when you ran COLMAP, did you ever find that even with the mask, the human part of the point cloud still can't be removed? That's the problem I'm hitting now: when I reconstruct with COLMAP, even though I use the mask, the human points are still there. My video basically orbits around one person. Have you run into this, and do you have a way to solve it?
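One possible way to remove the leftover human points is to project each reconstructed 3D point into the frames and discard points that land inside the human mask in most views where they are visible. This is a hedged sketch with a plain pinhole model and numpy arrays rather than COLMAP's own data structures; all names and the voting threshold are assumptions:

```python
import numpy as np

def prune_human_points(points, K, w2c_list, masks, vote_ratio=0.5):
    """Drop 3D points whose projections land inside the human mask
    in at least `vote_ratio` of the views where they are visible."""
    keep = []
    for p in points:
        hits, visible = 0, 0
        for w2c, mask in zip(w2c_list, masks):
            pc = w2c[:3, :3] @ p + w2c[:3, 3]        # world -> camera
            if pc[2] <= 0:
                continue                              # behind the camera
            u, v = (K @ (pc / pc[2]))[:2]             # pinhole projection
            u, v = int(round(u)), int(round(v))
            h, w = mask.shape
            if not (0 <= u < w and 0 <= v < h):
                continue                              # outside the image
            visible += 1
            if mask[v, u]:                            # inside human mask
                hits += 1
        if visible == 0 or hits / visible < vote_ratio:
            keep.append(p)
    return np.array(keep)

# Toy example: identity camera, one view, human mask covering pixel (0, 0).
K = np.eye(3)
w2c = np.eye(4)
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = True
pts = [np.array([0.0, 0.0, 1.0]),   # projects onto the human mask
       np.array([2.0, 2.0, 1.0])]   # projects onto background
kept = prune_human_points(pts, K, [w2c], [mask])
```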

MikeAiJF commented 1 week ago

> When I process my own dataset, its image sizes also differ from the author's dataset. By the way, a question for you, bro: when you ran COLMAP, did you ever find that even with the mask, the human part of the point cloud still can't be removed? That's the problem I'm hitting now: when I reconstruct with COLMAP, even though I use the mask, the human points are still there. My video basically orbits around one person. Have you run into this, and do you have a way to solve it?

Do you mean reconstructing only the scene, rather than the person and the scene? And let me ask, big bro: could you teach me the ml-hugs preprocessing method? My current preprocessing has a problem and won't run anymore.

LarkLeeOnePiece commented 1 week ago

We can discuss it, but I'm busy rushing a DDL right now.

MikeAiJF commented 1 week ago

> We can discuss it, but I'm busy rushing a DDL right now.

OK, sure. What's a DDL?

LarkLeeOnePiece commented 1 week ago

Just busy rushing a paper deadline.

MikeAiJF commented 1 week ago

> Just busy rushing a paper deadline.

Fair enough. I'm going crazy trying to fix my preprocessing right now.

ZCWzy commented 1 day ago

Oh nice, I ran into this problem too; so that's what it is. My situation: in human-only mode the SMPL optimizes very poorly, and when the scene and the human are optimized jointly, I get exactly what you described: "the static human point cloud gets optimized, but the point-cloud features of the canonical-space SMPL human model do not."