chungyiweng / humannerf

HumanNeRF turns a monocular video of moving people into a 360 free-viewpoint video.
MIT License
786 stars 86 forks

Produce empty images using adventure.yaml #48

Closed cshen11 closed 1 year ago

cshen11 commented 1 year ago

Hi @chungyiweng, thanks for the great work! I encountered an issue "Produce empty images; reload the init model" (similar to #43) when using adventure.yaml for training. The issue consistently happens on both zju-mocap and custom datasets, and only with adventure.yaml; training with single_gpu.yaml has no problem.

I did several tests myself, and the possible cause seems to be the patch size:

adventure.yaml uses the default patch size of 32x32. If the patch size is set to 24x24 or 20x20 in adventure.yaml (as in single_gpu.yaml), the issue does not occur. Note that there is no fatal error (e.g., out of memory) when running with 32x32 on my machine; it just produces empty images.

If the cause really is the patch size, can you help me understand why it is so critical during training that a wrong setting leads to empty images? Thank you!
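For context, a quick back-of-the-envelope check of how the patch size changes the per-iteration ray count (a sketch only; names like N_patches and the sample count are illustrative, not necessarily the repo's actual config values):

```python
# Rough arithmetic, not HumanNeRF code: how patch size scales the number of
# rays and MLP queries per training iteration.
def rays_per_iter(patch_size, n_patches=6, n_samples=128):
    rays = n_patches * patch_size * patch_size   # pixels (rays) per batch
    points = rays * n_samples                     # sampled points fed to the MLP
    return rays, points

for size in (20, 24, 32):
    rays, points = rays_per_iter(size)
    print(f"patch {size}x{size}: {rays:5d} rays, {points:8d} sampled points")
```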

QyDing02 commented 1 year ago

I encountered the same issue, not on zju-mocap but only on my custom dataset.

reaper19991110 commented 1 year ago

I also encountered the 'Produce empty images' issue when working with custom videos, but I was able to train using zju-mocap. Do you have any solutions to this problem?

louhz commented 1 year ago

This is because the initialization of the canonical output is too small for RGB, and there are five networks that need to be optimized together. In this case the learning rate and loss weights need to be tuned so that the networks converge within the first 5000 iterations.

louhz commented 1 year ago

Also, the bgcolor is randomly generated, and sometimes it may be too large, I guess?

reaper19991110 commented 1 year ago

Thank you very much for your answer. I've adjusted the learning rate and loss weights, but it doesn't seem to have any effect. I removed the check for empty images in train.py so that training can continue for a while, but I'm still getting empty images. So I'm wondering if there is some issue with one of the parameters in my metadata.json file.

louhz commented 1 year ago

I use VS Code in debug mode: set a breakpoint at the return of raws2outputs and check the rgb map and acc map. If the values there are too small, you can manually tune the learning rate and loss weights to see whether you can get an acceptable value. It is really hard to say the exact reason; maybe the bounding box is not correct, or the ray mask masks out too many rays and not enough rays go into the model.
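For illustration, a minimal sketch of that kind of check (the tensor names rgb_map and acc_map follow the discussion above; adapt them to whatever your raws2outputs actually returns):

```python
import torch

def debug_render_outputs(rgb_map: torch.Tensor, acc_map: torch.Tensor, step: int):
    # rgb_map: (N_rays, 3) rendered colors; acc_map: (N_rays,) accumulated opacity
    rgb_max = rgb_map.abs().max().item()
    acc_max = acc_map.max().item()
    if rgb_max < 1e-5 or acc_max < 1e-3:
        print(f"[step {step}] suspicious render: max|rgb|={rgb_max:.2e}, "
              f"max acc={acc_max:.2e} -> likely heading toward empty images")
    return rgb_max, acc_max
```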

reaper19991110 commented 1 year ago

Following your suggestion, I checked the value of the rgb map in raws2outputs while processing images. For zju-mocap it always reaches values larger than 10^-5.

However, for custom data the maximum value of the rgb map stays around 10^-11 no matter how I modify the learning rate and loss weights.

I also had the same empty-image problem when I tried to train on the video from the author's project home page.

louhz commented 1 year ago

This may be caused by an incorrect pts_mask, which is the foreground probability shown in the paper. You can double-check that, since the trilinear interpolation may not work correctly on a custom dataset.
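A hedged sketch of such a check: sample a motion-weight volume at canonical-space points with trilinear interpolation and look at what fraction of points receive any foreground probability. The volume layout, bounding-box normalization, and variable names here are assumptions to adapt to your actual run:

```python
import torch
import torch.nn.functional as F

def foreground_fraction(weight_volume, pts_cnl, bbox_min, bbox_max):
    # weight_volume: (1, C, D, H, W); the channel sum acts as a foreground probability
    # pts_cnl: (N, 3) canonical points; bbox_min/max: (3,) canonical bounding box
    # Assumes the volume axes are ordered (z, y, x) so grid coords can stay (x, y, z).
    grid = 2.0 * (pts_cnl - bbox_min) / (bbox_max - bbox_min) - 1.0   # normalize to [-1, 1]
    grid = grid.view(1, 1, 1, -1, 3)                                   # (1, 1, 1, N, 3)
    w = F.grid_sample(weight_volume, grid, align_corners=True)         # trilinear sampling
    fg_prob = w.sum(dim=1).view(-1)                                    # (N,)
    frac = (fg_prob > 1e-4).float().mean().item()
    print(f"fraction of sampled points with nonzero foreground prob: {frac:.3f}")
    return frac
```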

reaper19991110 commented 1 year ago

I used the rembg library to segment the images into masks, and they look OK.

I just looked at the paper and noticed that the author said, "We additionally resize video frames to keep the height of subject at approximately 500 pixels."

So there seem to be some size requirements on the images for custom data.
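A minimal sketch (a hypothetical helper, not part of the repo) of that resizing step: estimate the subject height from the mask and rescale the frame and mask to keep the subject near 500 px. If you resize, the camera intrinsics (fx, fy, cx, cy) must be scaled by the same factor:

```python
import cv2
import numpy as np

def resize_to_subject_height(frame, mask, target_height=500):
    ys, _ = np.nonzero(mask)                      # rows covered by the subject
    subject_h = ys.max() - ys.min() + 1
    scale = target_height / float(subject_h)
    new_size = (int(frame.shape[1] * scale), int(frame.shape[0] * scale))
    frame_r = cv2.resize(frame, new_size, interpolation=cv2.INTER_AREA)
    mask_r = cv2.resize(mask, new_size, interpolation=cv2.INTER_NEAREST)
    return frame_r, mask_r, scale                 # also scale fx, fy, cx, cy by `scale`
```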

reaper19991110 commented 1 year ago

I think there is something wrong with my metadata.json. I experimented with zju_mocap 387: the experiment worked when I replaced the cameras.pkl and mesh_infos.pkl generated from my own metadata.json with the correct cameras.pkl and mesh_infos.pkl.

louhz commented 1 year ago

I think this is possible: if the camera poses and the skeleton info mismatch, then an empty result is expected.
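One way to check for such a mismatch is to project the stored 3D joints with the stored camera and overlay them on the frame. A hedged sketch (the array names are assumptions; match them to what your cameras.pkl / mesh_infos.pkl actually contain):

```python
import cv2
import numpy as np

def project_joints(joints_world, K, E):
    # joints_world: (J, 3); K: (3, 3) intrinsics; E: (4, 4) world-to-camera extrinsics
    pts_h = np.hstack([joints_world, np.ones((joints_world.shape[0], 1))])
    pts_cam = (E @ pts_h.T).T[:, :3]          # points in camera coordinates
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]             # perspective divide -> pixel coords

def overlay(image, uv, out_path="joint_check.png"):
    vis = image.copy()
    for u, v in uv:
        cv2.circle(vis, (int(u), int(v)), 4, (0, 0, 255), -1)
    cv2.imwrite(out_path, vis)
    # If the dots do not land on the person, the camera or SMPL params are off.
```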

Dipankar1997161 commented 1 year ago

@louhz @reaper19991110 I also got an error in the rendering when using metadata.json https://github.com/chungyiweng/humannerf/issues/73#issue-1750928287

Do you happen to know what the issue is here? It would help if you could go through it once.

Thanks in advance.

reaper19991110 commented 1 year ago

Hi, I also don't know the reason for the problem, but thank you very much for your questions in the ROMP repo, which helped me use ROMP to complete metadata.json. Logically, ROMP should compute the correct SMPL parameters, but I still render an empty image.

Dipankar1997161 commented 1 year ago

I used the processed file for H36M that they provide under dataset.md. However, since no camera params were given, I am unable to solve it.

Actually, I also got empty images at first, but when I used np.eye(4) as the extrinsic and an intrinsic with fx, fy = 443.4 (given as per FOV = 60 in ROMP), I did not get any empty images, but the rendering was horrible.

louhz commented 1 year ago

So I think I found the reason for this. Did you count how many valid rays there are in your custom dataset? In zju-mocap the number of valid rays is approximately 15000, so the batch size is 32000 to make sure all valid rays are counted and the pose refiner works as usual to optimize the whole human pose. However, if you set the chunk size to a much smaller value like 6144, the result is empty, and you need to set your resize_img_scale to 0.125 to get a valid output.

In your case it looks like the picture is 1080p, I guess? You can increase the chunk size or resize the image.
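A minimal sketch of that check: count the foreground pixels of a (possibly resized) mask, which gives the number of valid rays, and compare it against the chunk size:

```python
import cv2
import numpy as np

def check_valid_rays(mask, chunk=32000, resize_img_scale=1.0):
    if resize_img_scale != 1.0:
        mask = cv2.resize(mask, None, fx=resize_img_scale, fy=resize_img_scale,
                          interpolation=cv2.INTER_NEAREST)
    n_valid = int(np.count_nonzero(mask))   # same idea as mask.nonzero()
    verdict = "OK" if n_valid <= chunk else "too many valid rays for one chunk"
    print(f"valid rays: {n_valid}, chunk: {chunk} -> {verdict}")
    return n_valid
```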

Dipankar1997161 commented 1 year ago

Thank you for the message. I resolved the issue; it was because of the camera values or the SMPL values.

It generates empty images because the projections return 0.

gushengbo commented 1 year ago

How do you get the correct camera values and SMPL values? I use ROMP to get the SMPL parameters and use a camera like this:

[image]

I can render images, but the results are terrible.

Dipankar1997161 commented 1 year ago

If you used ROMP, how did you arrive at that intrinsic? ROMP uses a different value, which is given in its config file.

Check that.

gushengbo commented 1 year ago

Thank you! I just ran "romp --mode=video --calc_smpl --render_mesh -i=/home/shengbo/data/kinect -o=/home/shengbo/data/param_kinect" in ROMP, so all configs are default?

# focal length: when FOV=50 deg, 548 = H/2 * 1/(tan(FOV/2)) = 512/2. * 1./np.tan(np.radians(25))
# focal length: when FOV=60 deg, 443.4 = H/2 * 1/(tan(FOV/2)) = 512/2. * 1./np.tan(np.radians(30))
# focal length: when FOV=72 deg, 352 = H/2 * 1/(tan(FOV/2)) = 512/2. * 1./np.tan(np.radians(36))

My images are (1920, 1080) and the default FOV is 60 deg, so my focal_length = H/2 * 1/(tan(FOV/2)) = 1920/2. * 1./np.tan(np.radians(30)). Is that right?

Thank you!

Dipankar1997161 commented 1 year ago

You can use your focal length from that calculation and see if it renders. When I asked the person from ROMP, he told me to use 443.4 as fx and fy (the focal length), since they use FOV = 60.
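For reference, the focal-length formula under discussion as a small sketch; whether to plug in the image height or width depends on whether the FOV is vertical or horizontal, which is an assumption to verify against ROMP's config:

```python
import numpy as np

def focal_from_fov(size_px, fov_deg):
    # pinhole model with square pixels: f = (size / 2) / tan(FOV / 2)
    return size_px / 2.0 / np.tan(np.radians(fov_deg / 2.0))

print(focal_from_fov(512, 60))    # ~443.4, the value quoted for ROMP's 512 px / FOV 60
print(focal_from_fov(1080, 60))   # example for a 1080 px dimension at the same FOV
```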

gushengbo commented 1 year ago

Hello, I could render the images. However, I find that the free-view rendering and T-pose rendering results are incorrect.

Free view: [image]

T-pose: [image]

Dipankar1997161 commented 1 year ago

Hello @louhz, I am getting empty images for a new dataset with the following image size.

I was wondering: if my image is of size 2448 x 2048, should I set the chunk value to, let's say, 2448 x 20 = 48960, so approximately 50k? (20 is the patch size I am using, since I am training on a single GPU.)

So I resized the images to 1024 x 1024 and used the default chunk of 32000, but I am still getting empty results: [image: prog_000100-2]

What could be the reason now? Also, you mentioned something about valid rays for zju_mocap; how did you find that?

louhz commented 1 year ago

So, the number of valid rays can be found by counting the number of valid mask pixels, i.e., mask.nonzero().

The biggest reason for this setting is that in HumanNeRF we train an MLP (the inverse LBS) as a backward warping function, under the assumption that in a single-view video the reconstructed human surface differs between frames only in pose while every frame shares volumetric consistency; thus the MLP can be trained to map each ray in each frame back to the SMPL canonical model.

In HumanNeRF the sampling strategy is patch-based, and all patches in the same batch need to come from the same image frame (unlike classic NeRF, which samples rays randomly over the entire sequence). So you need all pixels that contain human information to be sampled within each batch.
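A hedged sketch of that patch-based sampling (illustrative parameter names, not the repo's actual config keys): draw a few square patches from a single frame, centered on foreground pixels, so the whole batch comes from one image:

```python
import numpy as np

def sample_patches(mask, n_patches=6, patch_size=32, rng=np.random):
    # mask: (H, W) foreground mask of one frame
    ys, xs = np.nonzero(mask)                   # foreground pixel coordinates
    H, W = mask.shape
    half = patch_size // 2
    patches = []
    for _ in range(n_patches):
        i = rng.randint(len(ys))                # pick a random foreground pixel as the center
        cy = np.clip(ys[i], half, H - half - 1) # keep the patch fully inside the image
        cx = np.clip(xs[i], half, W - half - 1)
        patches.append((cy - half, cx - half, patch_size, patch_size))  # (y, x, h, w)
    return patches  # rays are then generated for exactly these pixels of this frame
```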

Also, my personal suggestion is to check HumanRF, which is SOTA work for avatar reconstruction; they use an occupancy grid to guide ray sampling and implicitly handle the human surface topology, and they achieve great quality for both rigid and non-rigid motion!