andrewhou1 / GeomConsistentFR

Official Code for Face Relighting with Geometrically Consistent Shadows (CVPR 2022)
https://openaccess.thecvf.com/content/CVPR2022/html/Hou_Face_Relighting_With_Geometrically_Consistent_Shadows_CVPR_2022_paper.html
MIT License

Training on custom data #12

Closed waleedrazakhan92 closed 1 year ago

waleedrazakhan92 commented 1 year ago

Hello, can you share the details on how to train the model on custom data? Having gone through the paper, I believe we need:

1) Input images (custom images)
2) Masks (produced with face parsing, skin only)
3) Depth masks (the same masks as above but without the nose and mouth)
4) Depth maps (how do we produce those?)
5) Albedo maps (using SfSNet); you mention in the paper that you convert the albedo to grayscale first. Did I get that correctly?
6) Lighting directions. What are those and how do we obtain them?

Is there anything else needed to train the model? If not, can you please point me to how to obtain the missing data (items 4, 5, and 6) so I can train on my custom images?

andrewhou1 commented 1 year ago

To produce the depth maps, you can use https://github.com/zqbai-jeremy/DFNRMVS

The depth masks correspond to any pixel that has a valid face depth.

Yes, the albedo is converted first to grayscale.

The lighting directions are also produced by SfSNet: I use the first order coefficients (2-4), normalize them, and treat that as the lighting direction.
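
A minimal sketch of that last step (the helper name and the assumption of a 9-dimensional grayscale SH vector are mine, not from the repo):

```python
import numpy as np

def sh_to_lighting_direction(sh):
    """Hypothetical helper: `sh` is assumed to be a 9-dim grayscale SH vector
    from SfSNet. The first-order coefficients (entries 2-4, i.e. indices 1:4)
    are normalized and used as the lighting direction."""
    sh = np.asarray(sh, dtype=np.float64)
    first_order = sh[1:4]
    return first_order / np.linalg.norm(first_order)
```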

waleedrazakhan92 commented 1 year ago

@andrewhou1 Thank you for the quick response. Also, if I want to change the model's resolution from 256 to 512 or 1024, what changes do I need to make in the model besides self.img_height and self.img_width?

andrewhou1 commented 1 year ago

Yes, those definitely need to be changed. Also change all instances of 256 to your new resolution. There's also this line:

sample_increments = torch.reshape(torch.tensor(np.arange(0.025, 0.825, 0.005)), (self.num_sample_points, 1, 1, 1))

You may want to increase self.num_sample_points given the larger resolution and adjust np.arange accordingly to match.

waleedrazakhan92 commented 1 year ago

@andrewhou1 I have made the suggested changes from 256 to 1024 wherever I could, and the model now accepts 1024 input. I need advice on a few more things to keep the performance high.

Can you explain what increasing self.num_sample_points does and, more importantly, what I should increase it to?

Also, changing the resolution changes the h4_out shape from [1, 155, 16, 16] to [1, 155, 64, 64] (for 1024 resolution). Does the selection of indices for identity_features and lighting_features need to change? https://github.com/andrewhou1/GeomConsistentFR/blob/5448302eab8d3ad01ea49897f734c65744c64e4a/test_relight_single_image.py#L198-L200

And in https://github.com/andrewhou1/GeomConsistentFR/blob/5448302eab8d3ad01ea49897f734c65744c64e4a/test_relight_single_image.py#L203-L205 the average pooling size would also need to change from (16, 16) to (64, 64), which now seems like quite a big window to pool over. Would you suggest changing the average pooling size or the linear_SL input and output sizes to keep the model's performance?

andrewhou1 commented 1 year ago

self.num_sample_points is the number of points that are sampled along each ray to determine if the original point on the face is under a cast shadow. If the points are sampled too sparsely, they may miss an occluding surface (such as the nose) and incorrectly determine a point to be well illuminated. This results in white stripes in the cast shadows, so self.num_sample_points should be set sufficiently high. If you want to maintain the same sampling frequency as I had at 256 resolution, increase the sampling rate by 4x for 1024. Also change the np.arange portion to match this change. This is an experimental parameter: you can also lower the sampling rate and observe the effect on performance, but you should not need to set it any higher than 4x its current setting.
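
As a rough sketch of that adjustment for 1024 resolution (the exact step size here is an assumption to illustrate the 4x factor, not a value taken from the repo):

```python
import numpy as np
import torch

# At 256 resolution the repo samples the span [0.025, 0.825) in steps of 0.005.
# Dividing the step by 4 keeps the same span with 4x the sampling rate.
step_1024 = 0.005 / 4  # assumed; tune this and watch for white stripes in cast shadows
increments = np.arange(0.025, 0.825, step_1024)

num_sample_points = len(increments)  # set self.num_sample_points to this value
sample_increments = torch.reshape(
    torch.tensor(increments), (num_sample_points, 1, 1, 1))
```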

For the other two tensors, I believe 64x64 should be fine. If the performance seems noticeably worse and you want to change to 32x32 or 16x16, you would need to add one or two more downsampling and upsampling blocks respectively.

waleedrazakhan92 commented 1 year ago

> To produce the depth maps, you can use https://github.com/zqbai-jeremy/DFNRMVS
>
> The depth masks correspond to any pixel that has a valid face depth.
>
> Yes, the albedo is converted first to grayscale.
>
> The lighting directions are also produced by SfSNet: I use the first order coefficients (2-4), normalize them, and treat that as the lighting direction.

Hi, I've been trying to get lighting directions from the SfSNet model. I couldn't get the MATLAB version to work, but I found a working PyTorch version: https://github.com/Mannix1994/SfSNet-Pytorch. I'm getting the expected outputs, but I'm still unclear about the lighting directions. The code explains the light_out tensor here: https://github.com/Mannix1994/SfSNet-Pytorch/blob/c2c1ed96b20dab66c5f84fe41ccb5d08aaa2291a/SfSNet_test.py#L66-L72, which I understand is the output that determines the lighting direction. You mentioned that we take the first-order coefficients and normalize them to get the lighting direction. However, this code produces 27 outputs (9 per channel), which it reshapes to form a 3-channel shading image. So how do I normalize this output to produce training lighting inputs in the same format as the ones you provided in the training dataset?

andrewhou1 commented 1 year ago

So among those 27 outputs, you can reshape them into a 9x3 matrix, where each column is the SH for one color channel. Then simply average across the columns to get a single 9x1 vector. You can use this to determine your lighting direction.
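
A minimal sketch of that step, assuming the 27 values are laid out as the 9 SH coefficients for R, then G, then B (verify the ordering against the SfSNet port you are using):

```python
import numpy as np

def average_sh(light_out):
    """Hypothetical helper: average the per-channel SH into a single 9-dim vector.
    Assumes `light_out` is ordered as [9 coeffs for R, 9 for G, 9 for B]."""
    sh_per_channel = np.asarray(light_out, dtype=np.float64).reshape(3, 9).T  # 9x3, one column per channel
    return sh_per_channel.mean(axis=1)                                        # averaged 9-dim SH vector
```

The lighting direction then comes from entries 2-4 of this averaged vector, normalized as described earlier in the thread.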

waleedrazakhan92 commented 1 year ago

@andrewhou1 But wouldn't that give me a 9x1 vector? In the training lighting .mat files there are just three values per image, so I'm still unsure about the exact process for producing values in the format you've provided for training. Can you please share that process, or a piece of code you used, so I can obtain the exact values in the exact format for the same image?

andrewhou1 commented 1 year ago

Right so then you can use the 2nd, 3rd, and 4th values and normalize them as a vector.

waleedrazakhan92 commented 1 year ago

@andrewhou1 Can you tell me how long (in hours) it took to train the final model?

andrewhou1 commented 1 year ago

At 256 resolution it took about 1 day to train. However, at 1024 resolution the training time would increase dramatically (maybe up to 4x) if you increase the sampling rate proportionally.

waleedrazakhan92 commented 1 year ago

@andrewhou1 Also, do the shapes of these two tensors have anything to do with the batch size? When I try to change the batch size I get a shape mismatch error: torch.tensor([[[0.0]], [[0.0]], [[0.0]]]) and torch.reshape(tmp_incident_light_z, (3, 1, 1, 1))

https://github.com/andrewhou1/GeomConsistentFR/blob/5448302eab8d3ad01ea49897f734c65744c64e4a/train_raytracing_relighting_CelebAHQ_DSSIM_8x.py#L358-L359

andrewhou1 commented 1 year ago

Yes, it does. If the batch size is n, then the torch.tensor call should have n of those 0.0s, and torch.reshape(tmp_incident_light_z, (n, 1, 1, 1)) should be used.
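
A minimal CPU-only sketch of how those two lines scale with a batch size n (the lighting tensor below is a stand-in, not the repo's actual tensor):

```python
import torch

n = 4                                         # example batch size
tmp_incident_light = torch.randn(n, 3, 1, 1)  # stand-in for the real lighting tensor

tmp_incident_light_z = torch.maximum(tmp_incident_light[:, 2], torch.zeros(n, 1, 1))
incident_light = torch.cat((tmp_incident_light[:, 0:2],
                            torch.reshape(tmp_incident_light_z, (n, 1, 1, 1))), 1)
print(incident_light.shape)                   # torch.Size([4, 3, 1, 1])
```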

waleedrazakhan92 commented 1 year ago

Thank you. So if I replace those lines with:

tmp_incident_light_z = torch.maximum(tmp_incident_light[:, 2], torch.zeros(self.batch_size, 1, 1).float().cuda())

incident_light = torch.cat((tmp_incident_light[:, 0:2], torch.reshape(tmp_incident_light_z, (self.batch_size, 1, 1, 1))), 1)

is that the correct way?

andrewhou1 commented 1 year ago

Right, that should be correct.

waleedrazakhan92 commented 1 year ago

1) One more question. You mention that you upscaled the SfSNet results from 128 to 256 resolution. As for the lighting directions, did you use the same values that were calculated at 128 resolution when training your model at 256?

If that's the case, then if I just upscale the images again to 512 and use the same lighting direction values you provided to train the model, would that be okay?

2) Also, does self.batch_size need to be the same for both training and testing? For example, if I trained the model with batch size 2, do I also have to set self.batch_size to 2 during testing?

andrewhou1 commented 1 year ago

Correct, the lighting directions are independent of resolution.

yafeim commented 1 year ago

Hello @andrewhou1, I noticed that the SfSNet albedo images do not align well with the original images. How do you solve this problem? Thanks.

andrewhou1 commented 1 year ago

Thanks for your interest in our work!

Did you crop the original images first using our provided cropping code? They should align if the images are cropped.

yafeim commented 1 year ago

[attached images: 0_orig, 0]

Thanks for your quick reply. The first attached image is what I got from the cropping code, and the second is the provided albedo in MP_data. They are still misaligned. Can you advise? Thanks.

yafeim commented 1 year ago

Also, I got "assert img.shape[0] == img.shape[1] == 256 AssertionError" when applying the cropping logic to image "10587.jpg". Is the image discarded?

andrewhou1 commented 1 year ago

Hmmm, that's interesting. I inspected that training image on my end and it matches the grayscale albedo (with the chin reflected). Did you install the separate dependencies for the cropping code? They're different from the dependencies for running the relighting model. If you did, you can try changing borderType=cv2.BORDER_DEFAULT to cv2.BORDER_REFLECT.
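
If it helps to see the difference between the two border modes in isolation, here is a small standalone comparison (illustrative only, not the repo's actual cropping call):

```python
import cv2
import numpy as np

img = np.arange(12, dtype=np.uint8).reshape(3, 4)
# BORDER_DEFAULT is an alias for BORDER_REFLECT_101 (reflection without repeating the edge pixel).
pad_default = cv2.copyMakeBorder(img, 2, 2, 2, 2, borderType=cv2.BORDER_DEFAULT)
# BORDER_REFLECT repeats the edge pixel in the reflection.
pad_reflect = cv2.copyMakeBorder(img, 2, 2, 2, 2, borderType=cv2.BORDER_REFLECT)
print(pad_default)
print(pad_reflect)
```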

andrewhou1 commented 1 year ago

Also yes, 10587.jpg was discarded

yafeim commented 1 year ago

Oh I see. I think the problem is that I installed a different version of opencv using pip. Now I am getting aligned crops. Thanks a lot.