alievk / npbg

Neural Point-Based Graphics
MIT License

How to fit scenes from other datasets #7

Closed ttsesm closed 4 years ago

ttsesm commented 4 years ago

Hi guys,

I would be interested to know whether it is possible to fit different scenes based on other existing datasets. For example, I would like to understand and get an idea of how to fit the Redwood dataset (http://redwood-data.org/). I have read the README section on how to fit your own scenes, but it is not clear to me how to do the same with, e.g., the Redwood data.

Thus, it would be great if you could give some feedback here on how to do that.

Thanks.

seva100 commented 4 years ago

We have not tried datasets available at your link, but it seems like Redwood comes with everything you need: RGB images (you can ignore depth), .ply reconstructions, and camera poses. It seems that .log trajectory files are the view matrices already converted to the format required for Neural Point-Based Graphics (sometimes it's required to invert some axes or some matrices; in this case, you can try writing to the dataset authors about the exact format of their camera poses). Next, you'll need to make a paths file and a scene config -- there is a tutorial in the readme and some comments in this issue.

If you succeed in running this data, we would love to hear your feedback here - this would be really helpful for others!

ttsesm commented 4 years ago

@seva100 does the .ply file need to have color information, or not necessarily? The one provided by Redwood is colorless.

In principle, if I need color, I could recreate it from the color images with Agisoft/COLMAP.

seva100 commented 4 years ago

No, the .ply file does not need to have color information. It only needs to contain the XYZ coordinates of the points.
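
For anyone checking their own reconstruction, here is a small sketch (not part of the NPBG codebase) that inspects what a .ply actually contains; it assumes Open3D is installed, and the file path is only an example:

# Inspect a .ply reconstruction: NPBG only needs the XYZ coordinates.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("data/scenes/redwood_bedroom/bedroom.ply")  # example path
points = np.asarray(pcd.points)           # (N, 3) array of XYZ coordinates
print("points:", points.shape)            # this is all that is required
print("has colors:", pcd.has_colors())    # may be False for the Redwood reconstruction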

ttsesm commented 4 years ago

Hi @seva100, I've managed to train the network with the Redwood dataset, but the output does not seem correct. I tried two approaches, since with the default parameters I was getting a CUDA out of memory RuntimeError: in the first run I reduced the crop size, and in the second the batch size:

python train.py --config configs/train_example.yaml --pipeline npbg.pipelines.ogl.TexturePipeline --dataset_names redwood_bedroom --crop_size 256x256

python train.py --config configs/train_example.yaml --pipeline npbg.pipelines.ogl.TexturePipeline --dataset_names redwood_bedroom --batch_size 4 --batch_size_val 4

For the first case I got a VAL LOSS of 926.9789898726192 and for the second 872.1955623209042, which from what I understand is high. Could you please confirm whether these values are considered bad or good?

The viewer output, of course, does not look that good either:

image

image

Whereas I should be getting something similar to this:

image

Any idea what I might be doing wrong?

seva100 commented 4 years ago

Hi @ttsesm, I think most likely you'll need to play with the camera poses, as they are often provided in different ways. The format we expect is the same as the one given by Agisoft; you can find a good reference here (the only exception is that we invert the 2nd and 3rd columns of R matrix afterwards). In short, [R t | 0 1] should be a world2camera transformation matrix, so if for Redwood it is camera2world, you'll need to invert this matrix. Also, check that your intrinsic matrix is correct (it should be of a form [f 0 cx | 0 f cy | 0 0 1], where f is converted to pixels, and (cx, cy) should be close to an image center point).
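
As an illustration only (not code from this repo), converting Redwood-style .log trajectories could look roughly like the sketch below; it assumes each entry is a metadata line followed by the four rows of a 4x4 camera2world matrix, and the file name is just an example:

# Read a Redwood .log trajectory and invert the poses if world2camera is needed.
import numpy as np

def read_log_trajectory(path):
    poses = []
    with open(path) as f:
        lines = f.read().splitlines()
    for i in range(0, len(lines), 5):                 # 1 metadata line + 4 matrix rows
        rows = [[float(x) for x in lines[i + r].split()] for r in range(1, 5)]
        poses.append(np.array(rows))                  # camera2world, per the Redwood docs
    return poses

cam2world = read_log_trajectory("bedroom.log")        # example file name
world2cam = [np.linalg.inv(T) for T in cam2world]     # only if inversion turns out to be needed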

About the high values of VGG loss: I think this depends on the dataset; we had numbers of 700-800 in case of a good convergence too. Also, I would try to decrease batch size to 1 for both training and validation (this should not significantly affect the quality) and set 512x512 crop size.

ttsesm commented 4 years ago

I see, it seems you are right: Redwood provides camera2world transformations, according to the trajectory description here. I will try to invert them and try again.

My camera intrinsics shouldn't be a problem, since they are correctly defined as you can see below:

525.0000 0.000000 319.5000 0.000000
0.000000 525.0000 239.5000 0.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 0.000000 1.000000
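
(As a quick sanity check of these numbers: the Redwood frames are 640x480, so the principal point should sit near the image centre. A throwaway snippet, not repo code:)

# f is already in pixels; (cx, cy) should be close to (640/2, 480/2).
K = [[525.0, 0.0, 319.5],
     [0.0, 525.0, 239.5],
     [0.0, 0.0, 1.0]]
assert abs(K[0][2] - 640 / 2) < 1 and abs(K[1][2] - 480 / 2) < 1
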
seva100 commented 4 years ago

Yes, please also try to change the sign of the 2nd and 3rd column. In the end, the matrices should correspond to the OpenGL coordinate system which has +X axis headed right, -Y up, -Z forward.
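
In code, the sign change might look like the sketch below (assuming view is a 4x4 world2camera matrix with the rotation in its top-left 3x3 block; adapt as needed):

# Flip the 2nd and 3rd columns of the rotation part of a 4x4 view matrix.
import numpy as np

view = np.eye(4)        # placeholder -- substitute your actual view matrix
view[:3, 1] *= -1       # 2nd column of R
view[:3, 2] *= -1       # 3rd column of R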

Perhaps the following trick can help to validate that your matrices are correct. Once you make some transformation of the view matrices and train something (at least for 1 epoch), you can open the viewer.py and provide an index of any view as --init-view 1 as a command-line argument to the viewer (you can use any other view index instead of "1"). Then, when the window appears, press X. This will switch the viewer from showing the network output to displaying the XYZ coordinates of points (visualized as RGB, R=X, G=Y, B=Z). If your view matrices are correct, you will see the points where they should be when looking from the camera #1 (or a different camera if you selected another index in --init-view).

ttsesm commented 4 years ago

I've tried to use the --init-view 1 option as you suggested but I am getting the following error:

$ python viewer.py --config downloads/redwood_bedroom.yaml --checkpoint data/logs/08-04_14-35-04___batch_size^4__dataset_names^redwood_bedroom/checkpoints/PointTexture_stage_0_epoch_2_redwood_bedroom.pth --init-view 1

loading pointcloud...
=== 3D model ===
VERTICES:  4893993
EXTENT:  [-2.4639  -0.99316 -4.4403 ] [4.3096 5.4167 3.0381]
================
new viewport size  (640, 480)
Traceback (most recent call last):
  File "viewer.py", line 434, in <module>
    my_app = MyApp(args)
  File "viewer.py", line 157, in __init__
    self.trackball = Trackball(init_view, self.viewport_size, 1, rotation_mode=args.rmode)
UnboundLocalError: local variable 'init_view' referenced before assignment

ttsesm commented 4 years ago

Hmm, inverting the pose matrices and then changing the sign of the 2nd and 3rd columns did not work either. I must be doing something wrong somewhere else.

Is there a way to apply some debugging on the fly, or at least to check that what I am using for the training is correct?

seva100 commented 4 years ago

Ok, it looks like --init-view does not work as intended for now. Please try to use the following script instead: https://gist.github.com/seva100/4fe57ab17ebd943fa7614cb0d4d7f982. This script renders either the XYZ point coordinates visualized as RGB or the network outputs. It should work out of the box by executing python generate_dataset.py --config downloads/redwood_bedroom.yaml --inputs xyz_p1 and will produce .png renderings of the XYZ point coordinates in the rendered folder of the project root. In this mode, the script does not require a trained network, so no weight checkpoints need to be specified in the config.

For example, when executed with the downloads/livingroom.yaml config, the script produces the following image for camera #1:

portfolio_view

and we can see that the points project to the same places as in the respective ground truth image. This should be a working debugging procedure for the view matrices.
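
If it helps, one way to eyeball the alignment is to overlay a rendered XYZ image on the corresponding ground-truth photo; a rough sketch with Pillow, where both file names are only illustrative:

# Blend a rendered XYZ image with the matching ground-truth frame to check alignment.
from PIL import Image

xyz = Image.open("rendered/redwood_bedroom_000001.png").convert("RGB")   # illustrative name
gt = Image.open("data/scenes/redwood_bedroom/images/000001.jpg").convert("RGB")
gt = gt.resize(xyz.size)
Image.blend(gt, xyz, alpha=0.5).save("overlay_000001.png")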

ttsesm commented 4 years ago

@seva100 with the generate_dataset.py script I was able to identify the issue and fix the poses (I just needed to change the sign of the 2nd and 3rd columns; there was no need to invert the pose matrices). So I am also getting images similar to the one you posted, as you can see below:

redwood_bedroom_000005

and my loss dropped to 491.04269130734633, which I guess is also a good sign.

I tried to visualize the output with the viewer:

python viewer.py --config downloads/redwood_bedroom.yaml --checkpoint data/logs/08-10_18-48-08___batch_size^4__dataset_names^redwood_bedroom/checkpoints/PointTexture_stage_0_epoch_20_redwood_bedroom.pth --viewport 2000,1328

and I am getting the following output:

npbg_output5

What I've noticed is that there are a lot of black areas in the view. Is this fine, considering that my RGB images capture the whole scene quite well?

seva100 commented 4 years ago

@ttsesm great to hear that the problem with the view matrices was resolved; there is always some hassle with poses coming from various external sources. Can you please show what your downloads/bedroom.yaml file looks like? I suspect that the path to net_ckpt was not provided in this file, but I can't be sure.

ttsesm commented 4 years ago

Hi Artem, indeed I had net_ckpt and texture_ckpt commented out, since at that point I had copied them from livingroom.yaml and they pointed to wrong paths. I've uncommented them, specifying where to find PointTexture_stage_0_epoch_39_redwood_bedroom.pth and UNet_stage_0_epoch_39_net.pth respectively, but the output is more or less the same as above.

seva100 commented 4 years ago

@ttsesm hard to say at this point what the reason could be. Can you please post here the exact commands you used for training and for running the viewer?

I also have a hypothesis that some regions are missing in the .ply file you use. This might actually cause these black regions.

ttsesm commented 4 years ago

For training I used the following command: python train.py --config configs/train_example.yaml --pipeline npbg.pipelines.ogl.TexturePipeline --dataset_names redwood_bedroom --crop_size 512x512 --batch_size 4 --batch_size_val 4

while for viewing: python viewer.py --config downloads/redwood_bedroom.yaml --checkpoint data/logs/08-10_18-48-08___batch_size^4__dataset_names^redwood_bedroom/checkpoints/PointTexture_stage_0_epoch_39_redwood_bedroom.pth --viewport 2000,1328

If you want to train the scene yourself, you can download all the configs and data that I've used from the following link (I have set up the folders in the same structure as you are using). You will need, though, to comment out the if/elif statement at https://github.com/alievk/npbg/blob/c0cf6f2224bda3d4a5007e05c05fbaf5b34cf256/npbg/gl/utils.py#L438 and replace it with just model['normals'] = data.vertex_normals, since apparently the provided .ply does not contain the nx, ny and nz fields, and to apply a small change at https://github.com/alievk/npbg/blob/c0cf6f2224bda3d4a5007e05c05fbaf5b34cf256/npbg/datasets/dynamic.py#L384, replacing it with target_list = [os.path.join(config['target_path'], target_name_func(str(i).zfill(6))) for i in camera_labels], since the image names come with leading zeros.
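
For readability, the two replacements I mean are roughly the following (excerpts of the changed lines, not a standalone script):

# npbg/gl/utils.py, around the linked line: skip the nx/ny/nz branch and use the
# precomputed vertex normals, since the provided .ply has no normal fields
model['normals'] = data.vertex_normals

# npbg/datasets/dynamic.py, around the linked line: zero-pad the camera labels so
# they match image names such as 000005.jpg
target_list = [os.path.join(config['target_path'], target_name_func(str(i).zfill(6)))
               for i in camera_labels]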

seva100 commented 4 years ago

Thank you for providing the data. It seems that you are doing everything correctly: the point cloud stored in bedroom.ply looks quite OK, and the images truly cover everything. So these black regions should not be present in the rendered output.

I tried training just like you did and can confirm the presence of the black regions. What I noticed is that the ground truth itself has some black parts -- see the following random screenshot from TensorBoard (top: rendered by NPBG; bottom: ground truth): image

Most likely, this happens because the ground truth image is 640x480, the crop size is 512x512, and we also use random zoom augmentations in the range [0.5, 2.0]. So in the extreme case of a 0.5x zoom-out, the ground truth image gets resized to 320x240, which is smaller than the 512x512 crop, and this results in the black padding. Since we have always trained NPBG on larger images (1080p and above), this explains why we have not encountered this before.

Can you please try to train with --crop_size 256x256? In this case, the rescaled ground truth image fits into the crop, and there should be no black padding (a few more epochs might be needed, though). Another option is to keep the 512x512 crop size but change line 24, random_zoom: [0.5, 2.0], to e.g. random_zoom: [1.0, 2.0] in the train_example.yaml config. As soon as I have a chance, I will try the same and report the result.

seva100 commented 4 years ago

@ttsesm, I've trained the network with --crop_size 256x256, and now the black regions are gone: redwood_bedroom_npbg_crop_256_short1

I trained for 39 epochs; with such a crop size, more epochs might be needed to achieve better results. You can also try the second way I suggested (512x512 crop size and less aggressive zoom augmentations). The results already seem quite acceptable to me, though.

ttsesm commented 4 years ago

Hi Artem, thanks a lot for the feedback. Indeed, --crop_size 256x256 seems to work nicely. I will also try your other suggestion and report back here.

ttsesm commented 4 years ago

@seva100 I can confirm that both approaches, i.e. --crop_size 256x256 and the less aggressive zoom augmentations, solve the issue with the black spots.

Thanks for the help; hopefully this info will be helpful for others as well.

seva100 commented 4 years ago

@ttsesm Great to hear! By the way, if the format of your target image names is different from the default one, you can avoid changing this line https://github.com/alievk/npbg/blob/c0cf6f2224bda3d4a5007e05c05fbaf5b34cf256/npbg/datasets/dynamic.py#L384 and just change target_name_func in the paths config. For example, I used the following paths_example.yaml for the Redwood bedroom:

datasets:
    "redwood_bedroom":
        scene_path: data/redwood_bedroom.yaml
        target_path: data/scenes/redwood_bedroom/images/
        target_name_func: "lambda i: f'{int(i):06}.jpg'"