autonomousvision / occupancy_networks

This repository contains the code for the paper "Occupancy Networks: Learning 3D Reconstruction in Function Space"
https://avg.is.tuebingen.mpg.de/publications/occupancy-networks
MIT License

Pixel2mesh training loss plateaus #16

Closed simofoti closed 4 years ago

simofoti commented 5 years ago

Hello,

First of all, thanks for your amazing work! I am currently trying to train your pixel2mesh implementation, but I am not able to obtain decent results and I was wondering if you changed something that is not tunable from its config file.

The code I am using has no modifications, and I downloaded your pre-processed data. After epoch 8 the loss plateaus around 30 and oscillates around that value at least until epoch 50 (then I stopped the training because it didn't look like it was going to get better).

Here you can find the visualisation of a prediction and its GT at epoch 50.

[images: 000 (prediction), 000_gt (ground truth)]

Thanks in advance for your time.

LMescheder commented 5 years ago

Hi @simofoti, thanks for reporting the issue - we will look into it! Maybe something was broken when we prepared the public code release. For reference: when we trained the model, we achieved a training loss of about 4. Sample visualization of a prediction: [image: 002]

simofoti commented 5 years ago

Hi @LMescheder,

Thanks for your quick answer! Your results look much better for sure :) Yesterday I launched another training run with the learning rate and weight decay reported in the original paper. It reached epoch 40 with a loss again around 30, but this time it seems to still be decreasing (very slowly). Hopefully it is just a matter of training for longer... I'll let you know if I have any success.

Thanks for giving me some reference.

m-niemeyer commented 5 years ago

Hi @simofoti, thanks for providing feedback! We cloned this GitHub repo last night and started a new training run for a pixel2mesh model with the config file provided in configs/img/pixel2mesh.yaml. The results indicate that training progresses as usual. For reference, below you can see the loss and the visualization for your example after one night of training (~50k iterations, epoch 19). Could it be that you changed the pixel2mesh config or that something broke during the data download process?

Best of luck!

Training loss: [image: loss]

Visualization: [image: 006]

simofoti commented 5 years ago

Hello @m-niemeyer ,

Thanks for looking into it. I was able to make it work with my own data, so I stopped experimenting with ShapeNet. Now I have some time, and I am trying to re-run the training on ShapeNet, cloning the repo again and using the pre-processed data. I should be able to get some results tomorrow. I will let you know what I obtain.

PS: Just to let you know, OccupancyNet worked properly.

simofoti commented 5 years ago

Hello @m-niemeyer,

I have run the training over the whole weekend but I still don't get the same results as yours. These are the steps I followed:

Here is what I get:

Training loss: [image: train_loss]

Validation losses: [image: val_losses]

Prediction: [image: 009] GT: [image: 009_gt]

To answer "Could it be that you changed the pixel2mesh config or that something broke during the data download process?":

After the training, I checked if something was different in the code (just to be sure that I didn't accidentally change something), but the repo is up-to-date with your origin/master branch (which seems to be the only one you released). This means that the pixel2mesh config file is unchanged.

The download script, as I said above, copied the data into the right folder. No errors were thrown by the script, and for each object of each class in ShapeNet I have:

  • 24 views,
  • the camera parameters (cameras.npz),
  • model.binvox,
  • pointcloud.npz,
  • points.npz.
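As a quick sanity check, a small script like the one below can confirm that every object folder contains the expected files. This is a sketch, not part of the repo; the file names follow the per-object list above, and the exact location of the rendered views inside each folder may differ.

```python
import os

# Per-object files expected after downloading the preprocessed data
# (names taken from the list above).
EXPECTED_FILES = ['cameras.npz', 'model.binvox', 'pointcloud.npz', 'points.npz']

def check_object_dir(obj_dir):
    """Return a list of problems found in one object folder (empty = OK)."""
    problems = []
    for fname in EXPECTED_FILES:
        if not os.path.isfile(os.path.join(obj_dir, fname)):
            problems.append('missing %s' % fname)
    return problems
```

Running this over every class/object directory and printing any non-empty result would quickly reveal a partially downloaded or corrupted dataset.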

Do you have any idea of what could possibly go wrong?

Thanks in advance.

m-niemeyer commented 5 years ago

Hi @simofoti ,

thanks for your detailed report. To check again that our provided code and data work, I followed the complete pipeline as indicated in the readme: cloning the repo, creating the anaconda environment, building the extensions, and downloading the provided data. After one night of training, the loss and visualizations look the same as before (see below), so it worked fine for me.

[image: pix2mesh]

As it works fine for me, it is quite hard to "debug" what is going wrong in your case. First, I would delete everything (including the anaconda environment and all cached packages) and follow the procedure again (you can also rename the environment to be sure). To do this, run `conda env remove --name mesh_funcspace` and `conda clean --all`. You can rename the environment in the environment.yaml file.

EDIT: For safety reasons, you can comment out the extensions for the DMC method and also comment out the import in im2mesh/config.py (also indicated in the readme).

Good luck!

LMescheder commented 4 years ago

Problem seems to be resolved. Closing for now.

LMescheder commented 4 years ago

Could be related: #31

YokkaBear commented 4 years ago

Hello @m-niemeyer,

> I have run the training over the whole weekend but I still don't get the same results as yours. These are the steps I followed: ... The download script, as I said above, copied the data in the right folder. No errors were thrown by the script and for each object of each class in ShapeNet I have:
>
>   • 24 views,
>   • the camera parameters (cameras.npz),
>   • model.binvox,
>   • pointcloud.npz,
>   • points.npz.
>
> Do you have any idea of what could possibly go wrong?
>
> Thanks in advance.

Hello @LMescheder, @simofoti and @m-niemeyer, I was trying to train the pixel2mesh network on a dataset I constructed myself. But after preprocessing the data according to this project, I found that the camera parameter file (cameras.npz) does not exist in any object folder. Could you please show me how to generate the camera parameter files (cameras.npz)? Thanks a lot!
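Not an authoritative answer, but in the provided data cameras.npz appears to be a plain NumPy archive holding one world matrix and one camera (projection) matrix per rendered view. Assuming key names of the form `world_mat_i` / `camera_mat_i` (an assumption worth checking against the image dataset code in im2mesh), such a file could be written for your own renders like this:

```python
import numpy as np

def save_cameras(path, world_mats, camera_mats):
    """Write a cameras.npz with one world/camera matrix per rendered view.

    world_mats / camera_mats: lists of per-view matrices as numpy arrays.
    The key names 'world_mat_%d' / 'camera_mat_%d' are assumptions based
    on how the preprocessed data appears to be structured.
    """
    out = {}
    for i, (w, c) in enumerate(zip(world_mats, camera_mats)):
        out['world_mat_%d' % i] = w.astype(np.float32)
        out['camera_mat_%d' % i] = c.astype(np.float32)
    np.savez(path, **out)
```

The matrices themselves come from your renderer: the world matrix is the camera extrinsics for that view, and the camera matrix is the intrinsics/projection.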

zyz-notebooks commented 4 years ago

> Hello,
>
> First of all, thanks for your amazing work! I am currently trying to train your pixel2mesh implementation, but I am not able to obtain decent results and I was wondering if you changed something that is not tunable from its config file.
>
> The code I am using doesn't have any modification and I downloaded your pre-processed data. After epoch 8 the loss plateaus around 30 and oscillates around that value at least until epoch 50 (then I stopped the training because it didn't look like it was going to get better).
>
> Here you can find the visualisation of a prediction and its GT at epoch 50.
>
> [images: 000, 000_gt]
>
> Thanks in advance for your time.

Hi, when I try to train the pixel2mesh implementation, I get the following error during visualization:

```
Visualizing
Traceback (most recent call last):
  File "train.py", line 135, in <module>
    trainer.visualize(data_vis)
  File "/home/zyz/Project/occupancy_networks/im2mesh/pix2mesh/training.py", line 287, in visualize
    points_out = common.transform_points_back(pred_vertices_3, world_mat)
  File "/home/zyz/Project/occupancy_networks/im2mesh/common.py", line 228, in transform_points_back
    points_out = points_out @ b_inv(R.transpose(1, 2))
  File "/home/zyz/Project/occupancy_networks/im2mesh/common.py", line 209, in b_inv
    b_inv, _ = torch.gesv(eye, b_mat)
AttributeError: module 'torch' has no attribute 'gesv'
```

Have you encountered this problem? I am looking forward to your help.
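For what it's worth: `torch.gesv` was removed in newer PyTorch releases. A patched `b_inv` that keeps the same batched-inverse behaviour might look like the sketch below (an unofficial workaround, assuming a modern PyTorch with `torch.linalg`):

```python
import torch

def b_inv(b_mat):
    # Batched matrix inverse, replacing the removed torch.gesv call.
    # torch.linalg.solve(b_mat, eye) solves b_mat @ X = I, i.e. X = b_mat^-1.
    n = b_mat.shape[-1]
    eye = torch.eye(n, dtype=b_mat.dtype, device=b_mat.device)
    eye = eye.expand(b_mat.shape[0], n, n)
    return torch.linalg.solve(b_mat, eye)
```

`torch.inverse(b_mat)` would also work here, since it supports batched input; the solve-based version mirrors the structure of the original code.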

b7leung commented 3 years ago

I have the same issue as the OP. Below are the training graphs and an example of a final reconstructed mesh. The reconstructions essentially all look like "blobs" and don't resemble the class at all. I also tried training a model on the car class only and saw similar results.

[images: SNAG-0000, SNAG-0001]

m-niemeyer commented 3 years ago

Hi, the problem might be related to this file ordering issue. If this was the case, it should be fixed now. Thanks!
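For anyone hitting this later: the fix mentioned above likely concerns nondeterministic directory listings. `os.listdir` makes no ordering guarantee, so any code that pairs files by position (e.g. view images with entries in cameras.npz) must sort explicitly. A minimal illustration (hypothetical helper, not from the repo):

```python
import os

def list_views(img_dir, ext='.png'):
    """Return view images in a deterministic order.

    os.listdir returns entries in an arbitrary, filesystem-dependent
    order, so indexing "view i" into cameras.npz only lines up with the
    right image if the listing is sorted first.
    """
    files = [f for f in os.listdir(img_dir) if f.endswith(ext)]
    return sorted(files)
```

If a model trained before the fix produced "blobs", retraining after updating the code (or verifying the dataset loader sorts its file lists) would be the thing to try.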