YNet: More details about the final training on ETH/UCY Dataset

HRHLALALA commented 2 years ago

Hi, can you provide more experimental details about the final training of Y-Net on the ETH/UCY dataset that produced the scores in the paper? As in #29, I cannot reproduce the paper's results either. Specifically, I have the following questions:

Did you train on full epochs? I noticed that each epoch is cut short once the counter exceeds 30. Without that, ~300 hours are required.
How did you apply deformable convolution in the model?
Did you train directly on the segmentation masks, or did you follow the SDD procedure: pre-training a segmentation model and fine-tuning after 150 epochs?
I have run into very severe overfitting during training, and the model simply ignores the scene information. Did you have the same problem? If so, how did you handle it?
Which scene image did you use for uni_examples?

Or can you provide the pretrained weights and the pre-processed datasets?

Thanks!

HarshayuGirase commented 2 years ago

Hi,

I believe for ETH/UCY we trained for ~70-100 epochs with a LR of 0.0001.
We used the implementation from https://github.com/oeway/pytorch-deform-conv for Deformable Convolutions, but it seems that https://pytorch.org/vision/stable/_modules/torchvision/ops/deform_conv.html is recommended. To use deformable conv, we replaced all the conv operations with deformable conv.
In #29, we released the ETH/UCY data we used. We used segmentation masks, which are provided as oracle.png in the ETH/UCY folders.
I don't think the model overfitted and ignored the scene information; hopefully it will work with the released ETH/UCY datasets.
All oracle maps are released including uni_examples (which is basically all walkable with no obstacles)

Hope this helps!

HRHLALALA commented 2 years ago

Hi Harshayu, thank you very much for your response. It is really helpful! For the first question, may I confirm whether you have deleted the following code during the last training?

# train.py

# TODO Delete
if dataset_name == 'eth':
    print(counter)
    counter += batch_size
    # Break after certain number of batches to approximate evaluation, else one epoch takes really long
    if counter > 30: #TODO Delete
        break
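
To see how early this debug guard ends an epoch, the loop can be simulated in isolation. This is only a minimal sketch, assuming the counter starts at 0 and the check runs once per batch as in the snippet above; `batches_before_break` is a hypothetical helper and not part of train.py.

```python
def batches_before_break(batch_size, threshold=30):
    """Count how many batches run before `counter > threshold` triggers the break.

    Assumes the counter starts at 0 and grows by batch_size once per batch,
    mirroring the debug guard quoted from train.py.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    counter = 0
    batches = 0
    while True:
        batches += 1           # one batch is processed
        counter += batch_size  # same update as in the guard
        if counter > threshold:
            break
    return batches


for bs in (4, 8, 16):
    print(bs, batches_before_break(bs))
```

With a batch size of 8 the guard stops after 4 batches, so while it is in place an "epoch" on the eth split covers only a small part of the data.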

HarshayuGirase commented 2 years ago

Yes, I believe this was deleted for the final experiments.

HRHLALALA commented 2 years ago

Thanks for your updated reply! I have replaced all Convs with deformable convolutions, but the training time is really long (more than one hour per epoch). Note that I train the model on an RTX 3090 with a batch size of 8. Was it the same for you during the final training? I just want to confirm that all my configurations are the same as yours.

Here is my implementation of DeformConv2d using torchvision.ops.DeformConv2d

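import torch.nn as nn
from torchvision.ops import DeformConv2d as _DeformConv2d


class DeformConv2d(_DeformConv2d):
    """torchvision DeformConv2d that predicts its own offsets from the input."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Two offsets (dy, dx) per kernel position; assumes offset_groups == 1.
        self.offset_conv = nn.Conv2d(
            in_channels=self.in_channels,
            out_channels=2 * self.kernel_size[0] * self.kernel_size[1],
            kernel_size=self.kernel_size,
            stride=self.stride,
            padding=self.padding,
            dilation=self.dilation,  # keeps the offset map the same size as the output
            bias=self.bias is not None,
        )

    def forward(self, x, mask=None):
        # Predict the sampling offsets, then run the deformable convolution.
        offset = self.offset_conv(x)
        return super().forward(x, offset, mask)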
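
As a sanity check on the wrapper above: the offset tensor passed to torchvision's deformable convolution needs 2 * kh * kw channels per offset group and the same spatial size as the convolution output. The sketch below only does that shape arithmetic in plain Python. The helper names are hypothetical, and the 64 x 64 input size is an arbitrary example, not a value from the Y-Net code.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial axis (PyTorch convention)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def expected_offset_shape(batch, height, width, kernel=(3, 3), stride=1,
                          padding=1, dilation=1, offset_groups=1):
    """Shape the offset tensor must have for a deformable conv with these settings."""
    kh, kw = kernel
    out_h = conv_out_size(height, kh, stride, padding, dilation)
    out_w = conv_out_size(width, kw, stride, padding, dilation)
    # One (dy, dx) pair per kernel position and offset group.
    return (batch, 2 * offset_groups * kh * kw, out_h, out_w)


print(expected_offset_shape(8, 64, 64))              # (8, 18, 64, 64)
print(expected_offset_shape(8, 64, 64, dilation=2))  # (8, 18, 62, 62)
```

The offset branch produces a matching shape only if it shares kernel size, stride, padding and dilation with the main convolution. If only the main convolution used a dilation of 2, the sizes would differ in this example: 64 for the offset map versus 62 for the output.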

HarshayuGirase commented 2 years ago

Hi @HRHLALALA,

We used a 16GB V100 for training. I was able to find a copy of a training log (not sure if this is the final model we used since it doesn't have deformable conv parameters, will try to double check on this) but hopefully this should help:

{'ade_loss_lambda': 1, 'batch_size': 16, 'centroid': 'unweighted', 'decoder_channels': [64, 64, 64, 32, 32], 'encoder_channels': [32, 32, 64, 64, 64], 'est_samples': 500, 'kernlen': 31, 'learning_rate': 0.0005, 'loss_scale': 1000, 'name': 'oracle_medium2', 'nsig': 16.0, 'num_epochs': 200, 'obs_len': 8, 'pred_len': 12, 'rel_thresh': 0.0001, 'resize': 0.5, 'scene': 'zara1', 'skip_samples': 0, 'softargmax': 1, 'temp': 0.5, 'total_len': 20, 'viz_epoch': 10}
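
For anyone comparing their own setup against this log, the parameter line can be parsed and checked against the values mentioned earlier in the thread. This is a minimal sketch using only the standard library. The `thread_values` dict is filled in by hand from the comments above (the learning rate of 0.0001 quoted in the first reply and the batch size of 8 used in the reproduction attempt) and is not part of the log.

```python
import ast

# Parameter line copied from the training log above.
LOG_LINE = (
    "{'ade_loss_lambda': 1, 'batch_size': 16, 'centroid': 'unweighted', "
    "'decoder_channels': [64, 64, 64, 32, 32], "
    "'encoder_channels': [32, 32, 64, 64, 64], 'est_samples': 500, "
    "'kernlen': 31, 'learning_rate': 0.0005, 'loss_scale': 1000, "
    "'name': 'oracle_medium2', 'nsig': 16.0, 'num_epochs': 200, "
    "'obs_len': 8, 'pred_len': 12, 'rel_thresh': 0.0001, 'resize': 0.5, "
    "'scene': 'zara1', 'skip_samples': 0, 'softargmax': 1, 'temp': 0.5, "
    "'total_len': 20, 'viz_epoch': 10}"
)
params = ast.literal_eval(LOG_LINE)  # safe parsing of the Python-literal dict

# Values mentioned elsewhere in this thread (entered by hand, not from the log).
thread_values = {'learning_rate': 0.0001, 'batch_size': 8}

# Map each differing key to (value in log, value mentioned in thread).
mismatches = {
    key: (params[key], value)
    for key, value in thread_values.items()
    if params[key] != value
}
for key, (logged, mentioned) in mismatches.items():
    print(f"{key}: log has {logged}, thread mentions {mentioned}")

# The sequence lengths in the log should be consistent with each other.
assert params['obs_len'] + params['pred_len'] == params['total_len']
```

Both keys differ: the log uses a learning rate of 0.0005 and a batch size of 16, so this run may not correspond to the final configuration, as already cautioned above.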

NociTUM commented 1 year ago

Hi @HRHLALALA,

were you able to reproduce the results in the end? If so, would you mind sharing your notebook/training file?

I did several training runs already with the whole dataset but cannot reproduce the ADE/FDE values reported on the ETH/UCY dataset.

Thanks in advance!

HRHLALALA commented 1 year ago

Unfortunately, the process is quite stochastic and time-consuming, and I could not reproduce the performance reported in the paper with a limited number of epochs. It may be possible after training for a long time (e.g. a few weeks) with well-tuned hyperparameters. There is a reproduction report at https://openreview.net/pdf?id=HV2zgpM7n0F.

NociTUM commented 1 year ago

Thank you very much for the insight, the report is quite helpful! One last question: Did you use your DeformConv2d implementation at the end to produce those numbers or did you stick to the original implementation? Along with that, do you still have the .pt-file with the network weights by any chance?

HRHLALALA commented 1 year ago

Sorry for the late reply. Yes, I used my DeformConv2d implementation, but it results in longer training. Unfortunately, I could not reproduce the ETH/UCY results on any of my machines.

I have also tried to experiment with GoalSAR for some clues. Please see the issue https://github.com/luigifilippochiara/Goal-SAR/issues/2. This model actually uses the same structure as YNet except that the waypoints sampling is implemented using transformers and the data augmentations are more comprehensive. Hope this helps.
