Training Loss Increasing After Initial Decrease with Custom Video Dataset

DaBihy commented 5 months ago

Hello Everyone,

I've been working with the V-JEPA model for a self-supervised learning project using a custom video dataset. Initially, the training loss decreases as expected, but it starts to increase significantly after reaching a minimum. This behavior persists across multiple training runs with different hyperparameters.

[Figure: jepa_loss_small_collapse (training loss plot)]

Configuration:

Data Setup

Dataset Type: VideoDataset
Batch Size: 24
Number of Clips: 1
Number of Frames per Clip: 16
Tubelet Size: 4
Sampling Rate: 2
Crop Size: 224
Patch Size: 16
Memory Pinning: true
Number of Workers: 8
Filter Short Videos: false
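For reference, the data settings above determine how many patch tokens the encoder sees per clip. The following is a minimal sketch, assuming the clip is cut into non-overlapping 3D patches of tubelet size × patch size × patch size (the usual ViT-style video tokenization); the function name is illustrative and not taken from the repository.

```python
def tokens_per_clip(num_frames, tubelet_size, crop_size, patch_size):
    """Count patch tokens for one clip, assuming non-overlapping 3D patches."""
    if num_frames % tubelet_size or crop_size % patch_size:
        raise ValueError("frames and crop size must be divisible by tubelet and patch size")
    temporal = num_frames // tubelet_size        # token positions along time
    spatial = (crop_size // patch_size) ** 2     # token positions per frame group
    return temporal * spatial


# Values from the Data Setup list above: 16 frames, tubelet 4, crop 224, patch 16
print(tokens_per_clip(16, 4, 224, 16))  # 4 * 14 * 14 = 784
```

With a tubelet size of 4 there are only 4 token positions along the time axis, which may be worth keeping in mind when reading the mask settings further down.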

Data Augmentation

Auto Augment: false
Motion Shift: false
Random Resize Aspect Ratio: [0.75, 1.35]
Random Resize Scale: [0.3, 1.0]
Random Erasing Probability: 0.0

Loss Configuration

Loss Exponent: 1.0
Regularization Coefficient: 0.0

Mask Settings

Aspect Ratio: [0.75, 1.5]
Number of Blocks: [8, 2]
Spatial Scale: [0.15, 0.7]
Temporal Scale: [1.0, 1.0]
Max Temporal Keep: 1.0

Meta Configuration

Seed: 234
Evaluation Frequency: Every 100 epochs
Use SDPA: true
Data Type: float16

Model Configuration

Model Name: vit_small
Predictor Depth: 12
Predictor Embedding Dimension: 384
Uniform Power: true
Use Mask Tokens: true
Zero Initialize Mask Tokens: true

Optimization

Iterations per Epoch: 300
IPE Scale: 1.25
Gradient Clipping: 10.0
Weight Decay: 0.04
Final Weight Decay: 0.4
Epochs: 500
Warmup: 40
Start Learning Rate: 0.0004
Learning Rate: 0.000825
Final Learning Rate: 1.0e-06
Exponential Moving Average: [0.998, 1.0]
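To relate the loss curve to the schedule, it can help to print the learning rate and EMA momentum at a few epochs. The sketch below assumes a linear warmup followed by cosine decay, and a linear EMA momentum ramp, both stretched by the IPE scale. That is a common setup for this kind of training, but the exact formulas here are an assumption and should be checked against the scheduler code actually in use. All names are illustrative.

```python
import math

# Values from the Optimization list above
IPE, EPOCHS, IPE_SCALE, WARMUP_EPOCHS = 300, 500, 1.25, 40
START_LR, REF_LR, FINAL_LR = 4e-4, 8.25e-4, 1e-6
EMA_START, EMA_END = 0.998, 1.0

WARMUP_STEPS = WARMUP_EPOCHS * IPE
# Assumption: the schedule length is stretched by IPE_SCALE beyond the real run
SCHEDULE_STEPS = int(IPE_SCALE * EPOCHS * IPE)


def lr_at(step):
    """Linear warmup to REF_LR, then cosine decay towards FINAL_LR."""
    if step < WARMUP_STEPS:
        return START_LR + (REF_LR - START_LR) * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (SCHEDULE_STEPS - WARMUP_STEPS))
    return FINAL_LR + (REF_LR - FINAL_LR) * 0.5 * (1 + math.cos(math.pi * progress))


def ema_at(step):
    """Linear ramp of the target-encoder EMA momentum."""
    return EMA_START + (EMA_END - EMA_START) * min(1.0, step / SCHEDULE_STEPS)


for epoch in (0, 40, 100, 300, 500):
    step = epoch * IPE
    print(f"epoch {epoch:3d}  lr {lr_at(step):.2e}  ema {ema_at(step):.5f}")
```

Under these assumptions the learning rate peaks at epoch 40, and because the schedule is 1.25 times longer than the run, it never reaches the final learning rate within 500 epochs. Comparing the epoch where the loss bottoms out with the end of warmup is a cheap first check.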

Questions:

Has anyone else encountered similar issues when training on custom datasets, particularly with video data?
Are there recommended strategies for adjusting the training regimen or model configuration that might stabilize the loss?
Could this be related to the specific characteristics of video data in the custom dataset that might require different handling or preprocessing?

Any insights or suggestions would be greatly appreciated. Thank you for your support!

Best regards,

@MidoAssran

icekang commented 5 months ago

Hi,

I have the same problem, although I resumed from the latest V-JEPA checkpoint at epoch 300. [Figure: plot of JEPA loss]

However, looking at the ~~regression~~ regularization loss, it seems to keep improving over time.

DaBihy commented 5 months ago

@icekang thank you for your comment. I can confirm that I see the same thing for the regularization loss: [Screenshot: regularization loss plot, 2024-06-17]

The model is learning even though the JEPA loss is increasing. It's counterintuitive, but I think it's normal behavior for such frameworks as I observe the same thing when training BYOL.

icekang commented 5 months ago

Sorry, it was not the regression loss; it is the regularization loss on the variance of the predicted vectors. Anyway, I think it should all be decreasing, especially the JEPA loss, which indicates how close the predicted feature vector is to the actual feature vector.
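To make the two quantities concrete, here is a minimal pure-Python sketch. It assumes the JEPA loss is a mean absolute error raised to the loss exponent (1.0 in the config above) and that the regularization term is a hinge penalty on the per-dimension standard deviation of the predicted vectors. The exact form, the epsilon, and the function names are assumptions for illustration, not the repository's code.

```python
import math


def jepa_loss(pred, target, loss_exp=1.0):
    """Mean of |pred - target|^p / p over all tokens and feature dimensions."""
    total, count = 0.0, 0
    for p_tok, t_tok in zip(pred, target):
        for p, t in zip(p_tok, t_tok):
            total += abs(p - t) ** loss_exp
            count += 1
    return total / count / loss_exp


def variance_reg(pred, eps=1e-4):
    """Hinge penalty on per-dimension std across tokens; zero once every std >= 1."""
    n, dim = len(pred), len(pred[0])
    penalty = 0.0
    for d in range(dim):
        col = [tok[d] for tok in pred]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)  # unbiased variance
        penalty += max(0.0, 1.0 - math.sqrt(var + eps))
    return penalty / dim


# Toy data: 4 tokens with 2 feature dimensions each
collapsed = [[0.5, 0.5]] * 4
spread = [[-2.0, 2.0], [2.0, -2.0], [-2.0, -2.0], [2.0, 2.0]]

print(jepa_loss(collapsed, collapsed), variance_reg(collapsed))
print(jepa_loss(spread, spread), variance_reg(spread))
```

In this toy case the collapsed features give a JEPA loss of zero but a high variance penalty, so a low JEPA loss on its own does not show that the representation is useful. Note also that the Regularization Coefficient in the posted config is 0.0, so in that run the regularization term would only be logged, not optimized directly.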

zetaSaahil commented 2 months ago

I saw very similar behavior with my custom dataset. However, I found that with JEPA (while training on a small subset of the training data), the loss first increases and then decreases after some time. I also realised that the learning rate should be high enough for the loss to escape this local minimum (for me, a learning rate of 1e-3 worked best for the small dataset, and 6e-4 for the big dataset).
