I've adapted ConvNeXt V2 for time series signal analysis by setting the CNN input height to 1. After pre-training and visualizing the reconstructions, I found that the predicted values look largely random. Compared with ViT-MAE, the reconstruction quality is noticeably worse. However, during fine-tuning, both the training and validation accuracy and loss are better than ViT-MAE's, while performance on the test set falls slightly short. Should I use a larger model variant for pre-training? Do you have any other suggestions?
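For reference, this is roughly what I mean by "setting the CNN height to 1" (a minimal sketch with illustrative shapes and channel counts, not my exact code): the 1D signal is viewed as a height-1 "image" so the stock 2D convolutions in the ConvNeXt blocks only slide along the time axis.

```python
import torch
import torch.nn as nn

# Treat a 1D signal as a (B, C, 1, L) "image" so 2D convs act only along time.
signal = torch.randn(8, 1, 1024)   # (batch, channels, length)
x = signal.unsqueeze(2)            # (batch, channels, 1, length)

# A 2D conv with kernel height 1 is equivalent to a 1D conv over the time axis.
# Patch-embed-like stem: patch size 4 along time, 40 output dims (atto-sized).
stem = nn.Conv2d(1, 40, kernel_size=(1, 4), stride=(1, 4))
y = stem(x)
print(y.shape)                     # (8, 40, 1, 256)
```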
PS:
1. ViT-MAE uses 16 transformer layers with a patch size of 4; ConvNeXt atto: depths=[2, 2, 6, 2], dims=[40, 80, 160, 320].
2. I haven't used any built-in training utilities from timm; I've implemented a simple training loop with EMA, AdamW, and cosine decay.
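My training loop is essentially the following sketch (model, batch shapes, and hyperparameters here are placeholders, not my actual configuration): AdamW, cosine LR decay, and an exponential moving average (EMA) of the weights maintained alongside the trained model.

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(16, 2)                  # stand-in for the actual network
ema_model = copy.deepcopy(model)          # EMA copy, updated but never trained
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
steps = 100
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=steps)
ema_decay = 0.999

for step in range(steps):
    x = torch.randn(32, 16)               # placeholder batch
    target = torch.randint(0, 2, (32,))
    loss = nn.functional.cross_entropy(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                      # cosine decay toward 0 over `steps`
    # EMA update: ema = decay * ema + (1 - decay) * current
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(ema_decay).add_(p, alpha=1 - ema_decay)
```

(timm does ship an EMA helper, `timm.utils.ModelEmaV2`, which does the same update; I just wrote it out by hand.)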
@heng3366: Sorry, it's not related to the posted question, but could you please let me know which versions of Ubuntu, G++, and CUDA you used to install MinkowskiEngine?
Thanks anyway!