ViTAE-Transformer / Remote-Sensing-RVSA

The official repo for [TGRS'22] "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model"
MIT License

Problems in the use of the pre-trained model #9

Closed andytianph closed 1 year ago

andytianph commented 1 year ago

Hello, first of all, thank you for your impressive results and for providing the code! After changing the network model, I trained with the pre-trained weights (I tried both the ViT-B and ViTAE-B pre-trained models), but many parameters could not be matched, and the final mAP dropped below that of the unmodified network. Is the decrease in mAP caused by the changed network structure itself, or by the missing pre-trained weight parameters? Is it necessary to redo the pre-training on MillionAID after changing the network structure? I have not tried that because of the large time and hardware cost of pre-training.
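One way to see exactly which parameters fail to match is to compare the checkpoint's key set against the model's before loading. A minimal sketch (the toy key names below are assumptions for illustration; with PyTorch you would pass `model.state_dict().keys()` and the keys of `torch.load(...)`):

```python
# Minimal sketch: diff a checkpoint's parameter names against a model's,
# to see which weights will not be loaded after a structural change.
# The key names below are toy stand-ins (assumption), not real RVSA keys.

def diff_state_dicts(model_keys, ckpt_keys):
    """Return (missing, unexpected): params left randomly initialized,
    and pretrained weights that will be dropped."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)      # no pretrained weights for these
    unexpected = sorted(ckpt_keys - model_keys)   # these pretrained weights are unused
    return missing, unexpected

# Toy example: the model renamed its "conv" blocks to "myconv".
model_keys = ["patch_embed.weight", "blocks.0.myconv.weight"]
ckpt_keys = ["patch_embed.weight", "blocks.0.conv.weight"]
missing, unexpected = diff_state_dicts(model_keys, ckpt_keys)
print(missing)      # ['blocks.0.myconv.weight']
print(unexpected)   # ['blocks.0.conv.weight']
```

If the `missing` list covers most of the backbone, the mAP drop is likely dominated by the lost pretraining rather than the structure itself; if only the new modules appear there, the structure is the more plausible cause.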

DotWang commented 1 year ago

@andytianph

Sorry, I cannot answer it since I don't know which model you modified or how you modified it.

Perhaps both structure and pretraining weights have an effect.

andytianph commented 1 year ago

Thank you very much for your answer! I mainly tried modifying the convolution modules in the network structure, but none of my variants matched the original ViT + conv combination. It does seem hard to tell whether the problem is the network structure or the pre-trained weights, but thanks all the same for the explanation! I am now trying to pretrain the modified network with MAE to rule this out. However, I could not find your train/valid split of the MillionAID images (it is in neither this repo nor the [ViTAE-Transformer-Remote-Sensing] repo). Could you please provide train_labels.txt and valid_labels.txt? Thank you very much!!

DotWang commented 1 year ago

@andytianph First, that train/valid split was made for supervised pretraining; see the RSP paper. ViT-RVSA is trained unsupervised. To stay comparable with previous work we still pretrain on the 950k-image training split, but we evaluate the MAE-pretrained models on the small classification datasets (see Section III-A of the paper), not on the 50k-image validation split. That said, pretraining on all of the images is also fine.

I'm afraid I can't share those two txt files; see ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing#10.
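Since unsupervised MAE pretraining needs no train/valid split, a flat list of every image is enough. A minimal sketch for building one (the directory layout, root path, and output filename are assumptions about a local setup):

```python
# Minimal sketch: collect all MillionAID image paths into one file list for
# unsupervised (MAE-style) pretraining; no labels or split required.
# Root directory and extensions are assumptions about the local layout.
from pathlib import Path

def list_images(root, exts=(".jpg", ".jpeg", ".png", ".tif")):
    """Recursively gather image paths under `root`, sorted for reproducibility."""
    return sorted(p for p in Path(root).rglob("*") if p.suffix.lower() in exts)

# Example usage: write one path per line, analogous to the *_labels.txt
# files but without labels (paths are hypothetical):
# paths = list_images("million_aid/")
# Path("all_images.txt").write_text("\n".join(str(p) for p in paths))
```

Sorting keeps the list stable across machines, so a pretraining run can be resumed or reproduced with the same image ordering before shuffling.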

andytianph commented 1 year ago

Got it, thank you very much for the explanation!! I'll try unsupervised pretraining with all of the MillionAID images first. Thanks again!