[request] Improvements to dataset code for pretraining

facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.

Apache License 2.0

9.32k stars 829 forks

[request] Improvements to dataset code for pretraining #243

Open patricklabatut opened 1 year ago

patricklabatut commented 1 year ago

Only depend on standard ImageNet files
Facilitate swapping in a different dataset

Related issues:

BrianPulfer commented 1 year ago

Here's a labels.txt file that works out of the box. The only "issue" with it is that commas in the class names are removed. If it is of interest, I can open a PR :)

ayushnangia commented 5 months ago

So I might be a bit dumb, but how do I continue pretraining from the existing ImageNet weights on my custom dataset? Or do I need to train from scratch on my custom dataset? @patricklabatut

wyh196646 commented 2 weeks ago

So I might be a bit dumb, but how do I continue pretraining from the existing ImageNet weights on my custom dataset? Or do I need to train from scratch on my custom dataset? @patricklabatut

I have the same problem. Is there a tutorial for training on my own dataset with the DINOv2 self-supervised learning method?

xinli2008 commented 5 days ago

@wyh196646 have you solved the problem? I also want to pretrain DINOv2 on my own dataset without labels.

1921134176 commented 5 days ago

A simple approach is to organize your data as you would for an image classification task: the data root contains a series of category subdirectories, and the images you want to train on sit inside them. Then change the dataset type in the config to ImageFolder, e.g. dataset_path: ImageFolder:root=/ssd/cx_dataset/flowerphotos. Finally, in the loaders.py script, add from torchvision.datasets import ImageFolder at the top and add elif name == "ImageFolder": class_ = ImageFolder on line 60 (with a trailing underscore, since class is a reserved word in Python).
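To make the idea concrete, here is a minimal sketch of how a dataset string such as ImageFolder:root=... could be split into a dataset class and its keyword arguments. It is not the actual loaders.py code: the function name parse_dataset_str and the registry argument are hypothetical, chosen only for illustration, and the real script may be structured differently.

```python
def parse_dataset_str(dataset_str, registry):
    """Split "Name:key=value:..." into (dataset class, kwargs).

    `registry` maps dataset names to classes. Both this function and the
    registry are hypothetical stand-ins for the dispatch in loaders.py.
    """
    name, *tokens = dataset_str.split(":")
    kwargs = {}
    for token in tokens:
        # Split only on the first "=" so the value itself may contain "=".
        key, value = token.split("=", 1)
        kwargs[key] = value
    if name not in registry:
        raise ValueError(f'Unsupported dataset "{dataset_str}"')
    return registry[name], kwargs


# Usage (assumes torchvision is installed and the directory exists):
#   from torchvision.datasets import ImageFolder
#   cls, kwargs = parse_dataset_str(
#       "ImageFolder:root=/ssd/cx_dataset/flowerphotos",
#       {"ImageFolder": ImageFolder},
#   )
#   dataset = cls(**kwargs)  # labels come from the subdirectory names
```

ImageFolder still returns a class index for every image, taken from the subdirectory names. If your data has no labels, putting all images under a single dummy subdirectory should be enough, assuming the self-supervised objective ignores the targets.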

