apple / ml-cvnets

CVNets: A library for training computer vision networks
https://apple.github.io/ml-cvnets

Crash if the `train` and `test` sets contain different numbers of classes #90

Closed mjamroz closed 11 months ago

mjamroz commented 1 year ago

Having a different number of classes in the train and val directories (at least for image classification) makes CUDA crash with a cryptic error code when training on a custom dataset:

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:365: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.         
2023-08-26 09:42:29 - LOGS    - Exception occurred that interrupted the training:  
[...]
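
A likely reason for the assertion can be sketched in plain Python. This assumes class indices are assigned per split by sorting the class directory names (the convention used by torchvision's ImageFolder; whether ml-cvnets does exactly this is not confirmed in this thread). The class names are made up for illustration.

```python
def class_to_index(class_names):
    """Assign indices by sorted directory name (assumed convention)."""
    return {name: i for i, name in enumerate(sorted(class_names))}

# Hypothetical class directories, for illustration only.
train_idx = class_to_index(["cat", "dog", "fox"])

# Case 1: the validation split has an extra class.
val_extra = class_to_index(["cat", "dog", "fox", "owl"])
num_outputs = len(train_idx)  # classifier head sized from the train split
out_of_range = {n: i for n, i in val_extra.items() if i >= num_outputs}
print(out_of_range)  # {'owl': 3}: label 3 does not exist in a 3-class output

# Case 2: the validation split lacks a class, so later indices shift.
val_missing = class_to_index(["cat", "fox"])
shifted = {n: (train_idx[n], i) for n, i in val_missing.items() if train_idx[n] != i}
print(shifted)  # {'fox': (2, 1)}: same class, different label in each split
```

Under that assumption, the first case would produce an out-of-bounds label like the one in the log above, while the second case would not crash at all and would instead score against the wrong labels.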

To avoid it, make sure the test and train directories contain the same set of classes. I check it with:

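import os
from glob import glob
from sys import argv

def cname(path):
    # Class name = last path component (the class directory name).
    return os.path.basename(path)

# The dataset root is passed as the first command-line argument;
# it is expected to contain "train" and "test" subdirectories.
test_classes = set(map(cname, glob(os.path.join(argv[1], "test", "*"))))
train_classes = set(map(cname, glob(os.path.join(argv[1], "train", "*"))))

# Classes present in test but absent from train.
if missing := test_classes - train_classes:
    print("MISSING TRAIN CLASS(ES)", missing)

# Classes present in train but absent from test.
if missing := train_classes - test_classes:
    print("MISSING TEST CLASS(ES)", missing)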

It would be nice if ml-cvnets performed this check at loading time, the way it already checks for empty directories (i.e. the script exits if there are no files for a particular class).
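
A minimal sketch of what such a load-time check could look like, written as a standalone function that raises instead of printing. The name `check_matching_classes` and the `root/<split>/<class>/` layout are assumptions for illustration; this is not ml-cvnets code.

```python
import os

def check_matching_classes(root, splits=("train", "test")):
    """Raise ValueError unless every split under `root` has the same class dirs."""
    classes = {}
    for split in splits:
        with os.scandir(os.path.join(root, split)) as entries:
            # Only directories count as classes; stray files are ignored.
            classes[split] = {e.name for e in entries if e.is_dir()}

    reference = classes[splits[0]]
    problems = []
    for split in splits[1:]:
        missing = sorted(reference - classes[split])
        extra = sorted(classes[split] - reference)
        if missing:
            problems.append(f"{split} is missing {missing}")
        if extra:
            problems.append(f"{split} has classes not in {splits[0]}: {extra}")

    if problems:
        raise ValueError("class mismatch: " + "; ".join(problems))
    return sorted(reference)
```

Calling it before building the data loaders would stop the run on the CPU with a readable message, rather than later inside a CUDA kernel.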

mjamroz commented 1 year ago

BTW, if you want to split a dataset into train/test so that both splits share all classes, the datasets module is useful for that:

from datasets import load_dataset

ds = load_dataset(
    "imagefolder",
    data_files={"train": "/PATH/**"},
    split="train",
)
ds = ds.train_test_split(
    test_size=0.05, stratify_by_column="label", shuffle=True
)
ds["test"].to_csv("test.csv")
ds["train"].to_csv("train.csv")

and then link or copy the files as listed in {test,train}.csv
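
If the datasets package is not an option, a similar per-class split can be sketched with the standard library alone. This is a different technique from the snippet above: a plain per-class shuffle that copies files directly and skips the CSV step. The function name `split_per_class`, the `src/<class>/<file>` layout, and the rule of keeping at least one file per class on each side are all assumptions.

```python
import os
import random
import shutil

def split_per_class(src, dst, test_size=0.05, seed=0):
    """Copy src/<class>/* into dst/{train,test}/<class>/, splitting each class."""
    rng = random.Random(seed)
    for cls in sorted(os.listdir(src)):
        cls_dir = os.path.join(src, cls)
        if not os.path.isdir(cls_dir):
            continue
        files = sorted(
            f for f in os.listdir(cls_dir)
            if os.path.isfile(os.path.join(cls_dir, f))
        )
        if len(files) < 2:
            raise ValueError(f"class {cls!r} needs at least 2 files")
        rng.shuffle(files)
        # At least one test file and at least one train file per class.
        n_test = min(max(1, round(len(files) * test_size)), len(files) - 1)
        for i, name in enumerate(files):
            split = "test" if i < n_test else "train"
            out_dir = os.path.join(dst, split, cls)
            os.makedirs(out_dir, exist_ok=True)
            shutil.copy2(os.path.join(cls_dir, name), os.path.join(out_dir, name))
```

Because every class contributes to both sides, the resulting train and test directories always contain the same class names.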

Tranbaber commented 11 months ago

@mjamroz Hello! I'm trying to train a MobileViT model, but I've run into the following problem and would like to ask for help:

File "C:\Users\72344\.conda\envs\MobileViTv2\Scripts\cvnets-train.exe\__main__.py", line 4, in <module>
ModuleNotFoundError: No module named 'main_train'

I also tried to download this module, but it shows "ERROR: Could not find a version that satisfies the requirement main_train (from versions: none) ERROR: No matching distribution found for main_train"

Can you tell me what I can do? Thank you very much!

mjamroz commented 11 months ago

@Tranbaber try running python main_train.py from within the ml-cvnets directory, for example: python -W ignore main_train.py --common.config-file path_to_config_file.yaml

Tranbaber commented 11 months ago

@mjamroz Hello! I ran into a problem after following your tips: AttributeError: 'NoneType' object has no attribute 'size'. Can you tell me what I should do?

mjamroz commented 11 months ago

I would recommend trying "huggingface": it seems easier for beginners to use, and it implements mobilevit.

Regarding your error, it simply means some variable is None where a real object was expected, but you skipped the most important lines of the error message: which line and which variable.

Instead of your custom yaml file, try using one from the example dir.

Tranbaber commented 11 months ago

@mjamroz Thanks! I will try your recommendations later! Thanks again!