facebookarchive / fb.resnet.torch

Torch implementation of ResNet from http://arxiv.org/abs/1512.03385 and training scripts

Using cifar-100 with 15 classes #203

Open YotYot opened 6 years ago

YotYot commented 6 years ago

Hi,

I'm trying to classify my images into 15 classes and am using cifar-100 for that. I'm running the following command:

th main.lua -data -nClasses 15 -resetClassifier true -dataset cifar100 -depth 22

and I get the following error -

Assertion t >= 0 && t < n_classes failed.

I don't get this with any of the other datasets (cifar-10, imagenet)

Any clue, someone?

Thanks! Yotam

onidzelskyi commented 6 years ago

You're missing an argument after the -data parameter. Your command should look like this:

th main.lua -data /dev/null -nClasses 15 -resetClassifier true -dataset cifar100 -depth 22

YotYot commented 6 years ago

Hi @onidzelskyi ,

Thanks for your answer. You're right, of course, but this isn't the problem: I just left the data path out of the command I posted here, I did pass it when running.

Running the same command with cifar10 works ok.

Thanks, Yotam

onidzelskyi commented 6 years ago

Yes, you're right. I have the same issue when trying to train with a small number of classes (3 in my case): it gives the same error you ran into. It seems that for big networks (cifar100 in your case, resnet-200 in mine) the number of classes has to be at or above some threshold. To check this, try increasing the number of classes for the cifar100 model and let me know whether it has any positive effect. Regards, Oleksii

aabobakr commented 6 years ago

The -resetClassifier option replaces the output layer of the original model with a new output layer of the size you pass in -nClasses. So in your example it creates a network with 15 output neurons, while you are training on cifar100, which has 100 classes. The assertion fails because every target label must be a valid index into the network's outputs for the loss function to be evaluated, and most of the cifar100 labels fall outside the 15 outputs.
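
To make the failing condition concrete, here is a minimal Python sketch of the range check named in the assertion, plus one possible way to renumber a 15-class subset so the labels fit. It uses zero-based labels to match the assertion text. The function names out_of_range and remap_labels are hypothetical and are not part of fb.resnet.torch, and which 15 classes to keep is an arbitrary assumption.

```python
def out_of_range(targets, n_classes):
    """Return the targets that violate 0 <= t < n_classes."""
    return [t for t in targets if not (0 <= t < n_classes)]


def remap_labels(targets, kept_classes):
    """Keep samples whose label is in kept_classes and renumber them 0..k-1.

    Returns (indices of kept samples, new labels for those samples).
    """
    index_of = {c: i for i, c in enumerate(sorted(kept_classes))}
    kept = [(i, index_of[t]) for i, t in enumerate(targets) if t in index_of]
    indices = [i for i, _ in kept]
    labels = [t for _, t in kept]
    return indices, labels


# One sample per CIFAR-100 label, purely for illustration.
cifar100_targets = list(range(100))

# With a 15-way classifier, labels 15..99 fail the check.
bad = out_of_range(cifar100_targets, n_classes=15)
print(len(bad))  # 85

# Keeping an arbitrary 15-class subset and renumbering it passes the check.
indices, labels = remap_labels(cifar100_targets, kept_classes=range(15, 30))
print(out_of_range(labels, n_classes=15))  # []
```

The point of the sketch is only that the number of outputs and the label range have to agree; either train with all 100 classes or filter and renumber the data before training.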

onidzelskyi commented 6 years ago

Correct me if I'm wrong. To train on my own dataset with a custom number of classes:

  1. Overload datasets/imagenet.lua and datasets/imagenet-gen.lua for our dataset (e.g. )
  2. Train model with th main.lua -data <path to dataset directory> -resetClassifier true -nClasses <#classes> -dataset <custom dataset name>

But I get the error unknown dataset: <custom dataset name>

I have no idea how to adapt it to my own dataset.

aabobakr commented 6 years ago

You don't need to set the -dataset argument; just pass the path to your dataset in the -data argument. Your dataset directory must be organised as follows:

/dataset
  |---> /train
           |--> /class#1
           |--> /class#2
  |---> /val
           |--> /class#1
           |--> /class#2
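
A quick way to sanity-check such a layout before training is sketched below in Python. It assumes one subdirectory per class under train and val, that both splits should list the same classes, and that the value passed to -nClasses should equal the number of class directories. The helper names class_dirs and check_layout are hypothetical; this is not a check that the repository itself performs.

```python
import os
import tempfile


def class_dirs(split_dir):
    """Sorted names of the immediate subdirectories of split_dir."""
    return sorted(
        name
        for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )


def check_layout(data_root, n_classes):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    train = class_dirs(os.path.join(data_root, "train"))
    val = class_dirs(os.path.join(data_root, "val"))
    if len(train) != n_classes:
        problems.append(
            f"train has {len(train)} class directories, expected {n_classes}"
        )
    if train != val:
        problems.append("train and val do not list the same class directories")
    return problems


# Demo on a throwaway directory with three classes in each split.
demo_root = tempfile.mkdtemp()
for split in ("train", "val"):
    for name in ("1", "2", "3"):
        os.makedirs(os.path.join(demo_root, split, name))

print(check_layout(demo_root, n_classes=3))  # []
```

For a real dataset, call check_layout with the same path and class count you pass to -data and -nClasses.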

onidzelskyi commented 6 years ago

th main.lua -data /home/alex/test_car_dataset/ -resetClassifier true -nClasses 3

gives an error

=> Creating model from file: models/resnet.lua
 | ResNet-34 ImageNet
 => Replacing classifier with 3-way classifier
=> Generating list of images
 | finding all validation images
 | finding all training images
 | saving list of images to /home/alex/fb.resnet.torch/gen/imagenet.t7
=> Training epoch # 1
/home/alex/torch/extra/cunn/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.

...

THCudaCheck FAIL file=/home/alex/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
/home/alex/torch/install/bin/luajit: cuda runtime error (59) : device-side assert triggered at /home/alex/torch/extra/cutorch/lib/THC/generic/THCStorage.c:32
stack traceback:
    [C]: at 0x7ff5ed0f5210
    [C]: in function '__index'
    ...lex/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:50: in function 'updateOutput'
    ...torch/install/share/lua/5.1/nn/CrossEntropyCriterion.lua:20: in function 'forward'
    ./train.lua:58: in function 'train'
    main.lua:52: in main chunk
    [C]: in function 'dofile'
    ...alex/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

Dataset directory structure:

ls -lat /home/alex/test_car_dataset/
val train

ls -lat /home/alex/test_car_dataset/train
1 3 2