Open YotYot opened 6 years ago
You've missed an argument after the -data parameter. Your command should look like this:
th main.lua -data /dev/null -nClasses 15 -resetClassifier true -dataset cifar100 -depth 22
Hi @onidzelskyi ,
Thanks for your answer. You're right of course, but this isn't the problem: I just left the data path out of my post, I did pass it when running the command.
Running the same command with cifar10 works ok.
Thanks, Yotam
Yes, you're right. I have the same issue when trying to train with a small number of classes (3 in my case); it gives the same error you experienced. It seems that for big networks (cifar100 in your case and resnet-200 in mine) the number of classes must be greater than or equal to some threshold value. To check this, try increasing the number of classes for the cifar100 model and let me know if it has any positive effect. Regards, Oleksii
The option -resetClassifier replaces the output layer of the original model with a new output layer with the -nClasses you provide. So, in your example it will create a new network with 15 output neurons, and you are training on cifar100 which has 100 classes. The assertion fails as the output and target should be the same size for the loss function to be evaluated.
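To illustrate the mismatch described above, here is a minimal Python sketch of the range check behind the assertion. It is not the Torch code itself: `out_of_range_targets` is a hypothetical helper, and the labels are assumed to be zero-based, matching the `t >= 0 && t < n_classes` condition in the error message.

```python
def out_of_range_targets(targets, n_classes):
    """Return the labels that would violate `t >= 0 && t < n_classes`.

    Hypothetical helper for illustration; labels are assumed zero-based.
    """
    return [t for t in targets if not (0 <= t < n_classes)]


# cifar100 has 100 classes, i.e. labels 0..99 when zero-based.
cifar100_labels = list(range(100))

# A 15-way classifier only accepts labels 0..14, so labels 15..99 are invalid.
bad = out_of_range_targets(cifar100_labels, n_classes=15)
print(len(bad), min(bad), max(bad))  # prints: 85 15 99

# With a 100-way classifier every label is valid.
print(out_of_range_targets(cifar100_labels, n_classes=100))  # prints: []
```

Under these assumptions, any single out-of-range label is enough to trigger the failure, which is consistent with the error appearing as soon as training starts.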
Correct me if I'm wrong. To train on my own dataset with a custom number of classes, I run:
th main.lua -data <path to dataset directory> -resetClassifier true -nClasses <#classes> -dataset <custom dataset name>
But I get an error
unknown dataset: <custom dataset name>
I have no idea how to adapt it to my own dataset.
You don't need to set the -dataset argument. Only the -data argument is needed, and it should contain the path to your dataset. Your dataset directory must be organised as follows:
/dataset
|---> /train
|       |--> /class#1
|       |--> /class#2
|---> /val
        |--> /class#1
        |--> /class#2
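Before launching training, the layout above can be sanity-checked with a short script. The following is only a sketch under stated assumptions: `class_dirs` and `check_layout` are hypothetical helpers (not part of fb.resnet.torch), `n_classes` is the value you intend to pass as -nClasses, and both splits are assumed to hold the same class folders, as in the layout shown.

```python
import os


def class_dirs(split_dir):
    """Return the sorted names of the class subdirectories in split_dir."""
    return sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )


def check_layout(dataset_root, n_classes):
    """Return a list of problems found in the train/val layout (empty if none).

    Hypothetical helper: n_classes is the value planned for -nClasses.
    """
    problems = []
    found = {}
    for split in ("train", "val"):
        split_dir = os.path.join(dataset_root, split)
        if not os.path.isdir(split_dir):
            problems.append("missing directory: " + split_dir)
            continue
        found[split] = class_dirs(split_dir)
        if len(found[split]) != n_classes:
            problems.append(
                "%s has %d class folders, expected %d"
                % (split, len(found[split]), n_classes)
            )
    # Assumption: train and val should contain the same class folders.
    if len(found) == 2 and found["train"] != found["val"]:
        problems.append("train and val contain different class folders")
    return problems
```

For example, `check_layout("/path/to/dataset", 3)` returns an empty list when both `train` and `val` exist and each contains the same three class folders.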
th main.lua -data /home/alex/test_car_dataset/ -resetClassifier true -nClasses 3
gives an error
=> Creating model from file: models/resnet.lua
 | ResNet-34 ImageNet
 => Replacing classifier with 3-way classifier
=> Generating list of images
 | finding all validation images
 | finding all training images
 | saving list of images to /home/alex/fb.resnet.torch/gen/imagenet.t7
=> Training epoch # 1
/home/alex/torch/extra/cunn/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype , Dtype , Dtype , long , Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
...
THCudaCheck FAIL file=/home/alex/torch/extra/cutorch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
/home/alex/torch/install/bin/luajit: cuda runtime error (59) : device-side assert triggered at /home/alex/torch/extra/cutorch/lib/THC/generic/THCStorage.c:32
stack traceback:
    [C]: at 0x7ff5ed0f5210
    [C]: in function '__index'
    ...lex/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:50: in function 'updateOutput'
    ...torch/install/share/lua/5.1/nn/CrossEntropyCriterion.lua:20: in function 'forward'
    ./train.lua:58: in function 'train'
    main.lua:52: in main chunk
    [C]: in function 'dofile'
    ...alex/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50
Dataset directory structure:
ls -lat /home/alex/test_car_dataset/
val train
ls -lat /home/alex/test_car_dataset/train
1 3 2
Hi,
I'm trying to classify my images into 15 classes, and I'm using cifar-100 for that. I'm using the following command -
th main.lua -data -nClasses 15 -resetClassifier true -dataset cifar100 -depth 22
and I get the following error -
Assertion `t >= 0 && t < n_classes` failed.
I don't get this with any of the other datasets (cifar-10, imagenet)
Any clue, someone?
Thanks! Yotam