amdegroot / ssd.pytorch

A PyTorch Implementation of Single Shot MultiBox Detector
MIT License

Error with train custom dataset #161

Closed. RyuJunHwan closed this issue 6 years ago.

RyuJunHwan commented 6 years ago

I want to train using my own dataset. I have solved some problems, but I cannot resolve the error below.

The ground truth files in my dataset are *.txt files with one (x, y, w, h, class_label) entry per object.

loss_c = loss_c.view(num, -1)
loss_c[pos] = 0  # filter out pos boxes for now

Is there a problem here? [error screenshot attached]

This is the source in multibox_loss.py: [error_source screenshot attached]

If I pass --cuda False to check, the following error occurs:

File "/home/jhryu/Downloads/ssd.pytorch/layers/modules/multibox_loss.py", line 103, in forward
    loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))
RuntimeError: Invalid index in gather at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/TH/generic/THTensorMath.c:600
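
For reference, a minimal sketch with assumed toy shapes that reproduces this kind of gather error, which happens when a target label id is out of range for the number of classes in conf_data:

```python
import torch

# Assumed toy shapes, purely for illustration.
num_priors, num_classes = 4, 3
batch_conf = torch.randn(num_priors, num_classes)   # [num_priors, num_classes]
conf_t = torch.tensor([0, 1, 2, 3])                  # label 3 is out of range for 3 classes

# Raises an "Invalid index in gather" / index-out-of-bounds RuntimeError,
# just like the traceback above:
batch_conf.gather(1, conf_t.view(-1, 1))
```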

Thank you for your reply.

RyuJunHwan commented 6 years ago

Replying to myself:

Invalid indexing seems to be the problem. It was fixed after modifying the gather call to match the shape of batch_conf, as below:

loss_c = log_sum_exp(batch_conf) - batch_conf.gather(0, conf_t.view(-1, 1))

One-way-Leon commented 5 years ago

Hi, I ran into the same problem and solved it your way, but now there is a new problem I can't fix:

RuntimeError: Expected tensor [279424, 1], src [279424, 4] and index [279424, 1] to have the same size apart from dimension 0

P.S. I used my own dataset. Thanks for your reply.

knit2knot commented 5 years ago

Hi,

I had the same issue. What I realised is that I had forgotten to add the background class, so add 1 to your class count and add "background" as the first label in your labels list.
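
For example, a sketch with hypothetical class names following this advice ("background" first, class count bumped by 1); where exactly num_classes lives depends on your own config:

```python
# Hypothetical label list for a custom dataset, following the advice above:
# "background" is the first entry, so object ids start at 1.
MYDATASET_CLASSES = ('background', 'car', 'pedestrian', 'bicycle')

# Class count including background. How this is wired into config.py depends
# on your setup; the VOC config in this repo uses 21 classes for 20 objects.
num_classes = len(MYDATASET_CLASSES)
```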

Hope it helps.

NWChen commented 5 years ago

Note that <YOUR_CLASS>_CLASSES must contain (n+1) classes if you list num_classes: n in config.py.

Then in multibox_loss.py you can gather along dimension 1, as the default SSD code does.

JianyuTANG commented 5 years ago

@One-way-Leon Hello, I ran into exactly the same problem with my dataset in VOC format. Have you fixed it yet? Thanks a lot.

ynrjh92 commented 5 years ago

Correcting the contents above.

The modification from the self-reply works only as a temporary workaround; it is not a fundamental solution. Anyway, I am attaching the code I am currently using.

It should look like this:

```python

def forward(self, predictions, targets):
    """
    Args:
        predictions (tuple): (loc, conf, priors) output from the SSD net
            loc   : [batch_size, num_priors, 4]
            conf  : [batch_size, num_priors, num_classes]
            prior : [num_priors, 4]
        targets: ground truth boxes and labels for a batch,
            shape [num_objs, 5] in [x1, y1, x2, y2, class] format
    """

    loc_data, conf_data, priors = predictions
    num = loc_data.size(0)
    priors = priors[:loc_data.size(1), :]
    num_priors = (priors.size(0))
    num_classes = self.num_classes

    # match priors (default boxes) and ground truth boxes
    loc_t = torch.Tensor(num, num_priors, 4)
    conf_t = torch.LongTensor(num, num_priors)
    for idx in range(num):
        truths = targets[idx][:, :-1].data
        labels = targets[idx][:, -1].data
        defaults = priors.data
        match(self.jaccard_thresh, truths, defaults, self.variance, labels,
              loc_t, conf_t, idx)
    loc_t = loc_t.cuda()
    conf_t = conf_t.cuda()

    # [batch, num_priors]
    pos = conf_t > 0
    num_pos = pos.sum(dim=1, keepdim=True)

    # Localization Loss (Smooth L1)
    # Shape: [batch,num_priors,4]
    pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
    loc_p = loc_data[pos_idx].view(-1, 4)
    loc_t = loc_t[pos_idx].view(-1, 4)
    loss_l = F.smooth_l1_loss(loc_p, loc_t, size_average=False)

    # Compute max conf across batch for hard negative mining
    batch_conf = conf_data.view(-1, self.num_classes)
    loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))

    # Hard Negative Mining
    # [batch, num_priors]
    loss_c = loss_c.view(num, -1)
    nonzero_pos = pos.nonzero()

    loss_c[pos] = 0  # filter out pos boxes for now

    _, loss_idx = loss_c.sort(1, descending=True)
    _, idx_rank = loss_idx.sort(1)
    num_pos = pos.long().sum(1, keepdim=True)
    num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1)
    neg = idx_rank < num_neg.expand_as(idx_rank)

    # Confidence Loss Including Positive and Negative Examples
    pos_idx = pos.unsqueeze(2).expand_as(conf_data)
    neg_idx = neg.unsqueeze(2).expand_as(conf_data)

    # .gt(0) selects entries that are either positive or hard-negative examples
    conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1, self.num_classes)
    targets_weighted = conf_t[(pos+neg).gt(0)]

    loss_c = F.cross_entropy(conf_p, targets_weighted, size_average=False)

    # Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
    N = num_pos.data.sum().double()
    loss_l = loss_l.double()
    loss_c = loss_c.double()

    loss_l /= N
    loss_c /= N
    return loss_l, loss_c

```

If the problem still occurs with this code, it may be a data format issue (for example, use num_classes + 1 so that a background class is included), or it might be a framework version issue.
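
For the data-format side, here is a rough sketch (assumed file layout, based on the (x, y, w, h, class_label) *.txt format mentioned at the top of this issue) of converting such annotations into the normalized [x1, y1, x2, y2, class] targets this forward() expects:

```python
import numpy as np

def load_txt_annotation(path, img_width, img_height):
    """Sketch: turn (x, y, w, h, class_label) rows from a *.txt file into
    [x1, y1, x2, y2, class] targets normalized to [0, 1]. Whether labels
    should start at 0 or 1 depends on how your dataset class and match()
    handle the background offset, so adjust accordingly."""
    targets = []
    with open(path) as f:
        for line in f:
            x, y, w, h, label = map(float, line.split())
            targets.append([x / img_width,
                            y / img_height,
                            (x + w) / img_width,
                            (y + h) / img_height,
                            label])
    return np.array(targets, dtype=np.float32)
```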

I am concerned about providing inaccurate information, so I encourage you to also look through the other GitHub issues.

cristyioan2000 commented 5 years ago

I had the same error; the problem was that I had forgotten to filter some class ids from the labels. I encountered the same error twice for different reasons, and it was always either the format of the labels file or a wrong path to the images.
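
A quick pre-training sanity check along these lines can catch both problems; a sketch, where dataset and num_classes are assumptions about your own setup:

```python
# Sketch of a pre-training sanity check (dataset and num_classes are assumptions
# about your own setup). With the background-id-0 convention used by match(),
# raw object labels should stay within [0, num_classes - 2].
for _, target in dataset:                # target: [num_objs, 5] = [x1, y1, x2, y2, label]
    labels = target[:, -1]
    assert 0 <= labels.min() and labels.max() <= num_classes - 2, \
        "class id out of range; filter or remap the label ids"
```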

fatemeakbari commented 4 years ago

@One-way-Leon @JianyuTANG Hi, I had the same problem. I changed the code as follows (line 94 of multibox_loss.py):

    conf_tt = conf_t.view(-1, 1)
    conf_tt_index = (conf_tt != 0).nonzero()
    conf_tt[conf_tt_index] = 1
    loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_tt)

LinYuanNYU commented 4 years ago

My data is in VOC format, and I saw the problems above. In the end, I solved this by removing the 'background' class I had added to VOC_CLASSES; however, when calculating the number of classes, the background class should still be counted. In other words, num_classes for VOC in voc0712.py should be len(VOC_CLASSES) + 1.
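
Concretely, a sketch with hypothetical class names mirroring that convention (no 'background' entry in the class list, but counted in num_classes, the same way VOC_CLASSES has 20 names while the voc config uses 21):

```python
# Hypothetical class list: 'background' is NOT listed, but is counted below.
MYSET_CLASSES = ('car', 'pedestrian', 'bicycle')

# num_classes includes the implicit background class (id 0 after matching).
num_classes = len(MYSET_CLASSES) + 1
```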

brijhub commented 4 years ago


I'm also facing the same issue (RuntimeError: Expected tensor [279424, 1], src [279424, 4] and index [279424, 1] to have the same size apart from dimension 0). @One-way-Leon, have you resolved it?