UX-Decoder / Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Apache License 2.0

What does `grounding_hash` mean? #126

Open xianshunw opened 6 months ago

xianshunw commented 6 months ago

Hello everyone,

I’m relatively new to referring segmentation (RefSeg) and am trying to improve my understanding of the code and concepts. I came across the term `grounding_hash` in the grounding loss and would appreciate it if someone could explain its purpose there.

Here’s the specific section of code that has piqued my curiosity:

```python
# compute t2i loss
loss_grd_ce = 0
for b in range(len(indices)):
    task = targets[b]['grounding_task']
    pred_logit = outputs["pred_logits"][b]
    gt_logit = torch.zeros_like(pred_logit)
    select_idx = torch.stack((indices[b][0], indices[b][1])).tolist()
    gt_logit[select_idx] = 1
    t_hash = torch.tensor(targets[b]['grounding_hash'], device=gt_logit.device)
    hash_table = torch.zeros((len(t_hash), len(t_hash)), device=gt_logit.device)
    for idx in range(0, len(hash_table)):
        hash_table[idx][t_hash==t_hash[idx]] = 1
    hash_table = hash_table / hash_table.sum(-1, keepdim=True)
    gt_logit = gt_logit @ hash_table
    loss_grd_ce += self.grounding_weight[task]*torch.sum(-gt_logit.t() * F.log_softmax(pred_logit.t(), dim=-1), dim=-1).mean()
loss_grd_ce = loss_grd_ce / len(indices)
```

Could someone please explain what `grounding_hash` contributes to this loss computation? My dataset does not include such hash codes, so I removed the hash-related lines from my code. Since making that change, `loss_grounding_ce` has stopped converging. Could removing the hash-related code be the cause?

Thank you for taking the time to assist a newcomer to this field.
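
For reference, the hash-table step in the snippet above can be traced on a toy case. This is only a minimal sketch, not the repository's code: plain Python lists stand in for the tensors, `make_grounding_hash` is a hypothetical helper, and the assumption that equal hash values mark identical grounding phrases is inferred from the `t_hash==t_hash[idx]` comparison, not confirmed by the maintainers.

```python
def make_grounding_hash(texts):
    """Hypothetical helper (assumption): identical phrases get the same integer id."""
    ids = {}
    return [ids.setdefault(t, len(ids)) for t in texts]


def build_hash_table(t_hash):
    """Row i marks every text whose hash equals t_hash[i]; each row is normalized to sum to 1."""
    n = len(t_hash)
    table = [[1.0 if t_hash[j] == t_hash[i] else 0.0 for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in table]


def spread_targets(gt_logit, table):
    """Plain-list equivalent of `gt_logit @ hash_table` for a (queries x texts) matrix."""
    n = len(table)
    return [[sum(row[k] * table[k][j] for k in range(n)) for j in range(n)] for row in gt_logit]


# Toy sample: texts 0 and 2 are the same phrase, text 1 is different.
texts = ["a red car", "the dog", "a red car"]
t_hash = make_grounding_hash(texts)

# Four queries x three texts; query 3 is matched to no text.
gt_logit = [
    [1.0, 0.0, 0.0],  # query 0 matched to text 0
    [0.0, 1.0, 0.0],  # query 1 matched to text 1
    [0.0, 0.0, 1.0],  # query 2 matched to text 2
    [0.0, 0.0, 0.0],  # query 3 unmatched
]

hash_table = build_hash_table(t_hash)
soft_targets = spread_targets(gt_logit, hash_table)

# With no repeated phrases the table is the identity, so targets are unchanged.
identity_table = build_hash_table(make_grounding_hash(["a", "b", "c"]))
```

In this toy case the two copies of "a red car" end up sharing their matched queries: each of the columns for text 0 and text 2 puts weight 0.5 on query 0 and 0.5 on query 2, while the column for "the dog" is unchanged. When every phrase in a sample is unique, the table is the identity matrix and the multiplication leaves `gt_logit` as it was.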

zhi-xuan-chen commented 2 weeks ago

Hello! Is the grounding task here just the text-based referring segmentation task?