coverage loss implementation might be wrong.

sanghyuk-choi commented 3 years ago

following abisee(2017), coverage attention equation is

but code implementation below, coverage seems containing t at timestep t, and it should be sum from 0 to t-1 https://github.com/OpenNMT/OpenNMT-py/blob/36748a5f280fbe781a86a82b7df85166796d49d7/onmt/decoders/decoder.py#L412-L415 so I think L413 would be replaced like this

if coverage is None:
    coverage = torch.zeros(p_attn.size(), device=p_attn.device)
else:
    coverage = coverage + attns["std"][-2]

https://github.com/OpenNMT/OpenNMT-py/blob/36748a5f280fbe781a86a82b7df85166796d49d7/onmt/utils/loss.py#L308-L311 and coverage loss would be replaced like this

def _compute_coverage_loss(self, std_attn, coverage_attn):
    covloss = torch.min(std_attn, coverage_attn).sum(dim=-1).view(-1)
    covloss *= self.lambda_coverage
    return covloss

# add ignoring pad at criterion
# coverage_loss[target==self.padding_idx] = 0

check if this correct and if necessary, I'll open a pull request. thank you.

vince62s commented 3 years ago

I think this is done on purpose, but @pltrdy may explain better.

pltrdy commented 3 years ago

It actually seems like a mistake to me, a^t being included in the summation we have min(a^t, c^t) = a^t which does not really make sense.

vince62s commented 1 year ago

@Sanghyuk-Choi if you're still around can you PR ?

sanghyuk-choi commented 1 year ago

Yes, I'll make PR within this week.

2022년 12월 8일 (목) 01:37, Vincent Nguyen @.***>님이 작성:

@Sanghyuk-Choi https://github.com/Sanghyuk-Choi if you're still around can you PR ?

—

OpenNMT / OpenNMT-py

coverage loss implementation might be wrong. #2013