Hello, I want to ask your opinion on the AKT model architecture.
The image above is the figure of the AKT model from your paper, and the code above is what you implemented in akt.py.
My point is that I think the AKT model has a chance to see the target answers through the "f(c_t, r_t) variation vector" (in the paper), which is "qa_embed_diff_data" (in your code). In my opinion, this is a label-leakage issue: the answer the model is asked to predict is already visible to it.
To resolve the issue, I carefully suggest modifying the Architecture forward function as in the following code:
else:  # don't peek the current response
    # shift keys/values one step back with zero padding,
    # so position t only sees x[t-1] / y[t-1] and earlier
    pad_zero = torch.zeros(batch_size, 1, x.size(-1)).to(self.device)
    q = x                                            # queries: current questions
    k = torch.cat([pad_zero, x[:, :-1, :]], dim=1)   # keys: previous questions
    v = torch.cat([pad_zero, y[:, :-1, :]], dim=1)   # values: previous interactions
    x = block(mask=0, query=q, key=k, values=v, apply_pos=True)
    flag_first = True
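For reference, here is a minimal standalone sketch (not from akt.py; the tensor shapes and names x, y are just toy values matching the snippet above) of what the shift does: the key/value sequences get one zero step prepended and the last step dropped, so the attention at step t can only read interactions up to step t-1.

import torch

batch_size, seqlen, d_model = 2, 5, 4
x = torch.randn(batch_size, seqlen, d_model)   # toy question embeddings
y = torch.randn(batch_size, seqlen, d_model)   # toy interaction embeddings f(c_t, r_t)

pad_zero = torch.zeros(batch_size, 1, d_model)
k = torch.cat([pad_zero, x[:, :-1, :]], dim=1)  # k[:, t] == x[:, t-1], k[:, 0] == 0
v = torch.cat([pad_zero, y[:, :-1, :]], dim=1)  # v[:, t] == y[:, t-1], v[:, 0] == 0

# position t only receives information from steps strictly before t
assert torch.equal(k[:, 1:, :], x[:, :-1, :])
assert torch.equal(v[:, 1:, :], y[:, :-1, :])
assert torch.equal(v[:, 0, :], torch.zeros(batch_size, d_model))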
Thank you for your attention :)