Regarding your questions:
Yes, the first response is not predicted, since there are no historical records for it to depend on.
The predictions come from the RNN's hidden states, which capture the historical information of the sequence, so each output is a prediction for the next question: the predicted value at index 0 is the prediction of the learner's response to the question at index 1, and the last predicted value is thrown away because there is no further response to predict.
Correct and incorrect answers are just two kinds of responses; in our implementation we use 1 for an incorrect answer and 0 for a correct answer to distinguish them. This does not affect the model's effectiveness, as we keep the convention consistent between training and testing.
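A small sketch of that alignment (toy values, not the repo's code):

```python
import torch

# Toy values: the RNN output at step t is the prediction for step t + 1.
raw_pred = torch.tensor([0.7, 0.4, 0.9, 0.6])   # outputs for steps 0..3
responses = torch.tensor([1.0, 0.0, 1.0, 1.0])  # observed responses 0..3

pred = raw_pred[:-1]   # predictions for the responses at steps 1..3
truth = responses[1:]  # the first response has no prediction for it
# raw_pred[-1] is discarded: there is no step-4 response to predict
```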
It really helps! Thank you so much!
Your question was one of my questions too; this finally resolved my confusion. I have a further question. Take the AKT model, whose inputs are q and qa, where qa is the combined encoding of the skill and the answer. In `def forward(self, x):`, x is a 64×50 two-dimensional matrix and the output is a 64×50×m three-dimensional matrix. Must m be 1? If it is 1, we can call `squeeze(-1)` to compress the output to two dimensions so that it matches the dimensions of the input x, which is convenient for computing the loss afterwards. But no dimension is removed here; is that because the AKT model's construction itself (the mask) already avoids leaking the current question's label when predicting the current question? Is my understanding correct? And if m is not 1, how should it be handled? When I wrote my train() function I got auc = 0.999; do I need to handle pred and target at different time steps?
Hello, regarding how to avoid predicting the current question when using attention for knowledge tracing, as far as I know there are the following approaches (let `q_emb` be the embedding of the question or concept, and `qa_emb` the combined embedding):
AKT: to predict at time t, it uses the `q_emb` at time t as the query, but does not use the `qa_emb` at time t as key or value; see EduKTM/AKT/AKTNet.py. This is also reflected in the code, where you can search for `zero_pad` (a rough sketch follows this list).
SAINT: a start token is added at the first position in place of the first qa.
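A rough sketch of that `zero_pad` idea (illustrative only; the function name here is made up, and the real details are in AKTNet.py):

```python
import torch

def shift_values_right(qa_emb):
    """Prepend a zero step and drop the last one, so that attention at
    step t can only read qa information from steps strictly before t.

    qa_emb: (batch_size, seq_len, emb_size)
    """
    batch_size, _, emb_size = qa_emb.shape
    zero_pad = torch.zeros(batch_size, 1, emb_size, device=qa_emb.device)
    return torch.cat([zero_pad, qa_emb[:, :-1, :]], dim=1)
```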
As for the dimension question you mentioned, I am not completely sure. The attention output is (batch_size, seq_len, emb_size); after `nn.Linear(emb_size, 1)`, the result at each step becomes a single number, which then passes through a sigmoid to become the prediction. For example:
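(A minimal sketch with made-up sizes:)

```python
import torch
from torch import nn

batch_size, seq_len, emb_size = 64, 50, 256  # illustrative sizes
attn_out = torch.randn(batch_size, seq_len, emb_size)  # attention output

head = nn.Linear(emb_size, 1)
pred = torch.sigmoid(head(attn_out)).squeeze(-1)  # (64, 50, 1) -> (64, 50)
# so m == 1, and squeeze(-1) aligns the predictions with the (64, 50) input
```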
If you like, we can add each other on WeChat (vx) to discuss further. base64: MTc2MjU2MDEyNjY=
Thank you for replying to my question so late. I think I understand now. WeChat added; thanks so much!
Hi Dr. Tong,
There's a function named `process_raw_pred` in `EduKTM/DKT/DKT.py`:
https://github.com/bigdata-ustc/EduKTM/blob/c9912f0d29830b75b192bb63cdc5a4400f476300/EduKTM/DKT/DKT.py#L31-L37
According to the code below (line 56), we can see that `process_raw_pred` is used to process the raw input and the output of the DKT model:
https://github.com/bigdata-ustc/EduKTM/blob/c9912f0d29830b75b192bb63cdc5a4400f476300/EduKTM/DKT/DKT.py#L50-L58
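For context, the function looks roughly like this (a reconstruction from the snippets quoted below, with the `gather` step assumed; the linked file is authoritative):

```python
import torch

def process_raw_pred(raw_question_matrix, raw_pred, num_questions: int) -> tuple:
    # line 32: question ids for steps 1..L-1 (the first step is dropped)
    questions = torch.nonzero(raw_question_matrix)[1:, 1] % num_questions
    length = questions.shape[0]
    # line 34: model outputs for steps 0..L-2
    pred = raw_pred[: length]
    # assumed: pick, at each step, the prediction for the question asked next
    pred = pred.gather(1, questions.view(-1, 1)).flatten()
    # line 36: 0 if the one-hot index is in the first half, 1 if in the second
    truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions
    return pred, truth
```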
I have three questions.
1. About `questions = torch.nonzero(raw_question_matrix)[1:, 1] % num_questions` in line 32: the `[1:, 1]` starts from index 1, which means we throw away the first answer (index 0). Do you mean that the first value is not predicted, and is meaningless because it has no historical answer records to depend on?
2. About `pred = raw_pred[: length]` in line 34: here we start from index 0. Why don't we throw away the first predicted value just like in line 32, e.g. `pred = raw_pred[1 : length + 1]`?
3. About `truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions` in line 36: we use `//` so that we get 0 if the non-zero entries are in the first half (correct answers) and 1 if they are in the second half (wrong answers). However, according to the `encode_onehot` function in `examples/DKT/prepare_dataset.ipynb`, correct answers are in the first half and wrong answers are in the second half, and 1 stands for a correct answer while 0 stands for a wrong answer. So `truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions` is not consistent with the encoding. To validate my thought, I ran the DKT example and printed `torch.nonzero(raw_question_matrix)` and `truth`, and compared them with the encoding results stored in `test.txt`. Here are two screenshots from the console and from `test.txt`; they show that my thought is right. Then I added the line `truth = torch.tensor([1 if i == 0 else 0 for i in truth])` to test performance. The average AUC is about 0.73, the same as it was before adding this line.
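(To make the indexing concrete, here is a small self-contained example of what those lines compute, using toy data and assuming the 2 * num_questions one-hot layout from prepare_dataset.ipynb:)

```python
import torch

num_questions = 3
# Toy one-hot rows for 4 steps: columns 0..2 = correct, columns 3..5 = wrong
raw_question_matrix = torch.zeros(4, 2 * num_questions)
raw_question_matrix[0, 1] = 1  # step 0: question 1, correct
raw_question_matrix[1, 5] = 1  # step 1: question 2, wrong
raw_question_matrix[2, 0] = 1  # step 2: question 0, correct
raw_question_matrix[3, 4] = 1  # step 3: question 1, wrong

nonzero = torch.nonzero(raw_question_matrix)  # (step, column) pairs, row order
questions = nonzero[1:, 1] % num_questions    # tensor([2, 0, 1]) for steps 1..3
truth = nonzero[1:, 1] // num_questions       # tensor([1, 0, 1])
# Note: truth == 1 marks a wrong answer here, which is the opposite of the
# "1 = correct" convention used by encode_onehot's input labels.
```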
Sorry for the long description :( Your answers are appreciated! 😀