bigdata-ustc / EduKTM

The Model Zoo of Knowledge Tracing Models
Apache License 2.0

Questions about `process_raw_pred` function in DKT.py #37

Closed SummerGua closed 1 year ago

SummerGua commented 1 year ago

Hi Dr. Tong,

There's a function named `process_raw_pred` in EduKTM/DKT/DKT.py. https://github.com/bigdata-ustc/EduKTM/blob/c9912f0d29830b75b192bb63cdc5a4400f476300/EduKTM/DKT/DKT.py#L31-L37

From the code below (line 56), we can see that `process_raw_pred` is used to process the raw input and the output of the DKT model.

https://github.com/bigdata-ustc/EduKTM/blob/c9912f0d29830b75b192bb63cdc5a4400f476300/EduKTM/DKT/DKT.py#L50-L58

I have three questions.

  1. I noticed `questions = torch.nonzero(raw_question_matrix)[1:, 1] % num_questions` in line 32. The slice `[1:, 1]` starts from index 1, which means the first answer (index 0) is thrown away. Do you mean that the first value is not predicted, and is meaningless because it has no history of answer records to depend on?
  2. About `pred = raw_pred[: length]` in line 34: here the slice starts from index 0. Why don't we throw away the first predicted value, just as in line 32? e.g. `pred = raw_pred[1 : length + 1]`
  3. About `truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions` in line 36: using `//` yields 0 when the nonzero entries are in the first half (correct answers) and 1 when they are in the second half (incorrect answers). However, according to the `encode_onehot` function in examples/DKT/prepare_dataset.ipynb, correct answers are placed in the first half and incorrect answers in the second half, while conventionally 1 stands for a correct answer and 0 for an incorrect one.

    import numpy as np
    import tqdm

    def encode_onehot(sequences, max_step, num_questions):
        result = []

        for q, a in tqdm.tqdm(sequences, 'convert to one-hot format: '):  # e.g. q: [1,2,3]  a: [1,0,0]
            length = len(q)
            # pad the sequence length up to an integer multiple of max_step
            mod = 0 if length % max_step == 0 else (max_step - length % max_step)
            onehot = np.zeros(shape=[length + mod, 2 * num_questions])
            for i, q_id in enumerate(q):
                # if a[i] > 0 (correct answer): index = question id (first half);
                # otherwise: index = question id + num_questions (second half)
                index = int(q_id if a[i] > 0 else q_id + num_questions)
                onehot[i][index] = 1  # correct answers go in the first half
            result = np.append(result, onehot)

        return result.reshape(-1, max_step, 2 * num_questions)

    So `truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions` is not consistent with the encoding. To validate this, I ran the DKT example and printed `torch.nonzero(raw_question_matrix)` and `truth`, comparing them with the encoded data stored in test.txt.
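For concreteness, here is a small self-contained sketch of how `torch.nonzero` recovers question ids and the half of the one-hot row from an encoded matrix. The matrix contents are made-up toy data, not from the repo; the decoding lines mirror `process_raw_pred`.

```python
import torch

num_questions = 3
# one-hot rows: column = question id if correct, id + num_questions if incorrect
# (the same convention as encode_onehot above)
raw_question_matrix = torch.zeros(4, 2 * num_questions)
raw_question_matrix[0, 1] = 1                   # q1, correct (first half)
raw_question_matrix[1, 2 + num_questions] = 1   # q2, incorrect (second half)
raw_question_matrix[2, 0] = 1                   # q0, correct
raw_question_matrix[3, 1 + num_questions] = 1   # q1, incorrect

nz = torch.nonzero(raw_question_matrix)   # (step, column) index pairs, sorted by step
questions = nz[1:, 1] % num_questions     # question ids, first step dropped
truth = nz[1:, 1] // num_questions        # 0 = first half of the row, 1 = second half
print(questions.tolist())  # [2, 0, 1]
print(truth.tolist())      # [1, 0, 1]
```

So `truth` is 0 exactly when the nonzero column falls in the first half, which is where `encode_onehot` puts correct answers.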

Here are two screenshots from my console and from test.txt (images omitted); they confirm that my understanding is correct.

Then I added the line `truth = torch.tensor([1 if i == 0 else 0 for i in truth])` to test the performance.


The average AUC is about 0.73, the same as before adding this line.
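That the AUC is unchanged matches a general property: relabeling which class counts as "positive" while flipping the scores accordingly leaves AUC identical, so the 0/1 convention cannot affect the metric as long as it is used consistently. A quick check with made-up labels and scores, using the rank (Mann-Whitney) formulation of AUC:

```python
import numpy as np

def auc_score(y, p):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)       # hypothetical labels
p = 0.3 * y + 0.7 * rng.random(100)    # hypothetical scores correlated with y

auc = auc_score(y, p)
auc_flipped = auc_score(1 - y, 1 - p)  # relabel the positive class, flip the scores
print(auc == auc_flipped)  # True
```

Swapping `y` for `1 - y` alone would give `1 - auc`, but since the model is trained against the flipped labels its scores flip too, so the measured AUC stays the same.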


Sorry for the long description :( Your answers are appreciated! 😀

xbh0720 commented 1 year ago

Regarding your questions:

  1. Yes, the first response is not predicted, as there are no historical records for it to depend on.

  2. The predictions are generated from the RNN's hidden states, which capture the historical information of the sequence, and each one predicts the next question: the predicted value at index 0 is the prediction of the learner's response to the question at index 1, and the last predicted value is thrown away because there is no further response to predict.

  3. Correct and incorrect answers are just two kinds of responses; in our implementation we simply use 1 for an incorrect answer and 0 for a correct one to distinguish them. This does not affect the model's effectiveness, as we keep the convention consistent between training and testing.
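The alignment described in points 1 and 2 can be sketched with made-up sizes (a toy illustration, not the repo's code): for a sequence of T interactions, the output at step t is matched with the question at step t + 1, which is why `raw_pred[: length]` lines up with `torch.nonzero(...)[1:, 1]`.

```python
import torch

T, num_questions = 5, 3                       # hypothetical sizes
raw_pred = torch.rand(T, num_questions)       # one prediction row per input step
question_ids = torch.tensor([2, 0, 1, 2, 1])  # hypothetical question sequence

# the hidden state after step t predicts the response at step t + 1,
# so drop the first response (no history) and the last prediction (no target)
next_questions = question_ids[1:]             # targets: steps 1 .. T-1
preds = raw_pred[: T - 1]                     # predictions made at steps 0 .. T-2
pred_for_next = preds[torch.arange(T - 1), next_questions]
print(pred_for_next.shape)  # torch.Size([4])
```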

SummerGua commented 1 year ago

It really helps! Thank you so much!

Tong198-Hu commented 1 year ago

Your question was also one of mine, and this finally resolved my confusion. I have a follow-up question. Take a model like AKT: the inputs are q and qa, where qa is the combined encoding of the skill and the answer. In `def forward(self, x):`, say x is a 2-D matrix of shape 64×50 and the output is a 3-D matrix of shape 64×50×m. Must m be 1? If it is 1, we can call squeeze(-1) to compress it to 2-D so it matches x's dimensions exactly, which is convenient for computing the loss afterwards. But here no dimension is trimmed away, because AKT's architecture itself (through masking) already avoids leaking the current question's label when predicting it. Is my understanding correct? And if m is not 1, how should it be handled? When I wrote my train() function I got AUC = 0.999; do I need to handle different time steps for pred and target?

shshen-closer commented 1 year ago

    Hello, your email has been received.

SummerGua commented 1 year ago

> And if m is not 1, how should it be handled? When I wrote my train() function I got AUC = 0.999; do I need to handle different time steps for pred and target?

Hi, regarding how to avoid seeing the current question's answer when predicting it with attention-based knowledge tracing, as far as I know there are the following approaches. Let q_emb be the embedding of the question or concept, and qa_emb the combined (question, answer) embedding:

  1. AKT: to predict at time t, the q_emb at time t is used as the query, but its qa_emb is not used as key or value. This is also reflected in the code in EduKTM/AKT/AKTNet.py; search for zero_pad.

  2. SAINT adds a start token at the first position in place of the first qa.
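Both tricks serve the same purpose as a causal mask in plain self-attention: the query at step t must never see the key/value built from its own (or any future) interaction. A generic sketch (illustrative only, not AKT's exact code):

```python
import torch
import torch.nn.functional as F

seq_len, d = 4, 8            # hypothetical sizes
q = torch.rand(seq_len, d)   # queries built from q_emb
k = torch.rand(seq_len, d)   # keys built from qa_emb

scores = q @ k.t() / d ** 0.5
# strictly lower-triangular mask: step t attends only to steps < t,
# so the current interaction's label can never leak into its own prediction
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=-1)
scores = scores.masked_fill(~mask, float("-inf"))
attn = torch.nan_to_num(F.softmax(scores, dim=-1))  # step 0 has no history -> all zeros
```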

I'm not sure about the dimension issue you mention. The attention output is (batch_size, seq_len, emb_size); after nn.Linear(emb_size, 1), the result at each step becomes a single number, which is then passed through a sigmoid to become the prediction.
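That shape pipeline can be written out with hypothetical sizes (64 learners, 50 steps), which also answers the m = 1 / squeeze(-1) part of the question:

```python
import torch
import torch.nn as nn

batch_size, seq_len, emb_size = 64, 50, 128           # hypothetical sizes
attn_out = torch.rand(batch_size, seq_len, emb_size)  # attention output

head = nn.Linear(emb_size, 1)             # one scalar score per time step
pred = torch.sigmoid(head(attn_out))      # shape (64, 50, 1), i.e. m = 1
pred = pred.squeeze(-1)                   # shape (64, 50), matches the input x
print(pred.shape)  # torch.Size([64, 50])
```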

If you'd like, we can add each other on WeChat (vx) to discuss further. base64: MTc2MjU2MDEyNjY=

Tong198-Hu commented 1 year ago

Thank you for replying to my question so late at night; I think I understand now. I've added you on WeChat. Thanks a lot!