RichardHGL / WSDM2021_NSM

Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. WSDM 2021.

Performance of the teacher network #8

Closed by ameyagodbole 2 years ago

ameyagodbole commented 2 years ago

Hi, I recently read your work and I'm very excited by your approach. Thanks for the awesome work!

I had some queries about your setup: the teacher network also has a forward operation that can be used to solve the KBQA task, i.e., it predicts an answer distribution.

  1. Would it be possible to report the results of just the teacher network on these datasets?
  2. If the teacher performs worse than the student, how can we explain the difference in performance? Is it that the teacher weights consistency between forward and backward reasoning more heavily than solving the actual task?
  3. If the teacher performs worse, how would hyperparameter tuning of the teacher network affect the performance of both the teacher and the student?

If you prefer some other medium of communication, please let me know. Thanks!

RichardHGL commented 2 years ago
  1. Actually, the teacher network is trained with both forward and backward reasoning. The forward part can be used to make predictions on the dev/test sets, while the backward part can't.
  2. We did find that the teacher network (forward part) is worse than the student. The backward-reasoning constraint in the objective function may introduce noise that degrades the KBQA performance of the forward part, while combining intermediate predictions from both directions may help reduce such noise. So a student model that focuses only on forward reasoning may perform better.
  3. We haven't thought about it. You may give it a try.
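A minimal sketch of what point 2 describes, assuming the intermediate predictions are per-hop distributions over candidate entities. The function names, the simple averaging, and the KL objective here are illustrative assumptions, not the paper's exact fusion or loss:

```python
import numpy as np

def combine_intermediate(forward_dists, backward_dists):
    # Hypothetical fusion: average each hop's forward and backward
    # entity distributions. (NSM combines the two directions inside
    # the teacher; plain averaging is only an illustrative stand-in.)
    return [(f + b) / 2.0 for f, b in zip(forward_dists, backward_dists)]

def kl_distill_loss(student, teacher, eps=1e-12):
    # KL(teacher || student): pushes the student's per-hop distribution
    # toward the teacher's combined intermediate distribution.
    return float(np.sum(teacher * (np.log(teacher + eps) - np.log(student + eps))))

# Toy 2-hop example over 4 candidate entities.
fwd = [np.array([0.7, 0.2, 0.1, 0.0]), np.array([0.1, 0.1, 0.8, 0.0])]
bwd = [np.array([0.5, 0.3, 0.2, 0.0]), np.array([0.0, 0.2, 0.8, 0.0])]
teacher_dists = combine_intermediate(fwd, bwd)

student_dists = [np.array([0.5, 0.3, 0.2, 0.0]), np.array([0.1, 0.1, 0.8, 0.0])]
loss = sum(kl_distill_loss(s, t) for s, t in zip(student_dists, teacher_dists))
```

The intuition is that errors of the two directions are partly independent, so their combination is a less noisy supervision signal than either direction alone.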
ameyagodbole commented 2 years ago

Points 1 and 2 were what I was most concerned with. Thank you for the clarification.