Hi, thanks for the great work!
The paper says that GENRE is trained on BLINK and all KILT data simultaneously. Since the BLINK data and the other datasets are not of the same order of magnitude, were any strategies applied to balance the data, or was all the data simply mixed together for training?