Closed shihabrashid-ucr closed 7 months ago
Hi! You're correct: we set a different threshold for each model (and relationship type). We determine these thresholds on a small held-out set with a brute-force search over candidate thresholds, which is a simple and quick search problem.
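For anyone who wants to see the shape of that search, here is a minimal sketch. It is not the paper's actual code: the function names, the choice of accuracy as the objective, and the toy data are all assumptions made for illustration. It assumes that, for each held-out question, you already know whether the model answers correctly with and without retrieval.

```python
from typing import Sequence


def accuracy_at_threshold(
    popularity: Sequence[float],
    correct_without: Sequence[int],
    correct_with: Sequence[int],
    threshold: float,
) -> float:
    """Accuracy when retrieval is used only for questions whose
    popularity is strictly below `threshold`."""
    hits = 0
    for pop, no_ret, ret in zip(popularity, correct_without, correct_with):
        hits += ret if pop < threshold else no_ret
    return hits / len(popularity)


def best_threshold(
    popularity: Sequence[float],
    correct_without: Sequence[int],
    correct_with: Sequence[int],
) -> float:
    """Brute-force search over candidate thresholds (hypothetical helper).

    Candidates are the observed popularity values plus one value above
    the maximum, so "never retrieve" and "always retrieve" are both covered.
    Ties go to the lowest threshold, i.e. the fewest retrievals.
    """
    candidates = sorted(set(popularity)) + [max(popularity) + 1]
    return max(
        candidates,
        key=lambda t: accuracy_at_threshold(
            popularity, correct_without, correct_with, t
        ),
    )


# Toy held-out set (made-up numbers, not results from the paper).
popularity = [10, 100, 1000, 10000]

# A "small" model that is only correct when it retrieves.
small_without, small_with = [0, 0, 0, 0], [1, 1, 1, 1]

# A "large" model that already knows the popular entities and is
# occasionally misled by retrieved text.
large_without, large_with = [0, 1, 1, 1], [1, 1, 0, 1]

small_threshold = best_threshold(popularity, small_without, small_with)
large_threshold = best_threshold(popularity, large_without, large_with)
```

On this toy data the small model's threshold lands above every popularity value (it always retrieves), while the large model's threshold is low (it retrieves only for the rarest entity). In practice the same search would be repeated for each model and each relationship type.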
Thank you for clarifying!
Thank you for this interesting paper. I just had one point of confusion regarding "popularity". In Section 6.1 you mention:
> we use retrieval for questions whose popularity is lower than a threshold (popularity threshold), and for more popular entities, do not use retrieval at all.
Does this mean that during inference, depending on the question and the `popularity_threshold`, "every" LLM will either retrieve or not retrieve? If that is the case, then in Section 6.2, when you mention "smaller LMs almost always retrieve", how would small and big LLMs vary when deciding whether to retrieve or not? Is the threshold different for different LLMs? If yes, a little more insight into how you are calculating the threshold would be great!
TIA!