Closed kupietz closed 3 years ago
I expect that's the weird metric I thought was wrong. I guess it's a bug in Krill - could you give me a hint on how the request looks like, @margaretha ?
Direct benchmark shows a performance improvement - as expected: https://korap.ids-mannheim.de/gerrit/c/KorAP/Krill/+/4540 , so I guess there is a problem with the caller.
Direct benchmark shows a performance improvement - as expected: https://korap.ids-mannheim.de/gerrit/c/KorAP/Krill/+/4540 , so I guess there is a problem with the caller.
The caller in Kustvakt?
We were able to golf it down to the VC - it's not really the function that is slower, it's the difference in the query. The scope including the rewrite is way smaller. But we have no real idea, why it is so slow, as the cutoff still limits the search to a few hits. Maybe it's the regex in the dictionary that's so slow ....
I'd give this a high priority. Even if this was only factor 1.6 it'd be worth a couple of k€ in hardware.
An interim solution, however, could be to turn off access-rewrite-disabled
by default in the client libraries, at least if the user has a token.
But when do you set this option? Maybe there is a misunderstanding what this option does? Maybe there is a more appropriate option to use then disabling the access rewrite?
However - I agree we should find out the origin of the slow down. I guess it's either the dictionary lookup or the way we build the VCs.
But when do you set this option? Maybe there is a misunderstanding what this option does? Maybe there is a more appropriate option to use then disabling the access rewrite?
I think it is most useful when users do not have tokens, so they can get the metadata instead.
An interim solution, however, could be to turn off
access-rewrite-disabled
by default in the client libraries, at least if the user has a token.
This solution makes much sense.
But it alters the scope of the query. It is no "ignore the metadata" option. And in case the user has a token, the scope would be identical, right? Can you benchmark this? I would expect it would have a similar timing.
We introduced this option for the R client, I think. I also suspect the pre reduced scope without token is the source of the speed up. I thought I had a token when I discovered this, but in fact I had not, due to a bug in the R library.
If it is like this we can't circumvent the problem in the client and it might be not so easy to fix.
Eliza expanded the query based on the assumption there was no token involved, that's what we tested.
There were two features introduced here: The option to bypass access rewrite and the option to not fetch snippets. Not fetching snippets is always faster, but bypassing access rewrite can mean that the search scope is larger - which is in principal slower, but shouldn't be that much slower due to the cutoff. I think there are very specific use cases for these options, but I don't think that bypassing the access rewrite is a good default ...
After further investigation (thanks @margaretha !), we can assume that the problem is in the way the VC is created before iterating through the result. There are two code changes helpful here:
Regarding 2: The truth is - I thought it is, but I guess I was wrong. I'll have to investigate further in how to accomplish that.
Sorry, with a valid access token the gain shrinks to 0. I wrongly assumed, I had one, because of a bug in RKorAPClient (here: https://github.com/KorAP/RKorAPClient/commit/01c2477e36af2a9b91fd6092606ff5ccdb23bba2 ).
Prints something like: