ethz-spylab / satml-llm-ctf

Code used to run the platform for the LLM CTF colocated with SaTML 2024
https://ctf.spylab.ai
MIT License
25 stars, 6 forks

Getting 401 "division by zero" on /api/v1/defense/{id}/evaluate_utility #11

Closed. KrystofM closed this issue 11 months ago

KrystofM commented 11 months ago

I'm not sure what is happening, but I get a "division by zero" error, probably while some average is being calculated.

dpaleka commented 11 months ago

What's the ID of the defense for which this happens?

KrystofM commented 11 months ago

After some exploration, I found that the bug is triggered by leaving an "invalid" API key string in the request.

The following request gives the 401:

curl -X 'POST' \
  'https://ctf.spylab.ai/api/v1/defense/657a4c97e345b2db74f14358/evaluate-utility' \
  -H 'accept: application/json' \
  -H 'X-API-Key: XXX' \
  -H 'Content-Type: application/json' \
  -d '{
  "api_keys": {
     "openai":""
  },
  "model": "openai/gpt-3.5-turbo-1106",
  "small": true
}'

The same happens with any other invalid API key, like "YOUR KEY" from the example. A more informative response than "division by zero" would probably be enough to resolve the issue.

dpaleka commented 11 months ago

Thanks for helping! It is indeed a bug that we do not return an informative error here; in fact, the same will probably happen if, e.g., the OpenAI account is blocked. I'll try to fix this soon.

dpaleka commented 11 months ago

Essentially we should just return avg_share_of_failed_queries=1 instead of an error, I guess
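
A minimal sketch of that idea, not the platform's actual code. The function name `summarize_utility`, the result encoding (`None` for a failed query), and the choice to report a failed share of 1.0 when no queries ran are all assumptions made for illustration; only the field name `avg_share_of_failed_queries` comes from this thread.

```python
from typing import Optional


def summarize_utility(results: list[Optional[bool]]) -> dict:
    """Summarize per-query results without dividing by zero.

    Each entry is True/False for an answered query (correct/incorrect)
    and None for a query that failed, e.g. because the API key was rejected.
    This encoding is hypothetical.
    """
    total = len(results)
    answered = [r for r in results if r is not None]
    failed = total - len(answered)

    # Assumption: if nothing was attempted, report everything as failed.
    share_failed = failed / total if total else 1.0

    # Only average over answered queries; with none, there is no score.
    utility = sum(answered) / len(answered) if answered else None

    return {"utility": utility, "avg_share_of_failed_queries": share_failed}
```

With an invalid key every query fails, so `summarize_utility([None, None])` yields a `utility` of `None` and a failed share of 1.0 rather than raising `ZeroDivisionError`.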

KrystofM commented 11 months ago

I would say returning an informative error with a 401 would be the sound course of action in this case. If calling OpenAI returns a 401, the message should be forwarded by the endpoint. More generally, if OpenAI returns a 4xx error on the first request (or on the first request that went through, in case of parallel load on the API), I would just throw an error and forward the given message. The proportion of failed requests in that case will always be 1.
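
A rough sketch of the fail-fast behaviour described above, under stated assumptions: `UpstreamError`, `run_queries`, and `call_model` are hypothetical names, not part of the platform or of the OpenAI client, and the loop is sequential, so "the first request" is simply the first item.

```python
from typing import Callable, Optional


class UpstreamError(Exception):
    """Hypothetical wrapper for an error response from the model provider."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status
        self.message = message


def run_queries(
    queries: list[str], call_model: Callable[[str], str]
) -> list[Optional[str]]:
    """Run queries in order, aborting if the very first one gets a 4xx."""
    answers: list[Optional[str]] = []
    for index, query in enumerate(queries):
        try:
            answers.append(call_model(query))
        except UpstreamError as err:
            # A 4xx on the first request (bad key, blocked account) would
            # repeat for every query, so re-raise and let the endpoint
            # forward the provider's status and message to the caller.
            if index == 0 and 400 <= err.status < 500:
                raise
            # Later or non-4xx failures are recorded as failed queries.
            answers.append(None)
    return answers
```

The endpoint handler would catch `UpstreamError` and return its `status` and `message` to the client instead of a generic "division by zero".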

dpaleka commented 11 months ago

This should be fixed now. Leaving open in case any new issues arise; otherwise will close in a few days.