ariG23498 / smart-commit

Smart commit messages

API rate limit error for non signed in users #1

Open rishiraj opened 3 hours ago

rishiraj commented 3 hours ago

When usage of the model is high, we get this error if not signed in:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_http.py:406, in hf_raise_for_status(response, endpoint_name)
    405 try:
--> 406     response.raise_for_status()
    407 except HTTPError as e:

File /opt/conda/lib/python3.10/site-packages/requests/models.py:1024, in Response.raise_for_status(self)
   1023 if http_error_msg:
-> 1024     raise HTTPError(http_error_msg, response=self)

HTTPError: 429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-3B-Instruct/v1/chat/completions

The above exception was the direct cause of the following exception:

HfHubHTTPError                            Traceback (most recent call last)
Cell In[4], line 16
     11 messages = [
     12   {"role": "user", "content": f"{prompt}"}
     13 ]
     15 # Call the language model to generate the commit message
---> 16 chat_completion = client.chat.completions.create(
     17   model="meta-llama/Llama-3.2-3B-Instruct",
     18   messages=messages,
     19   max_tokens=100,
     20   temperature=0.5,
     21 )
     23 # Return the generated commit message
     24 print(chat_completion.choices[0].message.content.strip())

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/inference/_client.py:842, in InferenceClient.chat_completion(self, messages, model, stream, frequency_penalty, logit_bias, logprobs, max_tokens, n, presence_penalty, response_format, seed, stop, temperature, tool_choice, tool_prompt, tools, top_logprobs, top_p)
    821 payload = dict(
    822     model=model_id,
    823     messages=messages,
   (...)
    839     stream=stream,
    840 )
    841 payload = {key: value for key, value in payload.items() if value is not None}
--> 842 data = self.post(model=model_url, json=payload, stream=stream)
    844 if stream:
    845     return _stream_chat_completion_response(data)  # type: ignore[arg-type]

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/inference/_client.py:305, in InferenceClient.post(self, json, data, model, task, stream)
    302         raise InferenceTimeoutError(f"Inference call timed out: {url}") from error  # type: ignore
    304 try:
--> 305     hf_raise_for_status(response)
    306     return response.iter_lines() if stream else response.content
    307 except HTTPError as error:

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_http.py:477, in hf_raise_for_status(response, endpoint_name)
    473     raise _format(HfHubHTTPError, message, response) from e
    475 # Convert `HTTPError` into a `HfHubHTTPError` to display request information
    476 # as well (request id and/or server error message)
--> 477 raise _format(HfHubHTTPError, str(e), response) from e

HfHubHTTPError: 429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-3B-Instruct/v1/chat/completions (Request ID: Dn89an59r3C26gPSi25RU)

Please log in or use a HF access token
ariG23498 commented 3 hours ago

I am not sure how we should handle this. The error is self-explanatory, I think, but would a console log with the message "You are not logged in to HF, please log in to use the CLI effortlessly" be a good addition before the error is raised?
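Something like this minimal sketch, assuming `huggingface_hub.get_token()` (which returns `None` when no token is found in the environment or the local cache), could run before the API call:

```python
from huggingface_hub import get_token

# Pre-flight check before hitting the Inference API: get_token() looks at
# the HF_TOKEN environment variable and the locally cached login, and
# returns None if neither is set.
if get_token() is None:
    print(
        "You are not logged in to HF, please login to use the CLI effortlessly. "
        "Run `huggingface-cli login` or set the HF_TOKEN environment variable."
    )
```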

I am open to suggestions.

rishiraj commented 3 hours ago

I would like it if we automatically logged in to HF when an API key is present in the environment, or let users pass a token as an argument. What do you think?
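Roughly like this (a sketch only; the `--token` flag name is illustrative, not the project's current API):

```python
import argparse
import os

from huggingface_hub import InferenceClient

# Hypothetical CLI flag for passing a token explicitly.
parser = argparse.ArgumentParser()
parser.add_argument("--token", default=None, help="HF access token")
args = parser.parse_args()

# Prefer the explicit argument, then the HF_TOKEN environment variable;
# None keeps today's anonymous (rate-limited) behaviour.
token = args.token or os.environ.get("HF_TOKEN")

client = InferenceClient(token=token)
```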

ariG23498 commented 3 hours ago

Yep! That would be good to have.

Feel free to get a PR going.