Kardbord / hfapigo

Unofficial Go bindings for the Hugging Face Inference API
MIT License

Adding llama models support #33

Closed petric3 closed 7 months ago

petric3 commented 7 months ago

Thank you for your program.

I was trying to do some sentiment analysis. I took your example and simply switched the model from hfapigo.RecommendedTextClassificationModel to meta-llama/Meta-Llama-3-8B, but no response is ever returned (the call waits indefinitely). I also tried to make the llama models work with your other examples, with the same result. Could you add an example of how to use the llama models via the Hugging Face API?

Kardbord commented 7 months ago

Hi @petric3, thanks for the issue! I think the problem you're seeing is caused by a couple of things.

The first is that meta-llama/Meta-Llama-3-8B is unfortunately not available for free (serverless) access via the Inference API. From the model page:

Model is too large to load in Inference API (serverless). To try the model, launch it on Inference Endpoints (dedicated) instead.

The second issue is that the llama models seem to all be set up for text-generation rather than text-classification. You might try one of the other text-classification models hosted on HF: https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads
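For reference, a text-classification request against one of those models can be sketched with nothing but the Go standard library. This is an illustrative sketch, not hfapigo's own API; the model name distilbert-base-uncased-finetuned-sst-2-english is just one example from that search, and HUGGING_FACE_TOKEN is assumed to hold your API token:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// newClassificationRequest builds a POST request for the serverless
// Inference API, with the list-of-inputs payload shape the docs describe.
func newClassificationRequest(model, token string, inputs []string) (*http.Request, error) {
	body, err := json.Marshal(map[string][]string{"inputs": inputs})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		"https://api-inference.huggingface.co/models/"+model,
		bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newClassificationRequest(
		"distilbert-base-uncased-finetuned-sst-2-english", // example model only
		os.Getenv("HUGGING_FACE_TOKEN"),
		[]string{"I love this library!"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the returned request to http.DefaultClient.Do sends it; the response body is the raw JSON array of label/score pairs.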

Thanks for letting me know that the example hangs indefinitely when given a llama model. I've opened #34 to track that issue. Unfortunately it may be a while before I'm able to get to it. Life is busy right now with my day job and an upcoming house move. :)

Kardbord commented 7 months ago

As for why llama models don't work with the text-generation example, it seems that they only accept single inputs, rather than lists of inputs.

# 1) Does not work
curl https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct  \
    -X POST \
    -d '{"inputs": ["The quick brown fox", "jumps over the lazy dog"]}'  \
    -H "Authorization: Bearer ${HUGGING_FACE_TOKEN}" \
    -H 'Content-Type: application/json'

# 2) Works
curl https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct  \
    -X POST \
    -d '{"inputs": "The quick brown fox"}'  \
    -H "Authorization: Bearer ${HUGGING_FACE_TOKEN}" \
    -H 'Content-Type: application/json'

# 3) Works
curl https://api-inference.huggingface.co/models/gpt2 \
    -X POST  \
    -d '{"inputs": ["The quick brown fox", "jumps over the lazy dog"]}'  \
    -H "Authorization: Bearer ${HUGGING_FACE_TOKEN}" \
    -H 'Content-Type: application/json'

Currently, hfapigo treats all inputs as lists because the docs state that lists are supported:

Return value is either a dict or a list of dicts if you sent a list of inputs

Clearly that isn't the case for all models though. I've opened #35 to track that issue.

petric3 commented 7 months ago

@Kardbord Thank you for the explanation, it makes sense. Also thanks for the link about dedicated endpoints, it may come in handy. Best wishes with your house move 🌞