NormXU / nougat-latex-ocr

Codebase for fine-tuning / evaluating nougat-based image2latex generation models
https://arxiv.org/abs/2308.13418
Apache License 2.0

HuggingFace Inference API cutting responses short #2

Open lucasvanmol opened 6 months ago

lucasvanmol commented 6 months ago

There seems to be an issue running the model on huggingface (https://huggingface.co/Norm/nougat-latex-base), as responses seem to be cut short. Take for example this image:

[Screenshot: the rendered QFT equation, \mathrm{QFT}:|x\rangle\mapsto\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1}\omega_{N}^{x k}|k\rangle]

The output is:

\mathrm{{QFT}}:|x\rangle\mapsto{\frac

I'm not sure if this is a temporary HF issue, or whether the model has been updated recently?

NormXU commented 6 months ago

@lucasvanmol I didn't update the model recently. Could you downgrade to transformers==4.34.0 and try again? It's odd because I trained the model with a max sequence length of 800; you can check the config file here. I just ran the example in my environment (transformers==4.34.0) and the output looks good:

\mathrm{QFT}:|x\rangle\mapsto\frac{1}{\sqrt{N}}\sum_{k=0}^{N-1}\omega_{N}^{x k}|k\rangle.

So it's probably an HF issue?
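
(For reference, a minimal local-inference sketch along these lines. It assumes the checkpoint loads with the generic VisionEncoderDecoderModel / AutoTokenizer / AutoImageProcessor classes; the repo may ship its own processor class instead, so treat this as a sketch rather than the exact code NormXU ran.)

# Minimal local-inference sketch. Assumption: the checkpoint works with the
# generic transformers auto classes; the repo's own processor may differ.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

model_name = "Norm/nougat-latex-base"
model = VisionEncoderDecoderModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
image_processor = AutoImageProcessor.from_pretrained(model_name)

image = Image.open("test.png").convert("RGB")
pixel_values = image_processor(image, return_tensors="pt").pixel_values

with torch.no_grad():
    # max_length=800 matches the max sequence length the model was trained with
    outputs = model.generate(pixel_values, max_length=800)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))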

lucasvanmol commented 6 months ago

Yeah, running the model myself, it seems perfectly fine. It's just the HF API that's causing trouble, and I'm not sure what the cause is, as it wasn't happening a few days ago.

NormXU commented 6 months ago

@lucasvanmol Yeah, I reproduced the bug. Several examples I had previously tested successfully now return truncated results as well. This is definitely an HF issue.

lucasvanmol commented 6 months ago

I seem to have found a solution to this issue, but I'm still a bit baffled as to why it's suddenly needed. Adding an extra max_new_tokens parameter gives the desired result:

import base64
import requests

API_TOKEN = "hf_..."  # your HuggingFace API token
API_URL = "https://api-inference.huggingface.co/models/Norm/nougat-latex-base"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Base64-encode the image and cap generation explicitly with max_new_tokens
with open("test.png", "rb") as image:
    image_b64 = base64.b64encode(image.read())

data = query({"inputs": image_b64.decode("utf-8"), "parameters": {"max_new_tokens": 800}})
print(data)

I think you might be able to include this parameter in the inference widget by adding something like this to the model card on HF:

inference:
  parameters:
    max_new_tokens: 800

see https://huggingface.co/docs/hub/models-widgets#how-can-i-control-my-models-widget-inference-api-parameters
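
(For context, per those docs the widget parameters live in the model card's YAML front matter, i.e. the README.md on the Hub. A fuller sketch is below; the license and tag values here are illustrative placeholders, not taken from the actual model card.)

---
license: apache-2.0   # illustrative; match the repo's actual license
tags:
  - image-to-text     # illustrative tag
inference:
  parameters:
    max_new_tokens: 800
---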

NormXU commented 6 months ago

@lucasvanmol I tried adding the parameters to the widget meta config, but it still fails to work. I opened an issue here. Let's see what the HuggingFace team thinks about it.

felixswang commented 6 months ago

It seems that all other checkpoints, including facebook/nougat-base, are having the same problem.

Shobhit1201 commented 4 months ago

I have fine-tuned the model on my custom dataset of handwritten equations. After downloading the weights (.pth file) and using it directly in Colab, it shows the following errors:

Repository Not Found for url: https://huggingface.co/try.pth/resolve/main/config.json. Please make sure you specified the correct repo_id and repo_type. If you are trying to access a private or gated repo, make sure you are authenticated. Invalid username or password.

OSError: try.pth is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>

Can somebody please help with how to use the new model weights for inference?

NormXU commented 4 months ago

@Shobhit1201

Please check this thread.
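
(For anyone landing here via search: the error above happens because from_pretrained expects a Hub repo id or a local model folder, not a bare .pth file. A hedged sketch of loading the fine-tuned weights into the base checkpoint instead is below; the "try.pth" name comes from the error message, and the state-dict unwrapping is an assumption about how the training loop saved the checkpoint.)

# Hedged sketch: load the fine-tuned .pth into the base model rather than
# passing the .pth path to from_pretrained (which expects a repo id/folder).
import torch
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("Norm/nougat-latex-base")

state_dict = torch.load("try.pth", map_location="cpu")  # "try.pth" as in the error above
if isinstance(state_dict, dict) and "state_dict" in state_dict:
    state_dict = state_dict["state_dict"]  # assumption: the training loop may wrap the weights
model.load_state_dict(state_dict, strict=False)
model.eval()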

Shobhit1201 commented 4 months ago

Thank you, that helps with inference. But if I need to use these weights for further training, what should I do? @NormXU

NormXU commented 4 months ago

@Shobhit1201 The codebase supports further training from a certain checkpoint. Please check the code here.

To utilize this feature, you need to specify the absolute path to the checkpoint in the config.yaml file, which can be found here.
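
(A hypothetical sketch of what that config.yaml entry could look like. The key names below are illustrative only, not the repo's actual fields; check the linked file for the exact name.)

# Hypothetical config.yaml excerpt; the real key name may differ.
model:
  checkpoint_path: /absolute/path/to/finetuned_weights.pth  # absolute path, per the note above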

Shobhit1201 commented 4 months ago

@NormXU this worked, thank you!! Now if I want to evaluate on my test set and calculate token_acc and edit_dist, is there a script for that as well?

NormXU commented 4 months ago

@Shobhit1201 Yes, there is. Check this. The eval func is used to calculate token_acc and edit_dist.
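
(For readers who just want the gist of those two metrics, here is a hedged, self-contained sketch. It is not the repo's exact eval implementation; the positional token accuracy and the normalization by the longer sequence are assumptions.)

# Hedged sketch of the two metrics; not the repo's exact eval func.
def token_acc(pred_tokens, ref_tokens):
    # assumption: fraction of aligned positions where tokens match
    matches = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return matches / max(len(ref_tokens), 1)

def edit_dist(pred_tokens, ref_tokens):
    # classic dynamic-programming Levenshtein distance,
    # normalized by the longer sequence length (assumption)
    m, n = len(pred_tokens), len(ref_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred_tokens[i - 1] == ref_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n] / max(m, n, 1)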

hem210 commented 3 months ago

@lucasvanmol Thanks for that suggestion! The max_new_tokens parameter worked for me. I was able to retrieve a longer response.

Context: previously I was using max_tokens: 1000, which gave a truncated response. I replaced it with max_new_tokens: 1000 and got a better result.

I think this is related to how the HuggingFace Inference API endpoint builds its response. I am fetching the response over HTTP, and the API prepends the prompt to the generated text. So, as I understand it, the prompt tokens also get counted against the total token budget, whereas max_new_tokens seems to count only the newly generated tokens, which is why it yields a longer response.
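
(This matches the general distinction in transformers' generate(). A tiny illustrative sketch is below, with model and pixel_values assumed to be defined as in the local-inference example earlier in the thread.)

# max_length caps prompt + generated tokens together,
# while max_new_tokens caps only the newly generated ones.
outputs = model.generate(pixel_values, max_length=1000)       # total budget, prompt included
outputs = model.generate(pixel_values, max_new_tokens=1000)   # budget for generated tokens only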