dpfried / incoder

Generative model for code infilling and synthesis

Use on HuggingFace API Inference #3

Closed whitead closed 2 years ago

whitead commented 2 years ago

I'm trying to modify the python code here to work with the HuggingFace Inference API. I'm using a simple example, but I'm not getting the correct response. My prompt for infill is:

print("Hello W<|mask:0|>!")<|mask:0|>

and I'm sending it (trying to follow the arguments used in the Python example from this repo) with:

import json
import requests

# API_TOKEN is my HuggingFace API token
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/facebook/incoder-1B"
data = dict(inputs='print("Hello W<|mask:0|>!")<|mask:0|>',
            options=dict(use_gpu=False, use_cache=False, wait_for_model=True),
            parameters=dict(return_full_text=False, temperature=0.5, top_p=0.95, do_sample=True))
web_response = requests.request(
    "POST", API_URL, headers=headers, data=json.dumps(data))
response = json.loads(web_response.content.decode("utf-8"))
print(response)

The infills vary, but are something like '\n\n<|/<|/<|/<|/\n\n<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/<|/', or '!!!!!!!!!!!!', or '\n\n\n\n\n\n'. The demo page gives reasonable values, like 'ho' (though, confusingly, never 'orld'). Adding a metadata hint (<| file ext=.py |>) produces worse infills, and adding the extra mask <|mask:1|> does not help.

I wonder if any of the maintainers have an idea of what could cause this. I've opened the same issue on the HuggingFace forum, since there may be a problem with the Inference API rather than my usage of the model.

dpfried commented 2 years ago

Thanks for catching this! I followed up on that issue in the HuggingFace forum and am cross-posting here:

This seems to be because the Inference API uses a text-generation pipeline, which doesn't seem to add the special <|endoftext|> token that should be at the start of every prompt (https://github.com/huggingface/transformers/blob/215e0681e4c3f6ade6e219d022a5e640b42fcb76/src/transformers/pipelines/text_generation.py#L179). (The tokenizer we're using adds it by default, unless the add_special_tokens flag is set to false.)
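
If you want to verify this locally, here is a rough sketch (my own illustration, assuming the transformers library and the public facebook/incoder-1B tokenizer) showing that the tokenizer prepends the special token by default, which the pipeline's add_special_tokens=False skips:

# Hypothetical check, not part of the original report: compare tokenization
# with and without add_special_tokens to see the leading <|endoftext|> id.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")
prompt = 'print("Hello W<|mask:0|>!")<|mask:0|>'

with_special = tokenizer(prompt)["input_ids"]
without_special = tokenizer(prompt, add_special_tokens=False)["input_ids"]

# With the default settings the id sequence should start with the
# <|endoftext|> token; with add_special_tokens=False it should not.
print(with_special[:2])
print(without_special[:2])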

I’m not sure whether that’s intended behavior for the pipeline or not, but as a workaround you should be able to pass prefix='<|endoftext|>' in parameters, e.g.

import json
import requests

headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/facebook/incoder-1B"
data = dict(inputs='print("Hello W<|mask:0|>!")<|mask:0|>',
            options=dict(use_gpu=False, use_cache=False, wait_for_model=True),
            # prefix="<|endoftext|>" re-adds the special token the pipeline omits
            parameters=dict(return_full_text=False, temperature=0.5, top_p=0.95, do_sample=True, prefix="<|endoftext|>"))
web_response = requests.request(
    "POST", API_URL, headers=headers, data=json.dumps(data))
response = json.loads(web_response.content.decode("utf-8"))
print(response)

The reason this example will produce some unexpected text like "Hello Who Are You" rather than "Hello World" is that "Hello World" is probably a single complete token in our vocabulary (we trained the tokenizer to allow multi-word tokens, and this phrase is likely common enough that the tokenizer learned a single token for it), so the token ids for "Hello W" are not a prefix of the token ids for "Hello World". You'll likely get the best results when infilling entire lines, or multiple lines (since we didn't allow token merges across newlines).
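
To see this concretely, here is a small sketch (again my own illustration, assuming the transformers library and the facebook/incoder-1B tokenizer) that checks whether the ids for "Hello W" are a prefix of the ids for "Hello World":

# Hypothetical check: if "Hello World" is merged differently (e.g. into a
# single multi-word token), ids_partial will not be a prefix of ids_full,
# so the model cannot simply continue the prompt with "orld".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")

ids_partial = tokenizer('print("Hello W', add_special_tokens=False)["input_ids"]
ids_full = tokenizer('print("Hello World', add_special_tokens=False)["input_ids"]

print(ids_partial)
print(ids_full)
print(ids_full[:len(ids_partial)] == ids_partial)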

whitead commented 2 years ago

Yup, this solved my issue thank you!