1rgs / jsonformer

A Bulletproof Way to Generate Structured JSON from Language Models
MIT License

Issue with array response #46

Open rhzs opened 9 months ago

rhzs commented 9 months ago

Hi,

I have an issue with the generated JSON response. It doesn't seem to respond well to array-related prompt instructions.

from transformers import AutoModelForCausalLM, AutoTokenizer

print("Loading model and tokenizer...")
model_name = "databricks/dolly-v2-3b"
model = AutoModelForCausalLM.from_pretrained(model_name, use_cache=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, use_cache=True)
print("Loaded model and tokenizer")

Prompt:

from jsonformer.format import highlight_values
from jsonformer.main import Jsonformer

stock2 = {
  "type": "object",
  "properties": {
    "stocks": {
        "type": "array",
        "items": {"type": "string"}
      }
   }
}

builder = Jsonformer(
    model=model,
    tokenizer=tokenizer,
    json_schema=stock2,
    debug=True,
    prompt="generate 10 stocks code",
)

print("Generating...")

output = builder()

highlight_values(output)

Response:

Generating...
[generate_object] generating value for stocks
[generate_string] generate 10 stocks code
Output result in the following JSON schema format:
{"type": "object", "properties": {"stocks": {"type": "array", "items": {"type": "string"}}}}
Result: {"stocks": ["
[generate_string] |ABC",|
[generate_string] generate 10 stocks code
Output result in the following JSON schema format:
{"type": "object", "properties": {"stocks": {"type": "array", "items": {"type": "string"}}}}
Result: {"stocks": ["ABC", "
[generate_string] |XYZ",|
[generate_string] generate 10 stocks code
Output result in the following JSON schema format:
{"type": "object", "properties": {"stocks": {"type": "array", "items": {"type": "string"}}}}
Result: {"stocks": ["ABC", "XYZ", "
[generate_string] |PQR",|
{
  stocks: [
    "ABC",
    "XYZ",
    "PQR"
  ]
}

The response contains only 3 items, not the 10 requested in the prompt. I am not sure whether this is an issue with the model. Also, you may notice that the 3b model uses 23GB of RAM. Is this normal? [Screenshot 2023-12-27 at 20 39 35]

Any help would be appreciated. Thank you.
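
Editor's note: in the debug log above, the array is closed after three elements, and the schema itself says nothing about how many items are wanted. Jsonformer appears to let the model decide after each element whether the array continues, so the "10" in the prompt is only a hint. One possible workaround, not confirmed by the maintainers, is to replace the array with an object that has one string property per slot, since the log shows object properties being generated one by one. The sketch below only builds such a schema and gathers the values back into a list; the helper names and the "stock_N" key pattern are hypothetical and are not part of jsonformer.

```python
def fixed_length_string_schema(prefix: str, count: int) -> dict:
    """Build an object schema with `count` string properties.

    Hypothetical helper: keys are named prefix_1 .. prefix_count.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "type": "object",
        "properties": {
            f"{prefix}_{i}": {"type": "string"} for i in range(1, count + 1)
        },
    }


def collect_values(output: dict, prefix: str, count: int) -> list:
    """Turn the generated object back into an ordered list of values."""
    return [output[f"{prefix}_{i}"] for i in range(1, count + 1)]


# Schema with ten slots, to be passed as json_schema instead of the array schema.
stock_schema = fixed_length_string_schema("stock", 10)
```

Assuming the generated object contains every declared key, collect_values(output, "stock", 10) would return the ten strings in order. Whether the model fills the slots with sensible, distinct values still depends on the model.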

botka1998 commented 8 months ago

@rhzs Check out #47, it should solve your array issue.
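
Editor's note: the memory question from the original post is not addressed in the reply. A rough back-of-the-envelope estimate of the memory taken by the weights alone is parameter count times bytes per parameter. The sketch below assumes roughly 2.8 billion parameters for dolly-v2-3b (an approximation) and assumes the weights are loaded in 32-bit floats, which is the usual default when no torch_dtype is passed to from_pretrained. It does not account for activations, the generation cache, or loading overhead, so it cannot by itself explain the full 23GB.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights only, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: approximate parameter count for a "3b" model such as dolly-v2-3b.
APPROX_PARAMS = 2.8e9

fp32_gib = estimate_weight_memory_gib(APPROX_PARAMS, 4)  # 32-bit floats
fp16_gib = estimate_weight_memory_gib(APPROX_PARAMS, 2)  # 16-bit floats

print(f"float32 weights: about {fp32_gib:.1f} GiB")
print(f"float16 weights: about {fp16_gib:.1f} GiB")
```

If the weights dominate memory use, passing torch_dtype=torch.float16 to AutoModelForCausalLM.from_pretrained should roughly halve that part, although half precision may not be well supported on CPU-only setups.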