polplop closed this issue 5 months ago
Investigating a solution.
Related: https://github.com/huggingface/transformers/issues/31030
Problem:
It appears the tokenizer represents token 198 differently between tokenizer.vocabulary() and tokenizer.decode():
>>> tokenizer.decode([198])
['\n']
>>> [(k, v) for k, v in tokenizer.vocabulary().items() if v == 198][0][0]
'Ċ'
This isn't the case for other tokens
>>> tokenizer.decode([10])
['+']
>>> [(k, v) for k, v in tokenizer.vocabulary().items() if v == 10][0][0]
'+'
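This is expected behavior for byte-level BPE tokenizers like Llama 3's: the vocabulary stores every byte as a printable placeholder character (GPT-2's byte-to-unicode trick), so the newline byte appears as 'Ċ' and a leading space as 'Ġ', while printable ASCII bytes like '+' map to themselves. A minimal sketch of that mapping (an independent reimplementation for illustration, not outlines or transformers code):

```python
def bytes_to_unicode():
    # GPT-2-style byte-to-printable-unicode table used by byte-level BPE
    # tokenizers: every byte gets a printable character, so vocabulary
    # entries never contain raw control characters.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are remapped into the U+0100+ range.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

table = bytes_to_unicode()
print(table[ord("\n")])  # 'Ċ'
print(table[ord(" ")])   # 'Ġ'
print(table[ord("+")])   # '+'  (printable bytes map to themselves)
```

So the vocabulary string 'Ċ' and the decoded string '\n' are two views of the same byte, not a corruption of the vocabulary.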
from transformers import AutoTokenizer

from outlines.models.transformers import TransformerTokenizer

tokenizer = TransformerTokenizer(
    AutoTokenizer.from_pretrained("failspy/Meta-Llama-3-8B-Instruct-abliterated-v3")
)
bad_tokens = []
for vocab_token_str, token_id in tokenizer.vocabulary.items():
    decoded_token_str = tokenizer.decode([token_id])[0]
    if decoded_token_str != vocab_token_str:
        bad_tokens.append((decoded_token_str, vocab_token_str))

if bad_tokens:
    bad_tok_output = '\n'.join(map(repr, bad_tokens))
    raise Exception(f"Found {len(bad_tokens)} bad tokens: {bad_tok_output}")
Found these inconsistent tokens:
E Exception: Found 78029 bad tokens: (' ROOM', 'ĠROOM')
E (' 않는', 'ĠìķĬëĬĶ')
E (' Overse', 'ĠOverse')
E (' slov', 'Ġslov')
E ('�', 'æ¦')
E (' Infragistics', 'ĠInfragistics')
E ('�', 'çĻ')
E (' DIFF', 'ĠDIFF')
E (' 武', 'ĠæѦ')
E (' eighth', 'Ġeighth')
...
I'm looking into whether we should be constructing a "true vocabulary" by decoding each token.
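A sketch of that idea, keying the vocabulary by what decode() actually produces rather than by the raw byte-level strings. FakeTokenizer and its two entries are purely illustrative stand-ins for outlines' TransformerTokenizer, not the real API:

```python
class FakeTokenizer:
    # Stand-in for a byte-level tokenizer: vocabulary stores placeholder
    # strings, decode() returns the real text.
    vocabulary = {"Ċ": 198, "+": 10}
    _decoded = {198: "\n", 10: "+"}

    def decode(self, ids):
        return [self._decoded[i] for i in ids]

def true_vocabulary(tok):
    # Build a "true vocabulary": decoded string -> token id.
    return {tok.decode([tid])[0]: tid for tid in tok.vocabulary.values()}

tok = FakeTokenizer()
print(true_vocabulary(tok))  # {'\n': 198, '+': 10}
```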
It appears we already have a method to normalize:
class TransformerTokenizer(Tokenizer):
    ...

    def convert_token_to_string(self, token: str) -> str:
        from transformers.file_utils import SPIECE_UNDERLINE

        string = self.tokenizer.convert_tokens_to_string([token])
Investigating why this normalization failed to prevent a literal \n during generation.
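The normalization that convert_tokens_to_string performs for byte-level tokenizers can be sketched by inverting the byte table: map each placeholder character back to its byte, then decode as UTF-8. This is an independent reimplementation for illustration (it assumes the inverted bytes form valid UTF-8), not the transformers code path itself:

```python
def byte_decoder():
    # Inverse of the GPT-2 byte-to-unicode table: placeholder char -> byte.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = list(bs)
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {chr(c): b for b, c in zip(bs, cs)}

DECODER = byte_decoder()

def normalize(vocab_token: str) -> str:
    # Recover the decoded string from a raw vocabulary entry.
    return bytes(DECODER[ch] for ch in vocab_token).decode("utf-8")

print(repr(normalize("ĠROOM")))  # ' ROOM'
print(repr(normalize("Ċ")))      # '\n'
```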
Describe the issue as clearly as possible:
I'm currently attempting to summarize an article and classify its relevance. This worked fine on outlines 0.0.36, but upgrading to outlines 0.0.43 produces a validation error that did not occur before.
I have tried:
The model seems unable to generate valid JSON; an "Invalid control character at" error occurs during pydantic validation.
Notes: Ubuntu 22.04 (kernel #20~22.04.1-Ubuntu) on an AWS instance with an A10G GPU, CUDA 12.1, llama_cpp_python==0.2.77, outlines==0.0.43
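The "Invalid control character at" message comes from Python's strict JSON parser: a raw, unescaped control character such as a literal newline inside a JSON string is rejected, which is exactly what happens when the model emits a real '\n' inside a string value instead of the escaped form. A minimal reproduction:

```python
import json

# A raw newline inside a JSON string value is rejected by the strict parser.
raw = '{"summary": "line one\nline two"}'
try:
    json.loads(raw)
except json.JSONDecodeError as err:
    print(err)  # an "Invalid control character at" message

# The escaped form parses fine:
assert json.loads('{"summary": "line one\\nline two"}') == {"summary": "line one\nline two"}
```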
Context for the issue:
I would like to improve the performance of my summarization and classification pipeline with the newer Llama 3 GGUF models. The current pipeline on the older outlines 0.0.36 also has some number-formatting issues.
No other issue has reported problems with Llama 3 GGUFs, but every finetune I have tried shows the same behavior. Either I'm doing something wrong or there is a significant Llama 3 GGUF issue that deserves a discussion. Thank you!