tomsquest opened 3 months ago
Seeing the JSON as highlighted by GitHub, it seems the JSON is invalid (field unterminated: `qui cher"rating": 4,`).
Can it be the problem?
Yep, you nailed it! This is a really interesting edge case! It seems like the LLM dropped the quotation mark AND a `,` after `qui cher"rating`.
We have heuristics for how we infer mistakes in JSON, but I think we can use a smarter algorithm to address this kind of issue.
(e.g., do a look-ahead for all the known keys and see if we can find something else)
That said, we may also consider adding LLM-based error correction for really weird edge cases like this one.
What happened here is that we inferred the end of the first string incorrectly. We've noticed that this happens more when using non-English input, especially with structured data (there's just less training data for structured non-English text).
Will add this to the unit test suite we have and see what we can do about it! (Seems very tractable, though.)
Excellent, thanks for the feedback!
@tomsquest this took a bit to update, but it looks like this is a bit more complicated than it seems at first glance. We'll likely have to update the parser to simultaneously explore looking for multiple keys, as the content was simply lost due to bad string terminations. Any heuristic we apply here will likely lead to other cases being wrong.
The only approach that would likely work is:
That said, I think our current bandwidth is a bit too full to support doing this work atm. How often is this happening in your use case?
Hi @hellovai ,
Thanks for your answer.
Tricky "json" parsing indeed!
I don't know if one could take some ideas from a project like json_repair (not Rust, but good for the idea).
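For reference, a core trick in repair libraries of that kind is to lean on the parser's reported error position and retry with local edits. A tiny stdlib-only sketch of the diagnose step (a hypothetical helper, not json_repair's real API):

```python
import json

def parse_or_locate_error(text: str):
    """Try strict parsing; on failure return (None, (message, position)).

    A repair loop could use the reported position to attempt local edits
    (insert a quote, comma, or brace) and retry -- the general strategy
    that projects like json_repair build on.
    """
    try:
        return json.loads(text), None
    except json.JSONDecodeError as e:
        # e.pos is the character offset where strict parsing gave up.
        return None, (e.msg, e.pos)
```

This only locates the break; the hard part (which this issue shows) is choosing which local edit to try, since several repairs can yield valid but wrong JSON.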
That said, i think the current bandwidth is a bit too full to support doing this work atm
No urgency on my side, problem solved by switching to Claude Haiku.
How often is this happening in your use case?
It happened once using Gemini (I think it was the pro-001 model). Of course, changing the prompt rerolls the dice.
Oh, FYI, I wrote an article about how I use BAML to solve my problem (converting my list of books from text to JSON, thanks to BAML). Here it is: https://www.tomsquest.com/blog/2024/08/get-structured-output-from-llm-using-baml/
Of course, changing the prompt rerolls the dice.
FYI: you can set `temperature=0` on pretty much all models to maximize their determinism (most of them default to `temperature=1`, which means that there's randomness baked into the response).
We don't currently provide a standardized way to do this, but setting `options.temperature` to `0` (example) will work for most models, since we forward `temperature` to the model provider.
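In BAML that would look something like the following `clients.baml` entry (the client name, provider, and model here are placeholders, and exact syntax may differ across BAML versions):

```baml
// Hypothetical client definition with temperature pinned to 0
// for (mostly) deterministic outputs.
client<llm> DeterministicClient {
  provider openai
  options {
    model "gpt-4o-mini"
    temperature 0
  }
}
```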
Hi @sxlijin ,
Thanks for the tips, that's what I did for my "final" version (clients.baml). I don't remember whether I encountered the malformed JSON before setting the temperature to 0 or not.
Hi,
I am starting to play with Baml, and I am very excited about it.
Problem
I stumbled upon a strange case where a test in PromptFiddle is failing, but the LLM output seems correct. Indeed, the test does not pass, citing a missing field, yet the LLM seems to have produced a correct output.
Detail
Here is the raw LLM response as printed in PromptFiddle. And the test fails with:
Failed to coerce value: Error parsing '<root>': Missing required field: slug
Screenshot
Link to PromptFiddle
https://www.promptfiddle.com/extract-book-info-ystjv