Closed hellovai closed 1 month ago
For more context, the proposed solution is something like:
In the example:
All matches:
car x2
car-1 x1
Reduced to:
car x1
car-1 x1
Tie breaker fails to disambiguate.
added a branch with a unit test: https://github.com/BoundaryML/baml/pull/1088
Anyone is free to work on it from here:
cd $REPO/engine/baml-lib/jsonish/src
RUST_LOG=trace cargo test test_numerical_enum
When the LLM returns something more "approximate", our parsing algorithm could handle aliases that are substrings of other aliases better.
For example:
Raw LLM response.
We currently parse this as Foo.A, because technically we find two instances of "car" and one of "car-2".
Fix: https://github.com/BoundaryML/baml/blob/cd6b141020ec8dfd2514c82ffffaebc5678a025b/engine/baml-lib/jsonish/src/deserializer/coercer/match_string.rs
Change string_match_strategy so that a substring counted multiple times is credited only to the longest alias that matches at that position.
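To make the idea concrete, here is a minimal sketch of longest-match counting. It is not the code in match_string.rs: `count_longest_matches` is a hypothetical helper, and it assumes plain case-sensitive substring matching with no normalization. At each position in the text it tries the longest alias first, so an occurrence of "car-1" is never also counted as an occurrence of "car".

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of BAML): counts how often each alias
/// occurs in `text`, crediting every matched span to the longest alias
/// that starts at that position.
fn count_longest_matches(text: &str, aliases: &[&str]) -> HashMap<String, usize> {
    // Sort aliases longest-first so "car-1" is tried before "car".
    // Empty aliases are dropped because they would match everywhere.
    let mut sorted: Vec<&str> = aliases.iter().copied().filter(|a| !a.is_empty()).collect();
    sorted.sort_by(|a, b| b.len().cmp(&a.len()));

    let mut counts: HashMap<String, usize> = HashMap::new();
    let mut pos = 0;
    while pos < text.len() {
        let rest = &text[pos..];
        match sorted.iter().find(|a| rest.starts_with(**a)) {
            Some(alias) => {
                // Credit the longest alias and skip past the whole match,
                // so shorter aliases inside it are not counted again.
                *counts.entry(alias.to_string()).or_insert(0) += 1;
                pos += alias.len();
            }
            None => {
                // No alias starts here: advance by one full character
                // to stay on a UTF-8 boundary.
                pos += rest.chars().next().map_or(1, |c| c.len_utf8());
            }
        }
    }
    counts
}
```

With the aliases "car" and "car-1", the text "car car-1" yields one match each, which is the tie described in the comments above and still needs a separate tie breaker. A text that mentions only "car-1" yields no matches for "car" at all.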