eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

Question about amount of API calls #2

Closed pruksmhc closed 1 year ago

pruksmhc commented 1 year ago

Thank you for the great work! Consider a query like the sunscreen example:

```
"A list of things not to forget when going to the sea (not travelling): \n"
"- [THING]"
```

From a quick skim of the paper, my understanding is that there will be an API call for each THING in each line (two calls). Is that correct?

Also, it seems like the OpenAI API already gives some control over the decoding strategy (especially with the introduction of logit bias, which can be used to upsample or downsample certain tokens). Is constraints the main additional value this library provides over that?
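For context, logit bias as I understand it simply adds a fixed offset to the logits of selected tokens before sampling. A minimal sketch of that mechanism (the token IDs and logit values here are made up for illustration; OpenAI accepts bias values in roughly [-100, 100]):

```python
import math

def apply_logit_bias(logits, bias):
    """Add a per-token bias (as with OpenAI's logit_bias parameter) to raw logits."""
    return {tok: logit + bias.get(tok, 0.0) for tok, logit in logits.items()}

def softmax(logits):
    m = max(logits.values())
    exps = {tok: math.exp(l - m) for tok, l in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Hypothetical logits over three token IDs; a bias of -100 effectively bans
# token 11, while +5 strongly upsamples token 42.
logits = {7: 1.0, 11: 2.0, 42: 0.5}
biased = apply_logit_bias(logits, {11: -100.0, 42: 5.0})
probs = softmax(biased)
```

So as a user I can already ban or boost a fixed set of tokens per call, which is why I am asking what constraints add beyond that.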

lbeurerkellner commented 1 year ago

Thank you for your interest in this.

Indeed, for OpenAI models we have to issue multiple subsequent API calls in the given example. However, we do implement speculative generation, which means that in some cases some of these calls can be saved. This is an unfortunate limitation of the OpenAI API and not the case with local models (e.g. via the HuggingFace Transformers backend): there, LMQL runs directly as part of the model's decoding loop, applying token masks eagerly and inserting prompt tokens where necessary, so only one decoder call is needed.
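To illustrate why a local backend only needs one decoder run, here is a toy greedy decoding loop with an eagerly applied token mask. This is a simplified sketch, not LMQL's actual implementation; the vocabulary, scorer, and mask are hypothetical:

```python
def decode_with_mask(score_fn, mask_fn, max_steps=10):
    """Toy greedy decoding loop. At each step, only tokens permitted by
    mask_fn(generated_so_far) are considered, so constraints are enforced
    inside the loop itself rather than via repeated API calls."""
    out = []
    for _ in range(max_steps):
        allowed = mask_fn(out)
        if not allowed:  # empty mask signals the end of generation
            break
        scores = score_fn(out)
        out.append(max(allowed, key=lambda t: scores.get(t, float("-inf"))))
    return out

# Hypothetical scorer: prefers "travel" first, then a newline.
def score_fn(prefix):
    if prefix:
        return {"sunscreen": 0.1, "towel": 0.1, "\n": 2.0}
    return {"sunscreen": 2.0, "travel": 3.0, "towel": 1.0, "\n": 0.5}

# Constraint mask: "travel" is excluded eagerly, and generation stops after "\n".
def mask_fn(prefix):
    if prefix and prefix[-1] == "\n":
        return set()
    return {"sunscreen", "towel", "\n"}

result = decode_with_mask(score_fn, mask_fn)
```

With an API-only model, the mask cannot be injected mid-decoding like this, which is why per-variable API calls (plus logit_bias workarounds) become necessary.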

To learn more about the limitations of the OpenAI API, see https://docs.lmql.ai/en/latest/language/models.html#openai-api-limitations.

To see how many API calls and tokens LMQL consumes to execute a query, check the query statistics and cost estimates in the LMQL playground, which reports them in real time during execution.

When it comes to the benefits of LMQL over plain API use, it depends on what you are looking for. So far, we have found LMQL most useful for multi-part prompting schemes (e.g. with dynamic insertions in between completions): you don't have to manage multiple API calls yourself, and your code operates at a higher level overall. LMQL also lets you run hundreds of queries in parallel, automatically bundling and batching API calls across all of them. Finally, constraints are a big benefit and a gain in expressiveness, as they go beyond simple filter lists for specific tokens.
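To make the batching point concrete, here is a toy sketch of the idea: many concurrent queries enqueue their requests, and a single worker flushes them as one bundled "API call". This is a simplified illustration (the `Batcher` class and the simulated completion are invented for this example, not LMQL's real internals):

```python
import asyncio

class Batcher:
    """Toy request bundler: concurrent queries enqueue prompts, and one
    background worker serves them in a single batched (here: simulated) call."""
    def __init__(self):
        self.pending = []  # list of (prompt, future) pairs awaiting a flush

    async def complete(self, prompt):
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((prompt, fut))
        return await fut

    async def flush_loop(self):
        while True:
            await asyncio.sleep(0.01)  # collect requests over a short window
            if self.pending:
                batch, self.pending = self.pending, []
                # one simulated batched call serving all queued prompts at once
                for prompt, fut in batch:
                    fut.set_result(prompt.upper())

async def main():
    b = Batcher()
    worker = asyncio.ensure_future(b.flush_loop())
    # five "queries" issued in parallel, served by a single batched flush
    results = await asyncio.gather(*(b.complete(f"query {i}") for i in range(5)))
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return results

results = asyncio.run(main())
```

The benefit is the same as in LMQL's parallel execution: the number of round trips grows with the number of flushes, not with the number of queries.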

I hope this answers your question.