TAG-Research / TAG-Bench

TAG-Bench: A benchmark for table-augmented generation (TAG)
https://arxiv.org/pdf/2408.14717
MIT License

Question on implementation #4

Open zachschillaci27 opened 3 days ago

zachschillaci27 commented 3 days ago

Thanks for sharing the paper and code!

I have a question regarding the technical implementation and the differences between Text2SQL and TAG. From the schematic in your paper below, you outline TAG with this three-step process:

[Screenshot: the three-step TAG schematic from the paper]

From what I can see here, this looks to me like a standard Text2SQL function calling pattern that one would implement for an OpenAI-like assistant with function (tool) calling. That is, you provide a Text2SQL function (tool) to the LLM that takes in the user's query and uses a separate LLM call (with information on the database schema provided in the system prompt) to generate an SQL query. The query is then run and the response is returned as a tool message and passed back to the orchestrator LLM. Running the orchestrator in an iterative function calling loop allows the assistant to apply additional semantic reasoning and execute multiple Text2SQL calls if needed. Once the complete response has been obtained, the LLM can provide a text-based response to the user.

Whenever I have implemented such a pattern for an LLM assistant, I have referred to this process simply as Text2SQL. Would it, however, fit your definition of TAG? Perhaps the distinction matters more for academic benchmarking, but I would appreciate your input here. Thanks!
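To make the pattern I'm describing concrete, here is a minimal sketch of the iterative function-calling loop. All names (`text2sql`, `orchestrator`, the toy schema, and the canned "LLM" responses) are hypothetical stand-ins for real model calls, not the TAG-Bench implementation:

```python
import sqlite3

def setup_db():
    # Toy database standing in for the real one the assistant queries.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
    conn.executemany("INSERT INTO movies VALUES (?, ?)",
                     [("Casablanca", 1942), ("Inception", 2010)])
    return conn

def text2sql(question, schema):
    # Stand-in for the separate LLM call that turns the user's question
    # into SQL, with the database schema provided in its system prompt.
    if "before 1950" in question:
        return "SELECT title FROM movies WHERE year < 1950"
    return "SELECT title FROM movies"

def orchestrator(question, conn, max_turns=3):
    schema = "movies(title, year)"
    for _ in range(max_turns):
        sql = text2sql(question, schema)      # tool (function) call
        rows = conn.execute(sql).fetchall()   # run query; result becomes a tool message
        # Stand-in for the orchestrator LLM deciding it has enough context
        # and producing a text-based answer over the returned rows.
        return "Found {} match(es): {}".format(
            len(rows), ", ".join(r[0] for r in rows))

conn = setup_db()
print(orchestrator("Which movies were released before 1950?", conn))
```

In a real assistant the loop would continue for multiple turns, letting the orchestrator issue additional Text2SQL calls before answering.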

leonidas1712 commented 1 day ago

I had similar questions when reading this paper. My guess is that there are a couple of things that are supposed to make TAG different:

  1. It seems to me that, according to the paper, standard Text2SQL doesn't involve passing the result back to an LM to generate an answer. Instead, you just execute the queries and show the result tables to the user.

"While Text2SQL omits the final generation step and stops short after query execution, ..." (p.3)

  2. The use of language-model-based semantic operators with APIs like LOTUS, which allow you to, say, filter for movies that are 'classic' based on world knowledge.

The approach I have heard of without something like LOTUS is to construct embeddings of semantic columns, such as a description or brand name, and do a similarity search to get the most appropriate column value to filter on. Semantic operators seem more generalizable, if costly, but I do wonder how much of the differentiation from Text2SQL + LM generation on the results is lost if we take out LOTUS itself.

  3. I think TAG is also meant to be a more general model that can be applied to various types of databases/query engines beyond SQL. "The underlying system used to store the data can use many possible database execution engines..." (p.3)
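The first point above can be sketched as a three-step pipeline, where standard Text2SQL stops after step 2 and TAG adds the final generation step. All helpers here are hypothetical stand-ins for LM calls, not the paper's code:

```python
import sqlite3

def synthesize_query(question):
    # Step 1: query synthesis (stand-in for an LM-based Text2SQL call).
    return "SELECT title FROM movies WHERE year < 1950"

def execute(conn, sql):
    # Step 2: query execution. Per the paper, standard Text2SQL stops
    # here and shows the result table to the user directly.
    return conn.execute(sql).fetchall()

def generate_answer(question, rows):
    # Step 3: answer generation over the result rows -- the step the
    # paper says Text2SQL omits (stand-in for another LM call).
    titles = ", ".join(r[0] for r in rows)
    return "The matching movies are: {}.".format(titles)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [("Casablanca", 1942), ("Inception", 2010)])

question = "Which movies came out before 1950?"
rows = execute(conn, synthesize_query(question))
print(generate_answer(question, rows))
```

The third point suggests `execute` could just as well dispatch to a non-SQL engine, since the synthesis and generation steps don't depend on the storage backend.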

Would love to hear what the authors have to say about this!