Mindinventory / MindSQL

MindSQL: A Python Text-to-SQL RAG Library simplifying database interactions. Seamlessly integrates with PostgreSQL, MySQL, SQLite, Snowflake, and BigQuery. Powered by GPT-4 and Llama 2, it enables natural language queries. Supports ChromaDB and Faiss for context-aware responses.
https://www.mindinventory.com/text-to-sql-mindsql.php?utm_source=sampromotiion&utm_medium=Button&utm_campaign=sampromotion&utm_id=sampromotion&utm_term=sampromotiion&utm_content=sampromotiion
GNU General Public License v3.0

How do I design my example.json? #15

Closed kuazhangxiaoai closed 1 month ago

kuazhangxiaoai commented 1 month ago

MindSQL is great work! I have a few questions about the example.json file, which contains SQLQuery and Question entries. Could you tell me why we need this JSON file and how to design it? Thanks for your answers!

ishika-mi commented 1 month ago

@kuazhangxiaoai

Why We Need the example.json File

The example.json file serves several important purposes in the context of a Retrieval-Augmented Generation (RAG) system like MindSQL:

  1. Few-Shot Learning: By providing examples of questions and their corresponding SQL queries in the prompt, we let the large language model (LLM) infer the expected pattern and generate more accurate SQL queries for new, unseen questions. This technique, known as few-shot learning, helps the model pick up the structure and context of the queries without any retraining.

  2. Consistency: The file ensures consistency in the types of queries generated. By standardizing the format and structure of the questions and SQL queries, we can achieve more reliable and predictable outputs from the LLM.

  3. Customization: It allows users to customize the system according to their specific database schema and typical queries. Tailoring the examples to the actual tables and columns of the database typically improves the accuracy of the generated SQL.

  4. Debugging and Testing: Having a collection of example queries makes it easier to debug and test the system. If the generated SQL queries are incorrect, the examples can help identify where the model is going wrong and guide adjustments.
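To make the few-shot idea concrete, here is a minimal sketch of how question/SQL pairs could be placed in a prompt ahead of a new question. This is not MindSQL's actual implementation; the function name `build_few_shot_prompt` and the prompt wording are assumptions made for illustration.

```python
def build_few_shot_prompt(examples, new_question):
    """Format question/SQL pairs as in-context examples, then append the new question.

    Hypothetical helper for illustration; not part of MindSQL.
    """
    parts = ["Translate each question into a SQL query.", ""]
    for ex in examples:
        parts.append(f"Question: {ex['Question']}")
        parts.append(f"SQLQuery: {ex['SQLQuery']}")
        parts.append("")  # blank line between examples
    # The prompt ends with an open "SQLQuery:" slot for the model to complete.
    parts.append(f"Question: {new_question}")
    parts.append("SQLQuery:")
    return "\n".join(parts)


examples = [
    {"Question": "Get the names of all suppliers.",
     "SQLQuery": "SELECT companyname FROM Supplier;"},
]
prompt = build_few_shot_prompt(examples, "List all categories.")
print(prompt)
```

In a RAG setup, the `examples` list would usually hold only the few stored pairs most similar to the new question rather than the whole file.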

How to Design the example.json File

Designing the example.json file involves several steps:

  1. Identify Common Queries: Determine the most common or critical queries that users are likely to ask about the database. These should cover a wide range of operations such as data retrieval, aggregation, and filtering.

  2. Map Questions to SQL Queries: For each identified query, create a natural language question that a user might ask and map it to the corresponding SQL query. Ensure that the SQL queries are correct and efficient.

  3. Maintain Consistency: Ensure that the format of the questions and SQL queries is consistent throughout the file. This includes consistent use of capitalization, punctuation, and SQL syntax.
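Steps 2 and 3 can be partly automated. The sketch below checks each entry for the expected keys and a trailing semicolon, then compiles the SQL against an in-memory SQLite database to catch syntax errors and unknown tables or columns. The schema and the helper name `check_examples` are hypothetical, and SQLite's dialect may differ from your target database, so treat a pass as a sanity check rather than proof of correctness.

```python
import sqlite3

# Hypothetical schema, for illustration only; replace with your real tables.
SCHEMA = """
CREATE TABLE Category (id INTEGER PRIMARY KEY, categoryname TEXT);
CREATE TABLE Supplier (id INTEGER PRIMARY KEY, companyname TEXT);
"""


def check_examples(examples, schema=SCHEMA):
    """Return a list of (index, problem) tuples; an empty list means all checks passed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    problems = []
    for i, ex in enumerate(examples):
        if set(ex) != {"Question", "SQLQuery"}:
            problems.append((i, "expected exactly the keys 'Question' and 'SQLQuery'"))
            continue
        sql = ex["SQLQuery"].strip()
        if not sql.endswith(";"):
            problems.append((i, "SQL does not end with a semicolon"))
        try:
            # EXPLAIN compiles the statement without running it, so syntax
            # errors and unknown tables or columns raise an exception here.
            conn.execute("EXPLAIN " + sql)
        except (sqlite3.Error, sqlite3.Warning) as err:
            problems.append((i, f"SQL error: {err}"))
    conn.close()
    return problems


examples = [
    {"Question": "Get the names of all suppliers.",
     "SQLQuery": "SELECT companyname FROM Supplier;"},
    {"Question": "List all products.",
     "SQLQuery": "SELECT * FROM Product;"},  # Product is not in the schema
]
problems = check_examples(examples)
for index, problem in problems:
    print(index, problem)
```

Here the second entry is reported because the hypothetical schema has no Product table.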

Here’s a simple example of what an example.json file can look like:

[
  {
    "Question": "Retrieve all categories from the Category table.",
    "SQLQuery": "SELECT * FROM Category;"
  },
  {
    "Question": "Get the names of all suppliers.",
    "SQLQuery": "SELECT companyname FROM Supplier;"
  },
  {
    "Question": "Retrieve the first 10 rows from the Employee table.",
    "SQLQuery": "SELECT * FROM Employee LIMIT 10;"
  }
]

Add more examples as needed, following the same format.
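When loading the file, keep in mind that strict JSON parsers such as Python's `json` module reject `//` comments and trailing commas, so the file on disk should contain only the array of objects. A minimal loading sketch, assuming the file is named example.json and sits in the working directory (the helper name `load_examples` is made up for illustration):

```python
import json


def load_examples(path="example.json"):
    """Load question/SQL pairs and fail early if the file is not a JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        # json.load raises json.JSONDecodeError on comments or trailing commas.
        data = json.load(f)
    if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
        raise ValueError("example.json must contain a JSON array of objects")
    return data
```

Calling `load_examples()` before handing the file to MindSQL surfaces formatting mistakes as a clear error instead of a confusing failure later on.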