definitive-io / duckdb-text2sql

42 stars 37 forks source link

DuckDB Query Generator

This repository builds a Streamlit application that allows users to ask questions about their DuckDB data. The application uses the Groq API to generate SQL queries based on the user's questions and execute them on a DuckDB database.

Features

Data

The application queries data from two CSV files located in the data folder:

Prompts

The base prompt for the AI is stored in a text file in the prompts folder:

Functions

Usage

To use this application, you need to have Streamlit and the other required Python libraries installed. You also need to have a Groq API key, which you can obtain by signing up on the Groq website.

Once you have the necessary requirements, you can run the application by executing the script with Streamlit:

streamlit run app.py

This will start the Streamlit server and open the application in your web browser. You can then ask questions about your DuckDB data, and the application will generate and execute SQL queries based on your questions.

Customizing with Your Own Data

This application is designed to be flexible and can be easily customized to work with your own data. If you want to use your own data, follow these steps:

  1. Replace the CSV files: The application queries data from two CSV files located in the data folder: employees.csv and purchases.csv. Replace these files with your own CSV files.

  2. Modify the base prompt: The base prompt for the AI, stored in the prompts folder as base_prompt.txt, contains specific information about the data metadata. Modify this prompt to match the structure and content of your own data. Make sure to accurately describe the tables, columns, and any specific rules or tips for querying your dataset.

By following these steps, you can tailor the DuckDB Query Generator to your own data and use cases. Feel free to experiment and build off this repository to create your own powerful data querying applications.