This repository releases a text-to-SQL demo, powered by CodeS, a language model specifically tailored for text-to-SQL translation.
It is important to note that CodeS is designed as a single-turn text-to-SQL model and is not intended for multi-turn conversations. Consequently, it cannot understand contexts in the chat box. Should the model's responses not meet your expectations, it is advisable to rephrase your question rather than trying to steer the model toward a correct answer with follow-up prompts.
Our development environments are configured as follows:
(If Java is already installed, feel free to skip this step.)
Execute the following commands in your terminal:
apt-get update
apt-get install -y openjdk-11-jdk
Run these commands to set up your virtual environment:
conda create -n demo python=3.8.5
conda activate demo
Ensure you have all necessary packages by running:
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
Now your environment should be all set up and ready for deployment!
Download the the file sic_ckpts.zip for the schema item classifier. Then, unzip the downloaded file in the root directory of the project:
unzip sic_ckpts.zip
By default, this project includes only one database (i.e., singer
) in the databases
folder.
To access all databases available in our online demo:
To add and use your own databases:
databases
directory../data/tables.json
file with the necessary information about your database, including:
db_id
: The name of your database (e.g., my_db
for a database file located at databases/my_db/my_db.sqlite
).table_names_original
and column_names_original
: The original names of tables and columns in your database.table_names
and column_names
: The semantic names (or comments) for the tables and columns in your database.To enhance the efficiency of content-based queries on the databases, run the following command to build the BM25 index:
python -u build_contents_index.py
Please note that this process might take a considerable amount of time, depending on the size and content of the databases. Your patience is appreciated during this step.
Upon completing these steps, your project should be fully configured.
To initiate the website, execute the following command:
python -u app.py
This action will start the web application, making it accessible at http://your_ip:5000/chatbot
. Please note that the user's history questions will be logged and can be accessed in the data/history/history.sqlite
file.
Given that our model is predominantly trained on English text, integrating a translation API becomes essential for handling user's questions in languages other than English.
In this project, we have utilized Baidu Translate. To facilitate multilingual support, please configure your Baidu Translate API token within the app.py
script. You can follow these guidelines to create your Baidu Translate application and acquire the necessary API access token: Baidu Translate API Guide and Baidu Translate API Reference.
It is important to note that the translation quality may influence the model's accuracy. For enhanced text-to-SQL performance in languages of your preference, consider opting for more robust translation engines like Google Translate or DeepL.
For any questions about this project, feel free to open a Github issue or directly contact Haoyang Li via email at lihaoyang.cs@ruc.edu.cn.
Our gratitude extends to the teams behind ChatBot💬 WebApp in Python using Flask, BIRD, and Spider for their outstanding contributions to the field.