GNU General Public License v3.0

# tableQA

An AI tool for querying tabular data with natural language, built using question-answering models from the Hugging Face Transformers library.

This work is described in the paper *TableQuery: Querying tabular data with natural language*, by Abhijith Neil Abraham, Fariz Rahman, and Damanpreet Kaur. If you use TableQA, please cite the paper.

Here is a detailed blog to understand how this works.

Tabular data can be provided in several forms, for example a CSV file or a pandas DataFrame.


## Features

* Supported operations
* Configuration

## Installation

Install via pip:

```
pip install tableqa
```

Or install from source:

```
git clone https://github.com/abhijithneilabraham/tableQA
cd tableqa
python setup.py install
```

## Quickstart

Run a sample query:

```python
from tableqa.agent import Agent

agent = Agent(df)  # pass in your dataframe
response = agent.query_db("Your question here")
print(response)
```
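The `df` argument above is a pandas DataFrame. As a quick illustration, it could be built like this (the table below is made-up data, assuming pandas is installed):

```python
import pandas as pd

# Hypothetical data, shaped like the cancer_death example used later in
# this README; any tabular DataFrame works.
df = pd.DataFrame({
    "Cancer_site": ["Stomach", "Lung", "Stomach"],
    "Year": ["2011", "2011", "2012"],
    "Death_Count": [22, 30, 25],
})
print(df.shape)  # (3, 3)
```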

Get the SQL query generated from a question:

```python
sql = agent.get_query("Your question here")
print(sql)  # prints the generated SQL query
```

#### Adding a manual schema

Schema format:

```
{
    "name": DATABASE NAME,
    "keywords": [DATABASE KEYWORDS],
    "columns":
    [
        {
            "name": COLUMN 1 NAME,
            "mapping": {
                CATEGORY 1: [CATEGORY 1 KEYWORDS],
                CATEGORY 2: [CATEGORY 2 KEYWORDS]
            }
        },
        {
            "name": COLUMN 2 NAME,
            "keywords": [COLUMN 2 KEYWORDS]
        },
        {
            "name": COLUMN 3 NAME,
            "keywords": [COLUMN 3 KEYWORDS],
            "summable": "True"
        }
    ]
}
```
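For illustration, a manual schema for the `cancer_death` table used in the examples below could look like the following. The keywords, categories, and mappings here are hypothetical; `"summable": "True"` appears to mark a numeric column that should be aggregated with `SUM`, as in the generated query shown later.

```json
{
    "name": "cancer_death",
    "keywords": ["cancer", "death", "mortality"],
    "columns":
    [
        {
            "name": "Cancer_site",
            "mapping": {
                "Stomach": ["stomach", "gastric"],
                "Lung": ["lung"]
            }
        },
        {
            "name": "Year",
            "keywords": ["year", "time"]
        },
        {
            "name": "Death_Count",
            "keywords": ["death", "died", "people"],
            "summable": "True"
        }
    ]
}
```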

#### Example (with manual schema)

##### Database query

* To use PostgreSQL or MySQL on Amazon RDS, you must first create a database on Amazon RDS. The RDS instance must be in a public subnet, with security groups that allow connections from outside AWS.

Refer to step 1 in [this document](https://aws.amazon.com/getting-started/hands-on/create-mysql-db/) to create a MySQL DB instance on Amazon RDS. The same steps can be followed to create a PostgreSQL DB instance by selecting PostgreSQL in the Engine tab. Obtain the username, password, database, endpoint, and port from your database connection details on Amazon RDS.

```python
from tableqa.agent import Agent

agent = Agent(df, schema_file, 'postgres',
              username='Master username', password='Master password',
              database='DB name', host='Endpoint', port='Port', aws_db=True)
response = agent.query_db("how many people died of stomach cancer in 2011")
print(response)
```

Response: `[(22,)]`


##### SQL query

```python
sql = agent.get_query("How many people died of stomach cancer in 2011")
print(sql)
```

Generated SQL query:

```sql
SELECT SUM(Death_Count) FROM cancer_death WHERE Cancer_site = "Stomach" AND Year = "2011"
```
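To sanity-check what such a query returns, you can run the same SQL by hand against a toy SQLite table. The table and column names follow the example above; the rows here are made up, chosen so the sum matches the response shown earlier.

```python
import sqlite3

# In-memory SQLite table mirroring the cancer_death example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cancer_death (Cancer_site TEXT, Year TEXT, Death_Count INTEGER)"
)
conn.executemany(
    "INSERT INTO cancer_death VALUES (?, ?, ?)",
    [("Stomach", "2011", 15), ("Stomach", "2011", 7), ("Lung", "2011", 30)],
)

# Same query as above; string literals are single-quoted here, the
# standard SQL form for literals.
result = conn.execute(
    "SELECT SUM(Death_Count) FROM cancer_death "
    "WHERE Cancer_site = 'Stomach' AND Year = '2011'"
).fetchall()
print(result)  # [(22,)]
```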


#### Multiple CSVs

* Pass the absolute paths of the directories containing the CSVs and schemas, respectively. Refer to [cleaned_data](tableqa/cleaned_data) and [schema](tableqa/schema) for examples.

##### Example 
* Read CSVs and schemas from the local machine:

```python
csv_path = "/content/tableQA/tableqa/cleaned_data"
schema_path = "/content/tableQA/tableqa/schema"
agent = Agent(csv_path, schema_path)
```


* Read CSV and schema files from Amazon S3:
1) [Create a bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) on Amazon S3.
2) [Upload objects](https://docs.aws.amazon.com/AmazonS3/latest/gsg/PuttingAnObjectInABucket.html) to the bucket.
3) [Create an IAM user](https://www.atensoftware.com/p90.php?q=309) and give it permission to read files from Amazon S3 storage.
4) Obtain the access key and secret access key for the user and pass them as arguments to the agent.

```python
csv_path = "s3://{bucket}/cleaned_data"
schema_path = "s3://{bucket}/schema"
agent = Agent(csv_path, schema_path, aws_s3=True,
              access_key_id=access_key_id, secret_access_key=secret_access_key)
```
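The `access_key_id` and `secret_access_key` variables above are not defined in the snippet. One common approach (an assumption, not something tableQA mandates) is to read them from environment variables rather than hard-coding them:

```python
import os

# Hypothetical: pull AWS credentials from the environment so they are
# never committed to source control; falls back to an empty string.
access_key_id = os.environ.get("AWS_ACCESS_KEY_ID", "")
secret_access_key = os.environ.get("AWS_SECRET_ACCESS_KEY", "")
```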



#### Join us

Join our workspace on [Slack](https://join.slack.com/t/newworkspace-ehh1873/shared_invite/zt-hp3i6ic7-exMal1I4ZmFMWaHAwXk8HA).