RUCKBReasoning / RESDSQL

The PyTorch implementation of RESDSQL (AAAI 2023).
https://arxiv.org/abs/2302.05965
MIT License

Obtaining query_toks_no_value from query #31

Closed by Vishaal-Kareti 1 year ago

Vishaal-Kareti commented 1 year ago

I'm trying to train RESDSQL on another dataset by appending that dataset's train.json, dev.json, and tables.json to Spider's (and adding its database to Spider's databases as well). I'm a bit stumped on how to generate query_toks_no_value from a query. Is there a script for this, or do you have any advice on how to write one?

lihaoyang-ruc commented 1 year ago

query_toks_no_value is an original field of the Spider dataset, but we do not use it in our project. Here is the first example in Spider's training set:

{
        "db_id": "department_management",
        "query": "SELECT count(*) FROM head WHERE age  >  56",
        "query_toks": [
            "SELECT",
            "count",
            "(",
            "*",
            ")",
            "FROM",
            "head",
            "WHERE",
            "age",
            ">",
            "56"
        ],
        "query_toks_no_value": [
            "select",
            "count",
            "(",
            "*",
            ")",
            "from",
            "head",
            "where",
            "age",
            ">",
            "value"
        ],
        "question": "How many heads of the departments are older than 56 ?",
        "question_toks": [
            "How",
            "many",
            "heads",
            "of",
            "the",
            "departments",
            "are",
            "older",
            "than",
            "56",
            "?"
        ],
        "sql": {
            "from": {
                "table_units": [
                    [
                        "table_unit",
                        1
                    ]
                ],
                "conds": []
            },
            "select": [
                false,
                [
                    [
                        3,
                        [
                            0,
                            [
                                0,
                                0,
                                false
                            ],
                            null
                        ]
                    ]
                ]
            ],
            "where": [
                [
                    false,
                    3,
                    [
                        0,
                        [
                            0,
                            10,
                            false
                        ],
                        null
                    ],
                    56.0,
                    null
                ]
            ],
            "groupBy": [],
            "having": [],
            "orderBy": [],
            "limit": null,
            "intersect": null,
            "union": null,
            "except": null
        }
    }
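
For anyone who still wants this field for other tools, the relationship between query and query_toks_no_value in the example above can be approximated with a small tokenizer. This is only a rough sketch and not Spider's own preprocessing: the function name `approx_query_toks_no_value` and the regular expression are hypothetical. The rule that every quoted string or number becomes "value" and every other token is lowercased is an assumption inferred from this single example, and how dotted names such as T1.name should be split is a guess.

```python
import re

# Hypothetical tokenizer: quoted strings, numbers, identifiers,
# two-character operators, then any other single symbol.
_TOKEN_RE = re.compile(
    r"""'[^']*'|"[^"]*"      # quoted string literals
      |\d+(?:\.\d+)?         # numeric literals
      |[A-Za-z_]\w*          # keywords and identifiers
      |!=|>=|<=|<>           # two-character operators
      |[^\s\w]               # any other single symbol
    """,
    re.VERBOSE,
)


def approx_query_toks_no_value(query):
    """Approximate query_toks_no_value for a SQL string.

    Assumption: literals map to the placeholder "value" and all other
    tokens are lowercased, as in the example above.
    """
    tokens = []
    for tok in _TOKEN_RE.findall(query):
        if tok[0] in "'\"" or tok[0].isdigit():
            tokens.append("value")  # string or numeric literal
        else:
            tokens.append(tok.lower())
    return tokens


example = approx_query_toks_no_value(
    "SELECT count(*) FROM head WHERE age  >  56"
)
```

For the query shown above, `example` matches the query_toks_no_value list in the sample entry. Other queries (nested values, LIMIT clauses, aliases) may differ from what Spider ships, so treat the output as a starting point only.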

Our code only uses the db_id, query, and question fields, so you don't need to generate query_toks_no_value for your own data.
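
In practice, a custom dataset only has to provide the three fields named above before it is appended to Spider's examples. The following is a minimal sketch of that step, not code from the RESDSQL repository: the helper names `to_minimal_entries` and `merge_with_spider` are hypothetical, and it assumes each file is a JSON list of objects like the example shown earlier.

```python
REQUIRED_FIELDS = ("db_id", "query", "question")


def to_minimal_entries(examples):
    """Keep only the fields reported as used: db_id, query, question."""
    minimal = []
    for index, example in enumerate(examples):
        missing = [f for f in REQUIRED_FIELDS if f not in example]
        if missing:
            raise KeyError(f"example {index} is missing fields: {missing}")
        minimal.append({f: example[f] for f in REQUIRED_FIELDS})
    return minimal


def merge_with_spider(spider_examples, custom_examples):
    """Append minimal custom entries to the unchanged Spider entries."""
    return list(spider_examples) + to_minimal_entries(custom_examples)


# Hypothetical custom entry; the db_id and query are illustrative only.
custom = [
    {
        "db_id": "my_database",
        "query": "SELECT name FROM employee WHERE salary > 1000",
        "question": "Which employees earn more than 1000?",
        "extra_field": "dropped by to_minimal_entries",
    }
]
spider = [
    {
        "db_id": "department_management",
        "query": "SELECT count(*) FROM head WHERE age  >  56",
        "question": "How many heads of the departments are older than 56 ?",
    }
]
merged = merge_with_spider(spider, custom)
```

To use it on real files, load train.json from both datasets with `json.load`, pass the two lists to `merge_with_spider`, and write the result back with `json.dump`. The tables.json entries and the database files for the new db_id still have to be added separately, as described in the question.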

Vishaal-Kareti commented 1 year ago

Oh, got it. Thank you for the clarification!