
BeachWang / DAIL-SQL

An efficient and effective few-shot NL2SQL method on GPT-4.

Apache License 2.0

427 stars, 69 forks

About processed data #34

Closed: Z-Diviner closed this issue 3 weeks ago

Z-Diviner commented 5 months ago

Hello, it's an honor to read your paper, which has inspired me deeply. When downloading your preprocessed data, I found the following four files, whose names seem to describe how the examples were processed. Could you explain what each of them means?

Z-Diviner commented 5 months ago

Also, could you provide some information about the processed data for the BIRD dataset?

BeachWang commented 5 months ago

Hi, thank you for your interest in our paper.

EUCDISQUESTIONMASK means that the example selector uses only the similarity between masked questions. EUCDISMASKPRESKLSIMTHR considers both similarities: that of the masked questions and that of the skeletons of the pre-predicted SQL. QA means that we represent examples with questions and queries only, without database schemas. The 150 is the limit on the output length of the LLM, while 10000 and 4096 are limits on the total length.
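To make the difference between the two selectors concrete, here is a minimal sketch in Python. It is not the repository's code: the function names, the dictionary keys (`embedding`, `skeleton`, `pre_skeleton`), the use of token-level Jaccard similarity for skeletons, and the fallback to the remaining candidates when too few pass the threshold are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard(a, b):
    """Token-level Jaccard similarity of two SQL skeleton strings (assumed metric)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def rank_by_masked_question(target, pool):
    """Indices of pool examples, nearest masked-question embedding first."""
    return sorted(
        range(len(pool)),
        key=lambda i: euclidean(pool[i]["embedding"], target["embedding"]),
    )


def select_question_mask(target, pool, k):
    """EUCDISQUESTIONMASK-style selection: masked question similarity only."""
    return [pool[i] for i in rank_by_masked_question(target, pool)[:k]]


def select_mask_pre_skeleton(target, pool, k, threshold):
    """EUCDISMASKPRESKLSIMTHR-style selection (hypothetical reading).

    Candidates are still ranked by masked-question distance, but those whose
    skeleton is similar enough to the pre-predicted SQL skeleton come first.
    """
    ranked = rank_by_masked_question(target, pool)
    similar = [
        i for i in ranked
        if jaccard(pool[i]["skeleton"], target["pre_skeleton"]) >= threshold
    ]
    # Assumption: fill up with the remaining nearest examples if needed.
    rest = [i for i in ranked if i not in similar]
    return [pool[i] for i in (similar + rest)[:k]]
```

With the first function, an example with a close question is chosen even if its SQL looks nothing like the expected query; the second function prefers examples whose query structure also matches the preliminary prediction.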