bgunlp / qpl

Code and dataset for the paper "Semantic Decomposition of Question and SQL for Text-to-SQL Parsing" EMNLP Findings 2023 - Ben Eyal et al BGU CS NLP Group
https://www.cs.bgu.ac.il/~elhadad/nlpproj/
2 stars 0 forks source link

Support for Multiply and Divide Operators in QPL #10

Open harpoonix opened 6 months ago

harpoonix commented 6 months ago

Hi Ben and Team, First, congratulations on the wonderful work. I was looking through your code and the QPL grammar specified, and could not find support for multiply (*) and divide ( / ) operators in QPL. As of now, to the best of what I have seen, a question like

Natural Language Query: List school names of charter schools with an SAT excellence rate over the average. 

The SQL for computing the inner query, for the average excellence rates of charter schools would be:

SELECT AVG(CAST(T2.`NumGE1500` AS REAL) / T2.`NumTstTakr`)
    FROM frpm AS T1
    INNER JOIN satscores AS T2
    ON T1.`CDSCode` = T2.`cds`
    WHERE T1.`Charter School (Y/N)` = 1
  1. We will need the divide operator in some form to be supported in the QPL for this. Currently, SCAN operations only support one or more fieldNames to be selected in the output, not operations on those fieldnames, for example division of two quantities, (No. of students who got above 1500 SAT score) / (Total Test Takers in that school) in the above example.
  2. Similarly I could not find support for operators in Predicate, which only allows comparison operators for now. This limits the potential for use in queries where there is some predicate on say the product of two fields, eg. price_of_car and orders_placed to get some kind of total expenditure, or some condition on division or addition of two fields.
  3. Do you have a rough idea of what operators can appear in SQL queries, like CAST (also used in our example above to convert into float for division) but are currently not supported by QPL?
  4. Are you currently working on adding support for these operators in the QPL grammar? If not, I would really appreciate some guidance on where to start adding support for such operators, for say multiply, divide etc. How can I go about it, which parts of the code related to QPL parser and validator etc should I modify? Can I also add CAST etc using this approach?

Thank you so much. Your clarifications on these would be much appreciated.

beneyal commented 5 months ago

Hi Harsh,

Sorry for the late reply, I've been busy at work 😅

  1. Technically, we can generate QPL with arithmetic operations. The Scala module that turns an execution plan to "raw" QPL works fine with arithmetic operators, for example, given the query:
SELECT AVG(CAST(Year AS real) / Stadium_ID)
FROM concert_singer.concert;

will return the following raw QPL:

#1 = Scan Table [ concert ] Output [ concert.Year / concert.Stadium_ID ]
#2 = Aggregate [ #1 ] Output [ AVG(concert.Year / concert.Stadium_ID) ]

The thing is that we didn't intend to support this, so the post-processing done on this QPL will fail (it's all regular expression work there). The code that does the post-processing, i.e., working on QPL that comes out of Scala, is here.

  1. Same here as well. You won't have to touch the Scala module, as it already gives something to work with. For example, this query:
SELECT Year
FROM concert_singer.concert
WHERE Year / Stadium_ID >= 100;

will return this raw QPL:

#1 = Scan Table [ concert ] Predicate [ concert.Year / concert.Stadium_ID >= 100 ] Output [ concert.Year , concert.Year , concert.Stadium_ID ]

It will be a matter of tinkering with the post-processor to handle these cases.

  1. I don't know. It'll be a matter of trial and error with different queries. By the way, CAST is somewhat supported; you don't see it in the QPL because, in Spider, it was meaningless. A CAST is translated to a Convert node in the XML execution plan from SQL Server, and we treat this node as a "pass-through" in the Scala code. I can see places where this might be handy (for example, the first query I gave above will divide two integers and return an integer unless cast), but it would also make a mess of most QPL, as implicit casts happen a lot, and they all create a Convert node.

  2. I don't personally work on QPL anymore, but I don't mind scheduling a Zoom meeting with you to show you around the code and which part of the pipeline does what. My email is bene@post.bgu.ac.il.