ServiceNow / picard

PICARD - Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. PICARD is a ServiceNow Research project that was started at Element AI.
https://arxiv.org/abs/2109.05093
Apache License 2.0
342 stars 122 forks source link

Incomplete SQL prediction with PICARD #67

Open longxudou opened 2 years ago

longxudou commented 2 years ago

Appreciate for this interesting work!

I trained a new T5 model from scratch using your script and predicted with PICARD but encounter a problem.

Modification: replacing the COLUMN with TABLE.COLUMN in SQL as follows. Problem: generating many incompleted SQL with PICARD (T5-large), which could be correctly generated without PICARD.

Here are some examples.

question: what is the name of the winner who has won the most matches, and how many rank points does this player have? "gold": "select matches.winner_name, matches.winner_rank_points from matches group by matches.winner_name order by count ( * ) desc limit 1" "pred": "select"

`question: what are the citizenships that are shared by singers with a birth year before 1945 and after 1955? "gold": "select singer.citizenship from singer where singer.birth_year < 1945 intersect select singer.citizenship from singer where singer.birth_year > 1955" "pred": "select singer.citizenship from singer where singer.birth_year"

I try to improve the picard_max_tokens_to_check from 2 to 3 then the above SQL have been generated correctly.

However, many incompleted SQL still exists, even if the num_beams=8 and picard_max_tokens_to_check=6. Like this example:

question: what are the record companies of orchestras in descending order of years in which they were founded? "gold": "select orchestra.record_company from orchestra order by orchestra.year_of_founded desc" "pred": "select orchestra"

Here are the debug log by printing the input_ids. https://github.com/ElementAI/picard/blob/5ff827fa65c719ff975a37bd1d6940214731f3f5/seq2seq/utils/picard_model_wrapper.py#L369

CLICK ME > bid: 0 --- > bid: 1 --- > bid: 2 --- > bid: 3 --- > ======= step: 1 ======= > bid: 0 --- orchestra > bid: 1 --- Orchestra > bid: 2 --- orchestra > bid: 3 --- orchestra > ======= step: 2 ======= > bid: 0 --- orchestra | > bid: 1 --- orchestra| > bid: 2 --- Orchestra | > bid: 3 --- Orchestra| > ======= step: 3 ======= > bid: 0 --- orchestra | select > bid: 1 --- orchestra| select > bid: 2 --- Orchestra | select > bid: 3 --- orchestra | Select > ======= step: 4 ======= > bid: 0 --- orchestra | select orchestra > bid: 1 --- orchestra| select orchestra > bid: 2 --- orchestra | select Orchestra > bid: 3 --- Orchestra | select orchestra > ======= step: 5 ======= > bid: 0 --- orchestra | select orchestra, > bid: 1 --- orchestra | select orchestra > bid: 2 --- orchestra| select orchestra, > bid: 3 --- orchestra| select orchestra > ======= step: 6 ======= > bid: 0 --- orchestra | select orchestra, orchestra > bid: 1 --- orchestra | select orchestra, > bid: 2 --- orchestra | select orchestra - > bid: 3 --- orchestra| select orchestra, orchestra > ======= step: 7 ======= > bid: 0 --- orchestra | select orchestra, orchestra > bid: 1 --- orchestra | select orchestra - orchestra > bid: 2 --- orchestra | select orchestra, orchestra, > bid: 3 --- orchestra | select orchestra, orchestra > ======= step: 8 ======= > bid: 0 --- orchestra | select orchestra, orchestra, orchestra > bid: 1 --- orchestra | select orchestra, orchestra, > bid: 2 --- orchestra | select orchestra, orchestra, > bid: 3 --- orchestra | select orchestra, orchestra > ======= step: 9 ======= > bid: 0 --- orchestra | select orchestra, orchestra, orchestra > bid: 1 --- orchestra | select orchestra, orchestra, orchestra > bid: 2 --- orchestra | select orchestra, orchestra, orchestra, > bid: 3 --- orchestra | select orchestra, orchestra, orchestra

Could you give me some suggestion for this? Thanks in advance!

tscholak commented 2 years ago

Hi @DreamerDeo, In order to understand what is going on, I need the SQL schema and the ground-truth SQL for these databases and questions. Thanks!

Edit: it looks like your first example is from wta_1. Is that correct?

Edit2: how many incomplete sql predictions are we talking about? how many is "many"?

tscholak commented 2 years ago

I know what the problem is. in the examples you posted, there is a conflict between the table_id and the column_id parsers. for instance, orchestra can refer to either the orchestra table or the orchestra column in the orchestra table. the column_id has precedence over the table_id. therefore, the continuation with .[column_id] is rejected.

longxudou commented 2 years ago

Thanks for your quick reply!

A1: The db_id of the above three examples are wta_1, singer, orchestra respectively.

A2: For T5-large, 61 incomplete sql in dev (1034 in total). I am using the query_toks_no_value of spider. All these incomplete sql are listed in the following with its database_id, question and gold sql.

CLICK ME > dbid: cre_Doc_Template_Mgt > ques: show all template type codes with less than three templates. > gold: select templates.template_type_code from templates group by templates.template_type_code having count ( * ) < 'value' > pred: select templates.template_type_code from templates group by templates.template_type_code having count ( * ) > > dbid: cre_Doc_Template_Mgt > ques: what are the codes of template types that have fewer than 3 templates? > gold: select templates.template_type_code from templates group by templates.template_type_code having count ( * ) < 'value' > pred: select templates.template_type_code from templates group by templates.template_type_code having count ( * ) > > dbid: dog_kennels > ques: which owner has paid the largest amount of money in total for their dogs? show the owner id and zip code. > gold: select owners.owner_id, owners.zip_code from owners join dogs on owners.owner_id = dogs.owner_id join treatments on dogs.dog_id = treatments.dog_id group by owners.owner_id order by sum ( treatments.cost_of_treatment ) desc limit 1 > pred: select owners.owner_id, owners.zip_code from owners join dogs on owners.owner_id = dogs.owner_id group by owners.owner_id order by sum ( dogs.cost_of_treatment > > dbid: dog_kennels > ques: find the owner id and zip code of the owner who spent the most money in total for his or her dogs. > gold: select owners.owner_id, owners.zip_code from owners join dogs on owners.owner_id = dogs.owner_id join treatments on dogs.dog_id = treatments.dog_id group by owners.owner_id order by sum ( treatments.cost_of_treatment ) desc limit 1 > pred: select owners.owner_id, owners.zip_code from owners join dogs on owners.owner_id = dogs.owner_id group by owners.owner_id order by sum ( dogs.cost_of_treatment > > dbid: dog_kennels > ques: what are the first name and last name of the professionals who have done treatment with cost below average? > gold: select distinct professionals.first_name, professionals.last_name from professionals join treatments where treatments.cost_of_treatment < ( select avg ( treatments.cost_of_treatment ) from treatments ) > pred: select professionals.first_name, professionals.last_name from professionals join treatments on professionals.professional_id = treatments.professional_id where treatments.cost_of_treatment - ( select avg ( treatments.cost_of_treatment ) from treatments )!= 'value' > > dbid: dog_kennels > ques: which professionals have operated a treatment that costs less than the average? give me theor first names and last names. > gold: select distinct professionals.first_name, professionals.last_name from professionals join treatments where treatments.cost_of_treatment < ( select avg ( treatments.cost_of_treatment ) from treatments ) > pred: select professionals.first_name, professionals.last_name from treatments join professionals on treatments.professional_id = professionals.professional_id where treatments.cost_of_treatment - ( select avg ( treatments.cost_of_treatment ) from treatments )!= 'value' > > dbid: dog_kennels > ques: how many dogs have an age below the average? > gold: select count ( * ) from dogs where dogs.age < ( select avg ( dogs.age ) from dogs ) > pred: select count ( * ) from dogs where dogs.age - ( select avg ( dogs.age ) from dogs )!= ( select avg ( dogs.age ) from dogs ) > > dbid: dog_kennels > ques: count the number of dogs of an age below the average. > gold: select count ( * ) from dogs where dogs.age < ( select avg ( dogs.age ) from dogs ) > pred: select count ( * ) from dogs where dogs.age - ( select avg ( dogs.age ) from dogs )!= ( select avg ( dogs.age ) from dogs ) > > dbid: singer > ques: show the citizenship shared by singers with birth year before 1945 and after 1955. > gold: select singer.citizenship from singer where singer.birth_year < 'value' intersect select singer.citizenship from singer where singer.birth_year > 'value' > pred: select singer.citizenship from singer where singer.birth_year > > dbid: singer > ques: what are the citizenships that are shared by singers with a birth year before 1945 and after 1955? > gold: select singer.citizenship from singer where singer.birth_year < 'value' intersect select singer.citizenship from singer where singer.birth_year > 'value' > pred: select singer.citizenship from singer where singer.birth_year > > dbid: network_1 > ques: show ids of all students who do not have any friends. > gold: select highschooler.id from highschooler except select friend.student_id from friend > pred: select student > > dbid: poker_player > ques: what is the maximum number of final tables made among poker players with earnings less than 200000? > gold: select max ( poker_player.final_table_made ) from poker_player where poker_player.earnings < 'value' > pred: select max ( poker_player.final_table_made ) from poker_player where poker_player.earnings > > dbid: poker_player > ques: return the maximum final tables made across all poker players who have earnings below 200000. > gold: select max ( poker_player.final_table_made ) from poker_player where poker_player.earnings < 'value' > pred: select max ( poker_player.final_table_made ) from poker_player where poker_player.earnings > > dbid: museum_visit > ques: how many visitors below age 30 are there? > gold: select count ( * ) from visitor where visitor.age < 'value' > pred: select count ( * ), visitor.age from visitor where visitor.age - 30!= 'value' > > dbid: museum_visit > ques: find the average number of staff working for the museums that were open before 2009. > gold: select avg ( museum.num_of_staff ) from museum where museum.open_year < 'value' > pred: select avg ( museum.num_of_staff ) from museum where museum.open_year > > dbid: museum_visit > ques: what is the name of the visitor who visited both a museum opened before 2009 and a museum opened after 2011? > gold: select visitor.name from visitor join visit on visitor.id = visit.visitor_id join museum on museum.museum_id = visit.museum_id where museum.open_year < 'value' intersect select visitor.name from visitor join visit on visitor.id = visit.visitor_id join museum on museum.museum_id = visit.museum_id where museum.open_year > 'value' > pred: select visitor.name from visit join museum on visit.museum_id = museum.museum_id join visitor on visit.visitor_id = visitor > > dbid: museum_visit > ques: how many museums were opened after 2013 or before 2008? > gold: select count ( * ) from museum where museum.open_year > 'value' or museum.open_year < 'value' > pred: select count ( * ) from museum where museum.open_year > 'value' or museum.open_year > > dbid: course_teach > ques: what is the name of each teacher and what course they teach? > gold: select teacher.name, course.course from course_arrange join course on course_arrange.course_id = course.course_id join teacher on course_arrange.teacher_id = teacher.teacher_id > pred: select teacher.name, course > > dbid: wta_1 > ques: what are the names of players who won in both 2013 and 2016? > gold: select matches.winner_name from matches where matches.year = 'value' intersect select matches.winner_name from matches where matches.year = 'value' > pred: select players.winner_name from matches join players > > dbid: wta_1 > ques: what is the name of the winner who has won the most matches, and how many rank points does this player have? > gold: select matches.winner_name, matches.winner_rank_points from matches group by matches.winner_name order by count ( * ) desc limit 1 > pred: select > > dbid: wta_1 > ques: find the name and rank of the 3 youngest winners across all matches. > gold: select distinct matches.winner_name, matches.winner_rank from matches order by matches.winner_age limit 3 > pred: select players.winner_name, matches.winner_rank from players > > dbid: wta_1 > ques: what are the names and ranks of the three youngest winners across all matches? > gold: select distinct matches.winner_name, matches.winner_rank from matches order by matches.winner_age limit 3 > pred: select players.winner_name, matches.winner_rank from players > > dbid: orchestra > ques: what are the record companies of orchestras in descending order of years in which they were founded? > gold: select orchestra.record_company from orchestra order by orchestra.year_of_founded desc > pred: select orchestra > > dbid: orchestra > ques: return the record companies of orchestras, sorted descending by the years in which they were founded. > gold: select orchestra.record_company from orchestra order by orchestra.year_of_founded desc > pred: select orchestra > > dbid: orchestra > ques: what are the names of conductors as well as the corresonding orchestras that they have conducted? > gold: select conductor.name, orchestra.orchestra from conductor join orchestra on conductor.conductor_id = orchestra.conductor_id > pred: select conductor.name, orchestra > > dbid: orchestra > ques: list the record company shared by the most number of orchestras. > gold: select orchestra.record_company from orchestra group by orchestra.record_company order by count ( * ) desc limit 1 > pred: select orchestra > > dbid: orchestra > ques: what is the record company used by the greatest number of orchestras? > gold: select orchestra.record_company from orchestra group by orchestra.record_company order by count ( * ) desc limit 1 > pred: select orchestra > > dbid: orchestra > ques: list the names of orchestras that have no performance. > gold: select orchestra.orchestra from orchestra where orchestra.orchestra_id not in ( select performance.orchestra_id from performance ) > pred: select orchestra > > dbid: orchestra > ques: show the record companies shared by orchestras founded before 2003 and after 2003. > gold: select orchestra.record_company from orchestra where orchestra.year_of_founded < 'value' intersect select orchestra.record_company from orchestra where orchestra.year_of_founded > 'value' > pred: select orchestra > > dbid: orchestra > ques: what are the record companies that are used by both orchestras founded before 2003 and those founded after 2003? > gold: select orchestra.record_company from orchestra where orchestra.year_of_founded < 'value' intersect select orchestra.record_company from orchestra where orchestra.year_of_founded > 'value' > pred: select orchestra > > dbid: orchestra > ques: find the number of orchestras whose record format is cd '' or dvd ''. > gold: select count ( * ) from orchestra where orchestra.major_record_format = 'value' or orchestra.major_record_format = 'value' > pred: select count ( * ) from orchestra where orchestra > > dbid: orchestra > ques: count the number of orchestras that have cd or dvd as their record format. > gold: select count ( * ) from orchestra where orchestra.major_record_format = 'value' or orchestra.major_record_format = 'value' > pred: select count ( * ) from orchestra where orchestra > > dbid: orchestra > ques: show the years in which orchestras that have given more than one performance are founded. > gold: select orchestra.year_of_founded from orchestra join performance on orchestra.orchestra_id = performance.orchestra_id group by performance.orchestra_id having count ( * ) > 'value' > pred: select orchestra > > dbid: orchestra > ques: what are years of founding for orchestras that have had more than a single performance? > gold: select orchestra.year_of_founded from orchestra join performance on orchestra.orchestra_id = performance.orchestra_id group by performance.orchestra_id having count ( * ) > 'value' > pred: select orchestra > > dbid: flight_2 > ques: find all airlines that have fewer than 200 flights. > gold: select airlines.airline from airlines join flights on airlines.uid = flights.airline group by airlines.airline having count ( * ) < 'value' > pred: select airlines.airline from airlines join flights on airlines.uid = flights.airline group by airlines.airline having count ( * ) > > dbid: flight_2 > ques: which airlines have less than 200 flights? > gold: select airlines.airline from airlines join flights on airlines.uid = flights.airline group by airlines.airline having count ( * ) < 'value' > pred: select airlines.airline from airlines join flights on airlines.uid = flights.airline group by airlines.airline having count ( * ) > > dbid: concert_singer > ques: show countries where a singer above age 40 and a singer below 30 are from. > gold: select singer.country from singer where singer.age > 'value' intersect select singer.country from singer where singer.age < 'value' > pred: select singer.country from singer where singer.age > 'value' intersect select singer.country from singer where singer.age > > dbid: voter_1 > ques: what is the area code in which the most voters voted? > gold: select area_code_state.area_code from area_code_state join votes on area_code_state.state = votes.state group by area_code_state.area_code order by count ( * ) desc limit 1 > pred: select area_code_state.area_code from area_code_state join votes on area_code_state > > dbid: car_1 > ques: find the model of the car whose weight is below the average weight. > gold: select car_names.model from car_names join cars_data on car_names.makeid = cars_data.id where cars_data.weight < ( select avg ( cars_data.weight ) from cars_data ) > pred: select model_list.model from cars_data join car_names on cars_data.id = car_names > > dbid: car_1 > ques: what is the model for the car with a weight smaller than the average? > gold: select car_names.model from car_names join cars_data on car_names.makeid = cars_data.id where cars_data.weight < ( select avg ( cars_data.weight ) from cars_data ) > pred: select car_names.model from cars_data join car_names on cars_data.id = car_names.makeid where cars_data > > dbid: car_1 > ques: find the name of the makers that produced some cars in the year of 1970? > gold: select distinct car_makers.maker from car_makers join model_list on car_makers.id = model_list.maker join car_names on model_list.model = car_names.model join cars_data on car_names.makeid = cars_data.id where cars_data.year = 'value' > pred: select car_makers.maker from car_makers where cars_data.year = 'value > > dbid: car_1 > ques: what is the maker of the carr produced in the earliest year and what year was it? > gold: select car_names.make, cars_data.year from cars_data join car_names on cars_data.id = car_names.makeid where cars_data.year = ( select min ( cars_data.year ) from cars_data ) > pred: select car_makers.maker, cars_data.year from cars_data join car_names on cars_data.id = car_names > > dbid: car_1 > ques: what are the different models for the cards produced after 1980? > gold: select distinct model_list.model from model_list join car_names on model_list.model = car_names.model join cars_data on car_names.makeid = cars_data.id where cars_data.year > 'value' > pred: select distinct model_list.model from model_list join car_names on model_list.modelid = car_names.makeid where cars_data.year > 'value' and car_names.model > > dbid: car_1 > ques: which distinctive models are produced by maker with the full name general motors or weighing more than 3500? > gold: select distinct model_list.model from car_names join model_list on car_names.model = model_list.model join car_makers on model_list.maker = car_makers.id join cars_data on car_names.makeid = cars_data.id where car_makers.fullname = 'value' or cars_data.weight > 'value' > pred: select distinct car_names.model from car_names join model_list on car_names.make = model_list.maker join car > > dbid: car_1 > ques: what are the different years in which there were cars produced that weighed less than 4000 and also cars that weighted more than 3000? > gold: select distinct cars_data.year from cars_data where cars_data.weight between 'value' and 'value' > pred: select count ( * ), cars_data.year from cars_data where cars_data > > dbid: car_1 > ques: what is the maximum miles per gallon of the car with 8 cylinders or produced before 1980? > gold: select max ( cars_data.mpg ) from cars_data where cars_data.cylinders = 'value' or cars_data.year < 'value' > pred: select max ( cars_data.mpg ) from cars_data where cars_data.cylinders = 'value' or cars_data.year > > dbid: car_1 > ques: what is the maximum mpg of the cars that had 8 cylinders or that were produced before 1980? > gold: select max ( cars_data.mpg ) from cars_data where cars_data.cylinders = 'value' or cars_data.year < 'value' > pred: select max ( cars_data.mpg ) from cars_data where cars_data.cylinders = 'value' or cars_data.year > > dbid: car_1 > ques: what are the id and names of the countries which have more than 3 car makers or produce the 'fiat'model? > gold: select countries.countryid, countries.countryname from countries join car_makers on countries.countryid = car_makers.country group by countries.countryid having count ( * ) > 'value' union select countries.countryid, countries.countryname from countries join car_makers on countries.countryid = car_makers.country join model_list on car_makers.id = model_list.maker where model_list.model = 'value' > pred: select countries.countryid, countries.countryname from car_makers join countries on car_makers.country = countries.countryid group by countries.countryid having count ( * ) > 'value' union select countries.countryid, countries.countryname from car_makers join countries on car_makers.id = countries.countryid where car_makers.maker = 'value' or car_names.model = 'value’. if countries.model_names.model_list.model > > dbid: car_1 > ques: what are the ids and names of all countries that either have more than 3 car makers or produce fiat model? > gold: select countries.countryid, countries.countryname from countries join car_makers on countries.countryid = car_makers.country group by countries.countryid having count ( * ) > 'value' union select countries.countryid, countries.countryname from countries join car_makers on countries.countryid = car_makers.country join model_list on car_makers.id = model_list.maker where model_list.model = 'value' > pred: select countries.countryid, countries.countryname from car_makers join countries on car_makers.country = countries.countryid group by countries.countryid having count ( * ) > 'value' union select countries.countryid, countries.countryname from car_makers join countries on car_makers.id = countries.countryid where car_makers.maker = 'value' or car_names.model = 'value’. > > dbid: world_1 > ques: which language is the most popular in aruba? > gold: select countrylanguage.language from country join countrylanguage on country.code = countrylanguage.countrycode where country.name = 'value' order by countrylanguage.percentage desc limit 1 > pred: select countrylanguage.language from city join country on city.id = country.country > > dbid: world_1 > ques: what is average life expectancy in the countries where english is not the official language? > gold: select avg ( country.lifeexpectancy ) from country where country.name not in ( select country.name from country join countrylanguage on country.code = countrylanguage.countrycode where countrylanguage.language = 'value' and countrylanguage.isofficial = 'value' ) > pred: select avg ( country.lifeexpectancy ) from country where countrylanguage.language!= 'value' and country.name not in ( select country.name from country > > dbid: world_1 > ques: give the mean life expectancy of countries in which english is not the official language. > gold: select avg ( country.lifeexpectancy ) from country where country.name not in ( select country.name from country join countrylanguage on country.code = countrylanguage.countrycode where countrylanguage.language = 'value' and countrylanguage.isofficial = 'value' ) > pred: select avg ( country.lifeexpectancy ) from country where countrylanguage.language!= 'value' and country.name not in ( select country.name from country > > dbid: world_1 > ques: which african countries have a smaller population than that of any country in asia? > gold: select country.name from country where country.continent = 'value' and country.population < ( select min ( country.population ) from country where country.continent = 'value' ) > pred: select country.name from country where country.continent = 'value' and country > > dbid: world_1 > ques: which cities are in european countries where english is not the official language? > gold: select distinct city.name from country join city on city.countrycode = country.code where country.continent = 'value' and country.name not in ( select country.name from country join countrylanguage on country.code = countrylanguage.countrycode where countrylanguage.isofficial = 'value' and countrylanguage.language = 'value' ) > pred: select city.name from city join country on city.countrycode = country.code where countrylanguage.language!= 'value' and country.name in ( select country.name from country where country.language > > dbid: world_1 > ques: find the average life expectancy and total population for each continent where the average life expectancy is shorter than 72? > gold: select sum ( country.population ), avg ( country.lifeexpectancy ), country.continent from country group by country.continent having avg ( country.lifeexpectancy ) < 'value' > pred: select avg ( country.lifeexpectancy ), sum ( country.population ), country.cont > > dbid: world_1 > ques: what are the different continents and the total popuation and average life expectancy corresponding to each, for continents that have an average life expectancy less than 72? > gold: select sum ( country.population ), avg ( country.lifeexpectancy ), country.continent from country group by country.continent having avg ( country.lifeexpectancy ) < 'value' > pred: select country.continent, sum ( country.population ), avg ( country.lifeexpectancy ) from country group by country.continent having avg ( country.lifeexpectancy ) > > dbid: student_transcripts_tracking > ques: what are the names and id of courses having at most 2 sections? > gold: select courses.course_name, courses.course_id from courses join sections on courses.course_id = sections.course_id group by courses.course_id having count ( * ) <= 'value' > pred: select courses.course_name, sections.course_id from courses join sections on courses.course_id = sections.course_id group by sections.course_id having count ( * ) > > dbid: student_transcripts_tracking > ques: what are the names and ids of every course with less than 2 sections? > gold: select courses.course_name, courses.course_id from courses join sections on courses.course_id = sections.course_id group by courses.course_id having count ( * ) <= 'value' > pred: select courses.course_name, sections.course_id from courses join sections on courses.course_id = sections.course_id group by sections.course_id having count ( * ) > > dbid: student_transcripts_tracking > ques: find the last name of the students who currently live in the state of north carolina but have not registered in any degree program. > gold: select students.last_name from students join addresses on students.current_address_id = addresses.address_id where addresses.state_province_county = 'value' except select distinct students.last_name from students join student_enrolment on students.student_id = student_enrolment.student_id > pred: select students.last_name from students join student_enrolment on students.student_id = student_enrolment.student_id join addresses on addresses.address_id = student_enrolment.student_id where addresses.state_province_county = 'value' except select students.last_name from students join student_enrolment on students.student_id = student_enrolment.student_id join addresses on student_enrol > > dbid: battle_death > ques: what is maximum and minimum death toll caused each time? > gold: select max ( death.killed ), min ( death.killed ) from death > pred: select max ( death.to > > dbid: employee_hire_evaluation > ques: which district has both stores with less than 3000 products and stores with more than 10000 products? > gold: select shop.district from shop where shop.number_products < 'value' intersect select shop.district from shop where shop.number_products > 'value' > pred: select shop.district from shop where shop.number_product
longxudou commented 2 years ago

And when I am using the same input/output format like yours, there are still some incomplete sql.

CLICK ME > show all template type codes with less than three templates. | cre_Doc_Template_Mgt > "gold": "select template_type_code from templates group by template_type_code having count(*) < 3", > "pred": "select template_type_code from templates group by template_type_code having count(*)", > > what are the codes of template types that have fewer than 3 templates? | cre_Doc_Template_Mgt > "gold": "select template_type_code from templates group by template_type_code having count(*) < 3", > "pred": "select template_type_code from templates group by template_type_code having count(*)", > > which first names are used for professionals or owners but are not used as dog names? | dog_kennels > "gold": "select first_name from professionals union select first_name from owners except select name from dogs", > "pred": "select first_name from professionals or", > > find the first names that are used for professionals or owners but are not used as dog names. | dog_kennels > "gold": "select first_name from professionals union select first_name from owners except select name from dogs", > "pred": "select first_name from professionals", > > which owner has paid the largest amount of money in total for their dogs? show the owner id and zip code. | dog_kennels > "gold": "select t1.owner_id, t1.zip_code from owners as t1 join dogs as t2 on t1.owner_id = t2.owner_id join treatments as t3 on t2.dog_id = t3.dog_id group by t1.owner_id order by sum(t3.cost_of_treatment) desc limit 1", > "pred": "select t1.owner_id, t1.zip_code from owners as t1 join dogs as t2 on t1.owner_id = t2.owner_id group by t1.owner_id order by sum(t2.cost_of_treatment", > > find the owner id and zip code of the owner who spent the most money in total for his or her dogs. | dog_kennels > "gold": "select t1.owner_id, t1.zip_code from owners as t1 join dogs as t2 on t1.owner_id = t2.owner_id join treatments as t3 on t2.dog_id = t3.dog_id group by t1.owner_id order by sum(t3.cost_of_treatment) desc limit 1", > "pred": "select t1.owner_id, t1.zip_code from owners as t1 join dogs as t2 on t1.owner_id = t2.owner_id group by t1.owner_id order by sum(t2.cost_of_treatment", > > show the citizenship shared by singers with birth year before 1945 and after 1955. | singer > "gold": "select citizenship from singer where birth_year < 1945 intersect select citizenship from singer where birth_year > 1955", > "pred": "select citizenship from singer where birth_year", > > what are the citizenships that are shared by singers with a birth year before 1945 and after 1955? | singer > "gold": "select citizenship from singer where birth_year < 1945 intersect select citizenship from singer where birth_year > 1955", > "pred": "select citizenship from singer where birth_year", > > what is the maximum number of final tables made among poker players with earnings less than 200000? | poker_player > "gold": "select max(final_table_made) from poker_player where earnings < 200000", > "pred": "select max(final_table_made) from poker_player where money_rank", > > > return the maximum final tables made across all poker players who have earnings below 200000. | poker_player > "gold": "select max(final_table_made) from poker_player where earnings < 200000", > "pred": "select max(final_table_made) from poker_player where money_rank", > > what is the name of the visitor who visited both a museum opened before 2009 and a museum opened after 2011? | museum_visit > "gold": "select t1.name from visitor as t1 join visit as t2 on t1.id = t2.visitor_id join museum as t3 on t3.museum_id = t2.museum_id where t3.open_year < 2009 intersect select t1.name from visitor as t1 join visit as t2 on t1.id = t2.visitor_id join museum as t3 on t3.museum_id = t2.museum_id where t3.open_year > 2011", > "pred": "select t2.name from visit as t1 join visitor as t2 on t1.visitor_id = t2.id join museum as t3 on t1.museum_id = t3.museum_id where t3.open_year", > > how many museums were opened after 2013 or before 2008? | museum_visit > "gold": "select count(*) from museum where open_year > 2013 or open_year < 2008", > "pred": "select count(*) from museum where open_year > 2013 or open_year", > > what are the first names of the students who live in haiti permanently or have the cell phone number 09700166582? | student_transcripts_tracking > "gold": "select t1.first_name from students as t1 join addresses as t2 on t1.permanent_address_id = t2.address_id where t2.country = 'haiti' or t1.cell_mobile_number = '09700166582'", > "pred": "select t1.first_name from students as t1 join addresses as t2 on t1.permanent_address_id = t2.address_",

Some cases could be explained by conflict between the table_id and the column_id parsers mentioned in your replay, while the others appears that it's hard to generate the < token.

tscholak commented 2 years ago

And when I am using the same input/output format like yours, there are still some incomplete sql.

Thanks, can you provide a continuations that the model predicted but that where rejected for these examples?

tscholak commented 2 years ago

A2: For T5-large, 61 incomplete sql in dev (1034 in total).

Thank you, this is very useful. I can turn these into test cases.

longxudou commented 2 years ago

And when I am using the same input/output format like yours, there are still some incomplete sql.

Thanks, can you provide a continuations that the model predicted but that where rejected for these examples?

Yes, I would love to do this job that but what do you mean by continuations exactly here? (1) the debug log like that in my first reply? (2) the cases that successfully predicted without PICARD but got incomplete prediction by PICARD?

tscholak commented 2 years ago

More like (1), but just for the last step. I want to see which token proposals were rejected when production ended. Thanks!

longxudou commented 2 years ago

Since the log file is a little long, I paste the log in Google doc. You can edit it as you want. If you need further information, please let me know ASAP.

Thanks! :)

tscholak commented 2 years ago

I took care of the issues with the parser. I believe that what remains are problems with the model(s) or the data.

longxudou commented 2 years ago

@tscholak Thanks!

But I find that the latest docker images docker pull tscholak/text-to-sql-eval:c4c9a08965cfa01a4c0773a8f67687b33409836f or docker pull tscholak/text-to-sql-eval:cache would throw out the error:

  File "/opt/conda/lib/python3.7/site-packages/transformers/trainer_seq2seq.py", line 177, in prediction_step                                                                                                     
    **gen_kwargs,                                                                                        
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context│
    return func(*args, **kwargs)                                                                         
  File "/app/seq2seq/utils/picard_model_wrapper.py", line 163, in _generate                              
    logits_processor.append(                                                                             
AttributeError: 'Tensor' object has no attribute 'append'
Makefile:139: recipe for target 'eval' failed
make: *** [eval] Error 1

If roll back to docker pull tscholak/text-to-sql-eval:5ff827fa65c719ff975a37bd1d6940214731f3f5 the problem would resolve. You can reproduce this by directly running make eval

tscholak commented 2 years ago

@DreamerDeo I pushed a fix. Please check :)

longxudou commented 2 years ago

@tscholak It works now. But prediction becomes much slower, is that normal?

t5-base with batch-size = 20 on docker cache.

image
tscholak commented 2 years ago

I did not observe a slowdown, but I also didn't complete a full evaluation. Please let me know if you observe a regression in accuracy.

longxudou commented 2 years ago

@tscholak FYI, in the first 10% eval data, it's okay (shows that the eval time will be ~4 mins as before). But after that, the prediction will be stuck and displays that remains one hours to eval. The eval is not finished right now so I can't tell you the accuracy, but the incomplete SQL problem is solved right now :)

CLICK ME image image image image For comparsion, the former version of `t5-base` with PICARD is about 4~6 minutes. Here is the `eval.json`, `model` and `batch-size` have been changed for fast evaluation. The other files are kept the same after `git clone git@github.com:ElementAI/picard.git`. I think you could reproduce this problem easily, please wait a moment during evaluation. > { > "run_name": "t5+picard-spider-eval", > "model_name_or_path": "tscholak/1zha5ono", > "dataset": "spider", > "source_prefix": "", > "schema_serialization_type": "peteshaw", > "schema_serialization_randomized": false, > "schema_serialization_with_db_id": true, > "schema_serialization_with_db_content": true, > "normalize_query": true, > "target_with_db_id": true, > "output_dir": "/eval", > "cache_dir": "/transformers_cache", > "do_train": false, > "do_eval": true, > "fp16": false, > "per_device_eval_batch_size": 20, > "seed": 1, > "report_to": ["none"], > "predict_with_generate": true, > "num_beams": 4, > "num_beam_groups": 1, > "diversity_penalty": 0.0, > "max_val_samples": 1034, > "use_picard": true, > "launch_picard": true, > "picard_mode": "parse_with_guards", > "picard_schedule": "incremental", > "picard_max_tokens_to_check": 2, > "eval_accumulation_steps": 1, > "metric_config": "both", > "val_max_target_length": 512, > "val_max_time": 1200 > }
tscholak commented 2 years ago

Before I try to reproduce, can you confirm please that you are seeing the stalling for the original spider eval set and not your altered one with added table names?

longxudou commented 2 years ago

@tscholak It works now by clean the docker.

Thank you very much for your help in this issue :) You really help me a lot on this!

tscholak commented 2 years ago

I reopened the issue. We need to find out which examples take a lot of time to generate.

tscholak commented 2 years ago

It works now by clean the docker.

Glad to hear it, can you confirm though that it is consistently good now?

longxudou commented 2 years ago

@tscholak I observe that (1) after reboot the server and cleaning the docker cache, it will predict in the normal prediction speed under your setting (2) as for my output format (TABLE.COLUMN), the accuracy is largely improved but there are still some incomplete cases. I think that's because the model is trained with Deepspeed (the lr scheduler and optimizer is different), if the model is trained use your script, it works well). And the prediction speed is still very slow for this cases ((TABLE.COLUMN)). I attempt to predict the dev set in 10 pieces to find which examples take a lot of time to generate.

PS: T5-large tscholak/1wnr382e will achieve 71.08 exact match with the new PICARD. You can check this and update the README.

longxudou commented 2 years ago

And if I want to change the haskell code to build my own picard server. (I want to do debug job to push my project more fast.) What's the workflow here?

  1. Modify Parse.hs
  2. make build-eval-image
  3. make eval using the docker image built in step 2

Is that correct?

tscholak commented 2 years ago

I attempt to predict the dev set in 10 pieces to find which examples take a lot of time to generate.

Thank you, this will help a lot.

T5-large tscholak/1wnr382e will achieve 71.08 exact match with the new PICARD

Interesting, and a bit surprising. Will try to reproduce.

if I want to change the haskell code [...] what's the workflow here?

You can do that, or you can use VS Code and start a dev container. You can then make changes to both the Haskell and the Python code, recompile Picard, run the tests, and even run evaluation. The Haskell code is built with cabal.