longxudou opened this issue 2 years ago
Hi @DreamerDeo, in order to understand what is going on, I need the SQL schemas and the ground-truth SQL for these databases and questions. Thanks!
Edit: it looks like your first example is from `wta_1`. Is that correct?
Edit 2: how many incomplete SQL predictions are we talking about? How many is "many"?
I know what the problem is. In the examples you posted, there is a conflict between the `table_id` and the `column_id` parsers. For instance, `orchestra` can refer to either the `orchestra` table or the `orchestra` column in the `orchestra` table. The `column_id` parser has precedence over the `table_id` parser. Therefore, the continuation with `.[column_id]` is rejected.
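The precedence conflict can be illustrated with a toy check (this is only a sketch with made-up table data, not PICARD's actual Haskell parser):

```python
# Toy illustration of the precedence conflict (not PICARD's actual parser):
# "orchestra" is both a table name and a column name. If the column_id
# parser is tried first and succeeds, the ".column" continuation that is
# only legal after a table_id gets rejected.
TABLES = {"orchestra": ["orchestra", "year"]}
COLUMNS = {c for cols in TABLES.values() for c in cols}

def accepts(prefix: str) -> bool:
    """Return True if `prefix` is a valid column reference, trying
    column_id before table_id -- the problematic precedence."""
    name, dot, rest = prefix.partition(".")
    if name in COLUMNS:        # column_id wins first...
        return dot == ""       # ...so "name.column" can never continue
    if name in TABLES:         # table_id only reached if column_id failed
        return rest in TABLES[name]
    return False

assert accepts("orchestra") is True         # parsed as a column
assert accepts("orchestra.year") is False   # ".year" continuation rejected
```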
Thanks for your quick reply!
A1: The `db_id`s of the above three examples are `wta_1`, `singer`, and `orchestra`, respectively.
A2: For T5-large, there are 61 incomplete SQL predictions on dev (1034 in total). I am using the `query_toks_no_value` field of Spider. All of these incomplete SQL predictions are listed below with their `database_id`, `question`, and gold SQL.
And when I use the same input/output format as yours, there are still some incomplete SQL predictions. Some cases could be explained by the conflict between the `table_id` and `column_id` parsers mentioned in your reply, while for the others it appears to be hard to generate the `<` token.
> And when I am using the same input/output format like yours, there are still some incomplete sql.

Thanks, can you provide the continuations that the model predicted but that were rejected for these examples?
> A2: For T5-large, 61 incomplete sql in dev (1034 in total).

Thank you, this is very useful. I can turn these into test cases.
>> And when I am using the same input/output format like yours, there are still some incomplete sql.
>
> Thanks, can you provide the continuations that the model predicted but that were rejected for these examples?
Yes, I would love to do that, but what do you mean by continuations exactly here?
(1) The debug log like the one in my first reply?
(2) The cases that are predicted successfully without PICARD but come out incomplete with PICARD?
More like (1), but just for the last step. I want to see which token proposals were rejected at the point where production ended. Thanks!
Since the log file is a little long, I pasted the log into a Google doc. You can edit it as you like. If you need further information, please let me know ASAP.
Thanks! :)
I took care of the issues with the parser. I believe that what remains are problems with the model(s) or the data.
@tscholak Thanks!
But I find that the latest docker images, `tscholak/text-to-sql-eval:c4c9a08965cfa01a4c0773a8f67687b33409836f` and `tscholak/text-to-sql-eval:cache`, throw the following error:
```
File "/opt/conda/lib/python3.7/site-packages/transformers/trainer_seq2seq.py", line 177, in prediction_step
  **gen_kwargs,
File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
  return func(*args, **kwargs)
File "/app/seq2seq/utils/picard_model_wrapper.py", line 163, in _generate
  logits_processor.append(
AttributeError: 'Tensor' object has no attribute 'append'
Makefile:139: recipe for target 'eval' failed
make: *** [eval] Error 1
```
If I roll back to `tscholak/text-to-sql-eval:5ff827fa65c719ff975a37bd1d6940214731f3f5`, the problem is resolved. You can reproduce this by directly running `make eval`.
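For context, the `AttributeError` indicates that `logits_processor` ended up bound to a tensor rather than Hugging Face's list-like `LogitsProcessorList`. A minimal sketch of the expected pattern, using a stand-in class rather than the repo's actual code:

```python
# Sketch of the failure mode behind the AttributeError (stand-in class,
# not the repo's actual code). generate() expects `logits_processor` to be
# a list-like LogitsProcessorList, which supports .append(); if a tensor
# is bound to that name instead, .append() raises AttributeError.

class LogitsProcessorList(list):
    """Stand-in for transformers.LogitsProcessorList: an ordered list of
    callables applied to the score tensor at each generation step."""
    def __call__(self, input_ids, scores):
        for processor in self:
            scores = processor(input_ids, scores)
        return scores

def picard_constraint(input_ids, scores):
    # Placeholder for the PICARD parser-checking processor.
    return scores

processors = LogitsProcessorList()
processors.append(picard_constraint)  # works: a list has .append
# A torch.Tensor has no .append, hence:
# AttributeError: 'Tensor' object has no attribute 'append'
```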
@DreamerDeo I pushed a fix. Please check :)
@tscholak It works now. But prediction has become much slower, is that normal? This is `t5-base` with batch size 20 on the `cache` docker image.
I did not observe a slowdown, but I also didn't complete a full evaluation. Please let me know if you observe a regression in accuracy.
@tscholak FYI: for the first 10% of the eval data it's okay (the estimated eval time is ~4 minutes, as before). But after that, the prediction gets stuck and shows about one hour remaining. The eval hasn't finished yet, so I can't tell you the accuracy, but the incomplete SQL problem is solved now :)
Before I try to reproduce, can you confirm please that you are seeing the stalling for the original spider eval set and not your altered one with added table names?
@tscholak It works now after cleaning the Docker cache. Thank you very much for your help with this issue :) You really helped me a lot!
I reopened the issue. We need to find out which examples take a lot of time to generate.
> It works now by clean the docker.
Glad to hear it, can you confirm though that it is consistently good now?
@tscholak I observe that (1) after rebooting the server and cleaning the docker cache, prediction runs at normal speed with your setting; (2) as for my output format (TABLE.COLUMN), the accuracy is largely improved, but there are still some incomplete cases. I think that's because the model was trained with DeepSpeed (the LR scheduler and optimizer are different); if the model is trained with your script, it works well. And prediction is still very slow for these (TABLE.COLUMN) cases. I will attempt to predict the dev set in 10 pieces to find which examples take a long time to generate.
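Splitting the dev set into equal pieces to localize the slow examples can be sketched like this (an illustrative helper, not part of the repo):

```python
def shard(items, n_shards):
    """Split `items` into `n_shards` contiguous pieces whose sizes differ
    by at most one, so each piece can be evaluated (and timed) separately."""
    base, extra = divmod(len(items), n_shards)
    pieces, start = [], 0
    for i in range(n_shards):
        end = start + base + (1 if i < extra else 0)
        pieces.append(items[start:end])
        start = end
    return pieces

dev = list(range(1034))  # stand-in for the 1034 dev examples
pieces = shard(dev, 10)
assert sum(len(p) for p in pieces) == 1034
assert max(len(p) for p in pieces) - min(len(p) for p in pieces) <= 1
```

Timing each piece separately narrows the stall down to roughly a hundred examples before inspecting individual queries.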
PS: T5-large (`tscholak/1wnr382e`) achieves 71.08 exact match with the new PICARD. You can check this and update the README.
And if I want to change the Haskell code to build my own PICARD server (I want to debug faster to move my project forward), what's the workflow here?
1. Modify `Parse.hs`
2. `make build-eval-image`
3. `make eval` using the docker image built in step 2

Is that correct?
> I attempt to predict the dev set in 10 pieces to find which examples take a lot of time to generate.
Thank you, this will help a lot.
> T5-large `tscholak/1wnr382e` will achieve 71.08 exact match with the new PICARD
Interesting, and a bit surprising. Will try to reproduce.
> if I want to change the haskell code [...] what's the workflow here?
You can do that, or you can use VS Code and start a dev container. You can then make changes to both the Haskell and the Python code, recompile PICARD, run the tests, and even run the evaluation. The Haskell code is built with `cabal`.
I appreciate this interesting work!
I trained a new T5 model from scratch using your script and predicted with PICARD, but encountered a problem.
Modification: replacing the COLUMN with TABLE.COLUMN in the SQL, as follows.
Problem: PICARD (T5-large) generates many incomplete SQL queries that could be generated correctly without PICARD. Here are some examples.
I tried increasing `picard_max_tokens_to_check` from 2 to 3, and then the above SQL was generated correctly. However, many incomplete SQL queries still exist, even with `num_beams=8` and `picard_max_tokens_to_check=6`.
Like this example. Here is the debug log obtained by printing the `input_ids`:
https://github.com/ElementAI/picard/blob/5ff827fa65c719ff975a37bd1d6940214731f3f5/seq2seq/utils/picard_model_wrapper.py#L369
Could you give me some suggestions on this? Thanks in advance!
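Intuitively, `picard_max_tokens_to_check` bounds how many of the highest-scoring token proposals are checked against the incremental parser at each step. A toy sketch of that budget effect (illustrative only, not PICARD's implementation):

```python
def constrained_step(proposals, is_valid_prefix, max_tokens_to_check):
    """Toy constrained decoding step: scan the top-`max_tokens_to_check`
    token proposals (assumed sorted by score) and return the first one
    the incremental parser accepts. None means every checked proposal
    was rejected, which surfaces as a truncated/incomplete prediction."""
    for token in proposals[:max_tokens_to_check]:
        if is_valid_prefix(token):
            return token
    return None

# If the only parser-valid proposal is ranked below the budget, the step
# fails; raising the budget lets it be found:
proposals = ["singer", "concert", "orchestra.year"]
valid = lambda t: t == "orchestra.year"
assert constrained_step(proposals, valid, 2) is None
assert constrained_step(proposals, valid, 3) == "orchestra.year"
```

This is why raising the knob from 2 to 3 fixed some examples: the valid continuation was simply ranked below the checking budget.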