awslabs / gap-text2sql

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
https://arxiv.org/abs/2012.10309
Apache License 2.0
102 stars · 25 forks

How to execute own queries? #5

Open kev2513 opened 3 years ago

kev2513 commented 3 years ago

Hello, I would like to use my own questions and databases, but when I try to change the Spider JSON files I get this error:

RuntimeError: Error(s) in loading state_dict for EncDecModel:
    size mismatch for decoder.rule_logits.2.weight: copying a param with shape torch.Size([97, 128]) from checkpoint, the shape in current model is torch.Size([76, 128]).
    size mismatch for decoder.rule_logits.2.bias: copying a param with shape torch.Size([97]) from checkpoint, the shape in current model is torch.Size([76]).
    size mismatch for decoder.rule_embedding.weight: copying a param with shape torch.Size([97, 128]) from checkpoint, the shape in current model is torch.Size([76, 128]).
    size mismatch for decoder.node_type_embedding.weight: copying a param with shape torch.Size([55, 64]) from checkpoint, the shape in current model is torch.Size([49, 64]).

Is there an elegant way to test my own data? Thanks in advance!

Impavidity commented 3 years ago

Hey, thanks for your interest in our work. You can check out pull request https://github.com/awslabs/gap-text2sql/pull/6 once it is merged. You should be able to run your own database and queries based on the notebook I provided.

Let me know if it works for you, and feel free to reach out with any further questions.

Peng

kev2513 commented 3 years ago

Hello Peng,

Thank you very much for your quick response! I tried the notebook and it worked :+1: I will let you know if I have any questions. Have a nice weekend.

Kevin

kev2513 commented 3 years ago

Hello Peng,

I ran further tests and noticed that the response sometimes contains the word 'terminal'. For example:

Query: department with budget greater than 10 billion

Answer: SELECT department.Department_ID FROM department WHERE department.Budget_in_Billions > 'terminal'

I guess 'terminal' should be replaced with words contained in the query. How can this replacement be achieved?

Sincerely Kevin

Impavidity commented 3 years ago

Hey Kevin,

Thanks for your question. The 'terminal' placeholder usually stands for a cell value: it could be a float/integer or a string. Filling it in normally requires a value-copy mechanism, but the model currently doesn't support one.

However, there is a simple workaround: if the value is a number, you can detect it in the utterance and fill it directly into the generated SQL. For strings, you can match n-grams of the utterance against the values in the database: if an n-gram matches, it is likely the string value for the corresponding column.

I have a script that does this, but it will take some time to clean it up and make it public. In the meantime, you can try this method yourself, since it is fairly simple.

I will try to make the script public as soon as possible in case you don't implement it yourself.
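[Editor's note] The heuristic described above can be sketched roughly as follows. This is a minimal illustration only, not the script from the repository; `fill_terminals` and the `db_values` column-to-values lookup are hypothetical names invented for this example.

```python
import re

def fill_terminals(sql, question, db_values):
    """Replace 'terminal' placeholders in generated SQL with values recovered
    from the question (numbers) or from known database cells (strings).

    db_values: dict mapping column name -> set of lowercase cell values.
    This lookup shape is an assumption made for illustration.
    """
    # Numbers mentioned in the question, in order of appearance.
    numbers = re.findall(r"\d+(?:\.\d+)?", question)

    # Question n-grams, longest first, that match a known database cell value.
    tokens = question.lower().split()
    strings = []
    for n in range(len(tokens), 0, -1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if any(gram in values for values in db_values.values()):
                strings.append(gram)

    # Walk the SQL, replacing each placeholder with the next recovered value.
    out = []
    for part in re.split(r"('terminal')", sql):
        if part == "'terminal'":
            if numbers:
                out.append(numbers.pop(0))          # numeric literal, unquoted
            elif strings:
                out.append("'%s'" % strings.pop(0))  # matched string, quoted
            else:
                out.append(part)                     # nothing recovered; keep as-is
        else:
            out.append(part)
    return "".join(out)
```

For the example above, `fill_terminals("... WHERE department.Budget_in_Billions > 'terminal'", "department with budget greater than 10 billion", {})` would substitute `10` for the placeholder. A real implementation would also need to align each placeholder with its column and handle multiple values of mixed types.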

Peng

kev2513 commented 3 years ago

Hey Peng,

Thank you very much for your explanation. I will try my best :)

Sincerely Kevin

thecodemakr commented 3 years ago

Hi @Impavidity @kev2513, I get the following error when trying the notebook:

WARNING <class 'seq2struct.models.enc_dec.EncDecModel.Preproc'>: superfluous {'name': 'EncDec'}
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-21-d986dbd802ee> in <module>()
----> 1 inferer = Inferer(infer_config)

4 frames
/content/gap-text2sql/rat-sql-gap/seq2struct/commands/infer.py in __init__(self, config)
     34             registry.lookup('model', config['model']).Preproc,
     35             config['model'])
---> 36         self.model_preproc.load()
     37 
     38     def load_model(self, logdir, step):

/content/gap-text2sql/rat-sql-gap/seq2struct/models/enc_dec.py in load(self)
     54 
     55         def load(self):
---> 56             self.enc_preproc.load()
     57             self.dec_preproc.load()
     58 

/content/gap-text2sql/rat-sql-gap/seq2struct/models/spider/spider_enc.py in load(self)
   1272 
   1273     def load(self):
-> 1274         self.tokenizer = BartTokenizer.from_pretrained(self.data_dir)
   1275 
   1276 

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, *inputs, **kwargs)
   1138 
   1139         """
-> 1140         return cls._from_pretrained(*inputs, **kwargs)
   1141 
   1142     @classmethod

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in _from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1244                     ", ".join(s3_models),
   1245                     pretrained_model_name_or_path,
-> 1246                     list(cls.vocab_files_names.values()),
   1247                 )
   1248             )

OSError: Model name 'data/spider-bart/nl2code-1115,output_from=true,fs=2,emb=bart,cvlink/enc' was not found in tokenizers model name list (facebook/bart-base, facebook/bart-large, facebook/bart-large-mnli, facebook/bart-large-cnn, facebook/bart-large-xsum, yjernite/bart_eli5). We assumed 'data/spider-bart/nl2code-1115,output_from=true,fs=2,emb=bart,cvlink/enc' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.

Can you please help me figure out which step I am missing?

kev2513 commented 3 years ago

Hello @thecodemakr,

I got the same issue when executing inference. Running the preprocessing step solved the problem for me:

python run.py preprocess experiments/spider-configs/gap-run.jsonnet

(also make sure to execute the "Preprocess dataset" step beforehand)

alan-ai-learner commented 3 years ago

Can you please tell me how long this command is expected to run: "python run.py preprocess experiments/spider-configs/gap-run.jsonnet"? I have been running it for about an hour.

roburst2 commented 2 years ago

(quoting the traceback and question from @thecodemakr's comment above)

@thecodemakr I am also facing this issue while running the notebook. How did you resolve it?