Hi, not sure if this is work in progress, but it'd be nice to see a few more details. I think it'd be good to know:
Sorry, I was still working on the initial comment; would you mind taking a second look?
I didn't specify the typical sequence length. @VHellendoorn
The next step I'd like to take is training with the GitHub docstring-function data, which has around 1 million pairs and is already a built-in problem in tensor2tensor.
Interestingly, adding the GitHub data doesn't help boost accuracy on the CoNaLa test set. This may be because the GitHub data is not preprocessed the same way as the CoNaLa data.
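For reference, pulling that built-in problem out of tensor2tensor's registry looks roughly like this; the problem name is my best guess, so it's worth checking `problems.available()` first:

```python
from tensor2tensor import problems

# List registered problems to find the docstring-function one.
print([name for name in problems.available() if "docstring" in name])

# Assumed registered name; generates the TFRecords for training.
problem = problems.problem("github_function_docstring")
problem.generate_data("/tmp/t2t_data", "/tmp/t2t_tmp")
```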
Very interesting, thanks for adding an abundance of details. So it looks to me like (rewritten) intent can be generated pretty well from code. Not so much vice versa, which is quite interesting because the training curve looks healthy. One thing we should maybe look into first is ensuring that it's not an issue with the evaluation and/or data. Could you post some samples of code it generates, e.g. when the training loss gets down around 2? It may also be good to quickly hack together a baseline (2-layer bi-dir RNN + attention should do it); I can share some starter code for that, or I bet there are lots of examples online.
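To make that concrete, here is a minimal sketch of such a baseline in PyTorch (teacher-forced training with Luong-style attention; vocab sizes and dimensions are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqBaseline(nn.Module):
    """2-layer bi-directional GRU encoder + GRU decoder with Luong attention."""

    def __init__(self, src_vocab, tgt_vocab, emb=256, hid=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.decoder = nn.GRU(emb, hid, num_layers=2, batch_first=True)
        self.attn = nn.Linear(hid, 2 * hid, bias=False)  # general (bilinear) score
        self.out = nn.Linear(3 * hid, tgt_vocab)

    def forward(self, src, tgt):
        enc_out, _ = self.encoder(self.src_emb(src))            # (B, S, 2H)
        dec_out, _ = self.decoder(self.tgt_emb(tgt))            # (B, T, H)
        scores = self.attn(dec_out) @ enc_out.transpose(1, 2)   # (B, T, S)
        context = F.softmax(scores, dim=-1) @ enc_out           # (B, T, 2H)
        return self.out(torch.cat([dec_out, context], dim=-1))  # (B, T, V)

# Shape check on a toy batch.
model = Seq2SeqBaseline(src_vocab=8000, tgt_vocab=8000)
src = torch.randint(0, 8000, (4, 20))   # 4 intents, 20 tokens each
tgt = torch.randint(0, 8000, (4, 15))   # 4 code snippets (teacher forcing)
print(model(src, tgt).shape)            # torch.Size([4, 15, 8000])
```

Train it with token-level cross-entropy against the targets shifted by one position; that should be enough for a quick reference point.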
It's also interesting that the Github data didn't help. I wonder if it's because the intents there are a bit more elaborate; perhaps it'd be worth training with just the part up to a DCNL tag (if any). For instance, see this example, where the second half of the description (although useful) is completely different from anything in the Conala dataset.
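If we try that, the truncation itself is trivial (assuming the DCNL markers are space-separated tokens, as in the released corpus):

```python
def first_sentence(intent):
    """Keep only the docstring text before the first DCNL (newline) marker."""
    return intent.split(" DCNL ")[0].strip()

# Hypothetical example in the corpus's format:
print(first_sentence("Load a config file DCNL Raises IOError if missing"))
# -> 'Load a config file'
```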
P.S.: It looks like the challenge distinguishes between training only on annotated data (and only using rewritten_intents from that one) and training on all (using rewritten where available). It might be good to also run a model on just the annotated data; it should be cleaner and much smaller (i.e. good for prototyping), and we know for a fact that we should be able to get over 10 BLEU to be in the race.
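Pulling out just the annotated subset is a couple of lines; the field names below follow the released conala-train.json:

```python
import json

with open("conala-train.json") as f:
    examples = json.load(f)

# Keep only the examples the annotators actually rewrote, and use the
# cleaner rewritten_intent as the source side.
annotated = [(ex["rewritten_intent"], ex["snippet"])
             for ex in examples if ex.get("rewritten_intent")]
print(len(annotated), annotated[0])
```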
@VHellendoorn
Actually, there was a problem with the way I tokenized the code. I fixed it by using the same tokenizer and preprocessing script as the baseline model. After this, I got a test BLEU of 0.30 (generating code from rewritten intent), as you can see in the Rewritten Intent to Code 6 Layers section below; it significantly outperforms the baseline model.
Vice versa, generating intent from code, I got a validation BLEU of around 0.20, as you can see in the Code to Rewritten Intent section.
Here are some samples
Generated Code
Input: send a signal signal.SIGUSR1 to the current process
Output: def send_signal self signal self signal signal
Ref: os.kill(os.getpid(), signal.SIGUSR1)
Input: decode a hex string '4a4b4c' to UTF-8
Output: def b58decode_chk v return b58decode v
Ref: bytes.fromhex('4a4b4c').decode('utf-8')
Input: concatenate a list of strings ['a', 'b', 'c']
Output: def join_string_list s return join map str s
Ref: """""".join(['a', 'b', 'c'])
Interesting, thanks. From the outputs you show, should I gather that the model doesn't work very well, or is there some specific tokenization that is being used to make it look like this?
Was this trained with the full corpus, or rewritten only? Looks like the state-of-the-art is about 35 BLEU on intent -> code (which I believe is the only part evaluated in the challenge), so 20 is a good start, but I bet we can push it further. Are you experiencing any remaining issues with training? Otherwise we can focus on task-specific optimizations.
This was trained with the full corpus.
I don't have any remaining issues with training, and I think we can focus on task-specific optimizations. Would you be free to meet next week to talk about this?
Yeah, definitely. Only my Monday is full, but I'm available Tuesday, e.g. after class.
Metadata about the problem

Rewritten Intent to Code 6 Layers
Test BLEU: 0.30
approx_vocab_size = 2**13 (~8k)
Hyperparameters: [full hyperparameter set]
Training loss: [training-loss plot]

Rewritten Intent to Code 2 Layers
Same hyperparameters as above, but with num_hidden_layers = 2

Code to Rewritten Intent
Validation BLEU: ~0.20
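For completeness, a custom problem with this vocabulary size would be registered in tensor2tensor roughly as below; the class name and the data plumbing are placeholders, not the actual code used here:

```python
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry

@registry.register_problem
class ConalaIntentToCode(text_problems.Text2TextProblem):
    """Hypothetical CoNaLa problem: rewritten intent -> code snippet."""

    @property
    def approx_vocab_size(self):
        return 2**13  # ~8k subword vocabulary, as in the runs above

    @property
    def is_generate_per_split(self):
        return False  # generate once, let t2t shard into train/dev

    def generate_samples(self, data_dir, tmp_dir, dataset_split):
        # Real code would read the CoNaLa json files here and yield
        # {"inputs": rewritten_intent, "targets": snippet} pairs.
        yield {"inputs": "concatenate a list of strings ['a', 'b', 'c']",
               "targets": "''.join(['a', 'b', 'c'])"}
```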