darkestfloyd closed 6 years ago
Paper: Learning to Generate Pseudo-code from Source Code using Statistical Machine Translation. The paper describes an SMT-based technique for generating pseudo-code from source code. The reported results are not very promising, though. We could potentially use it to generate more training data.
Paper: A parallel corpus of Python functions and documentation strings for automated code documentation and code generation. Like the previous paper, it describes ways to generate docstrings from source code.
https://github.com/EdinburghNLP/code-docstring-corpus
Python - Comments open-source dataset
Look for potential data to be used for training.
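For a rough idea of what function/docstring training pairs look like, here is a minimal sketch that pulls them out of Python source with the standard-library `ast` module. This is not the extraction pipeline used by either paper or by the code-docstring-corpus repository. The helper name `extract_docstring_pairs` and the sample source are hypothetical, for illustration only.

```python
import ast


def extract_docstring_pairs(source):
    """Return (function_name, docstring) pairs for documented functions.

    Hypothetical helper: functions without a docstring are skipped,
    since they give no target text to train on.
    """
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)  # None when no docstring is present
            if doc:
                pairs.append((node.name, doc))
    return pairs


# Made-up sample input, not taken from any dataset.
sample = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def undocumented(x):
    return x
'''

pairs = extract_docstring_pairs(sample)
print(pairs)
```

Running the same helper over files from any open-source Python repository would be one way to check how much usable docstring data it contains before committing to it as a training source.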