Closed: TejaswiniiB closed this issue 2 years ago
Hi, thanks for your interest in the Code Transformer.
What is your input? If it is not a code snippet containing a single method, our trained models most likely won't give meaningful embeddings. If it is a single method, you can just mask the method name to make it match our training setting.
Regarding the embeddings themselves: `encoder_output.all_emb[-1][1]` gives you the query stream embedding (1) in the last layer (-1), which should encapsulate the whole snippet. Alternatively, you can use `encoder_output.all_emb[-1][0]`, which gives you the content stream embeddings (0) of the last layer (-1). These are defined per input token, and averaging them over the sequence length might give you meaningful embeddings as well. Hope this helps.
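For illustration, here is a minimal sketch of both options, assuming `encoder_output` has already been obtained as in `interactive_prediction.ipynb` (the exact tensor shapes are an assumption, not taken from the notebook):

```python
# Hedged sketch: `encoder_output` is assumed to come from the encoder forward
# pass shown in interactive_prediction.ipynb.

# Option 1: query stream embedding of the last layer (index 1).
# A single vector intended to summarize the whole snippet.
snippet_emb = encoder_output.all_emb[-1][1]

# Option 2: content stream embeddings of the last layer (index 0).
# One vector per input token; mean-pool over the sequence dimension.
token_embs = encoder_output.all_emb[-1][0]
pooled_emb = token_embs.mean(dim=0)  # assumes dim 0 is the sequence length
```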
Best, Tobias
Hi @tobias-kirschstein, thank you for your reply. My input is code containing more than one method (multiple methods). Thank you!
Hi, I want to get the code embedding of an entire code_snippet. How can I get it? As per the Code Snippet embedding section in `interactive_prediction.ipynb`, it gives the embedding of only the masked method. I don't have any masked method or function name to predict; I just want the embedding of the entire code. How can we do it?