VHellendoorn / Code-LMs

Guide to using pre-trained large language models of source code
MIT License

Plan on Releasing Generated Sample C Code #11

Closed Lightninghkm closed 2 years ago

Lightninghkm commented 2 years ago

Hi there, nice job on this work! I'm wondering whether you are planning to release some generated sample C code. I'm just curious about what it looks like. For judging the functional correctness of the generated Python code, I know you used HumanEval; did you conduct a similar functional-correctness check on the C code as well? Thanks much!

VHellendoorn commented 2 years ago

Hi, sorry for the slow reply. I can see the merit of releasing some samples of unconditionally generated code in various languages; I'll work on this in the next few days. How do you envision checking functional correctness on C code? As far as I know, there is no such benchmark for C. I suppose one could translate HumanEval, though that would be a nontrivial effort.
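A rough sketch of what such a translated check might look like, assuming a C compiler is available on the PATH. The task, the C candidate, the test driver, and the `passes_tests` helper are all hypothetical illustrations, not part of HumanEval or this repository:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical model output for a HumanEval-style task, written in C.
CANDIDATE = r"""
#include <stdbool.h>
#include <stddef.h>

bool has_close_elements(const double *xs, size_t n, double threshold) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            double d = xs[i] - xs[j];
            if (d < 0) d = -d;
            if (d < threshold) return true;
        }
    }
    return false;
}
"""

# Hand-written driver: a failed assert aborts, giving a nonzero exit status.
DRIVER = r"""
#include <assert.h>

int main(void) {
    double a[] = {1.0, 2.0, 3.0};
    double b[] = {1.0, 2.8, 3.0, 4.0, 5.0, 2.0};
    assert(!has_close_elements(a, 3, 0.5));
    assert(has_close_elements(b, 6, 0.3));
    return 0;
}
"""


def passes_tests(candidate: str, driver: str, timeout: float = 10.0):
    """Return True/False for pass/fail, or None if no C compiler is found.

    Assumption: generated code is trusted enough to run locally. A real
    harness would execute it inside a sandbox.
    """
    cc = shutil.which("cc") or shutil.which("gcc") or shutil.which("clang")
    if cc is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "prog.c"
        exe = Path(tmp) / "prog"
        src.write_text(candidate + driver)
        build = subprocess.run(
            [cc, "-std=c99", str(src), "-o", str(exe)], capture_output=True
        )
        if build.returncode != 0:
            return False  # does not compile, so it cannot pass
        try:
            run = subprocess.run([str(exe)], capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # treat a hang as a failure
        return run.returncode == 0


if __name__ == "__main__":
    print(passes_tests(CANDIDATE, DRIVER))
```

The nontrivial part is exactly what the sketch glosses over: every task needs a C signature and a driver, and Python lists or strings have no single obvious C counterpart.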

Lightninghkm commented 2 years ago

Hi, thanks much for your reply and no problem at all!

Yes, I totally agree that checking the functional correctness of automatically generated C code would be hard, and there is no existing work on that front. My assumption is that we would probably still need some manual effort to check functional correctness. But in terms of syntactic correctness, I guess we can always rely on the compiler to do its job :->
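To make the compiler idea concrete, here is a minimal sketch, assuming GCC or Clang is installed. The `compiles` helper is a hypothetical name; `-fsyntax-only` is an existing flag in both compilers that parses and type-checks without producing an object file:

```python
import shutil
import subprocess


def compiles(source: str):
    """Return True if the compiler accepts `source`, False if it rejects it,
    or None if no compiler is found on the PATH."""
    cc = shutil.which("gcc") or shutil.which("clang") or shutil.which("cc")
    if cc is None:
        return None
    # "-x c -" tells the compiler to read C source from standard input.
    proc = subprocess.run(
        [cc, "-fsyntax-only", "-x", "c", "-"],
        input=source,
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0


if __name__ == "__main__":
    print(compiles("int main(void) { return 0; }\n"))
    print(compiles("int main(void) { return 0 \n"))  # missing ';' and '}'
```

Note that this checks slightly more than grammar, since the compiler also rejects type errors and undeclared identifiers. An unconditionally generated snippet that relies on missing headers or project-specific types may fail for that reason alone.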

Another reason I want to see some generated C code is that, since I work on program analysis of C/C++ code for security, I'm curious whether C code generation might introduce security flaws. I know that this is not the focus of this work, which is already really great research; I just want to see if there is an opportunity to enhance the security of the generated C code.
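As a very crude first pass at that question, one could scan generated samples for C library calls that are commonly flagged as risky. The sketch below is only an illustration: the `RISKY_CALLS` list is a small, non-exhaustive assumption, `flag_risky_calls` is a hypothetical helper, and a regex scan is no substitute for real static analysis (it will, for instance, also match calls mentioned inside comments or strings):

```python
import re

# Small, non-exhaustive set of calls that lack bounds checking (assumption).
RISKY_CALLS = {
    "gets": "reads input with no length limit",
    "strcpy": "copies with no destination bound",
    "strcat": "appends with no destination bound",
    "sprintf": "formats with no destination bound",
}

# \b keeps bounded variants such as strncpy, snprintf, and fgets from matching.
PATTERN = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def flag_risky_calls(source: str):
    """Return a list of (line_number, function_name) for each risky call."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            findings.append((lineno, match.group(1)))
    return findings


# Hypothetical generated snippet used only to exercise the scanner.
SAMPLE = r"""
#include <string.h>
void greet(char *out, const char *name) {
    strcpy(out, name);
}
""".strip()

if __name__ == "__main__":
    for lineno, name in flag_risky_calls(SAMPLE):
        print(f"line {lineno}: {name} ({RISKY_CALLS[name]})")
```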

Looking forward to the release soon, thanks again!