Closed · liususan091219 closed this issue 4 months ago
Dear authors, could you please release the code for prompting GPT-4? If I'm not mistaken, it looks like you have only released the code for fine-tuning. Thanks in advance.
Thank you for your attention. I have added the code for prompting GPT-4 and also included a CSV file containing the vulnerable lines and dependency lines. For the results from GPT-4, please refer to the previously released filtered multi-task dataset, which is located at the path dataset/MixVul/multi_task/.
Note: Due to the randomness introduced by GPT-4’s sampling techniques, its interpretations of vulnerabilities may not be exactly the same each time. Therefore, the results in our multi-task dataset may be difficult to fully reproduce.
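For reference, here is a minimal sketch of what such a prompting step can look like. This is not the released script: the prompt wording, the CSV file name, and the column names (`code`, `vul_lines`, `dep_lines`) are illustrative assumptions, and the non-zero `temperature` is what introduces the sampling randomness mentioned above.

```python
# Minimal sketch (not the released code): query GPT-4 for a vulnerability
# interpretation given a function, its vulnerable lines, and dependency lines.
import csv
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def interpret(code: str, vul_lines: str, dep_lines: str) -> str:
    # Prompt wording is an assumption for illustration only.
    prompt = (
        "The following C/C++ function is vulnerable.\n"
        f"Function:\n{code}\n"
        f"Vulnerable lines:\n{vul_lines}\n"
        f"Dependency lines:\n{dep_lines}\n"
        "Explain why the vulnerable lines cause the vulnerability."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,  # non-zero temperature is why outputs vary between runs
    )
    return resp.choices[0].message.content


# Hypothetical CSV layout: one row per sample with code, vulnerable and dependency lines.
with open("vul_dep_lines.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(interpret(row["code"], row["vul_lines"], row["dep_lines"]))
```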
Hi Xiaohu, thanks a lot for your quick response and for sharing the code!
I just want to ask a clarification question about the "Vulnerability Lines" for GPT-4. In the paper:
We extract vulnerability lines from the patches of the vulnerable code. Patches are generated to fix existing vulnerabilities via adding or deleting certain code elements
The input of VulLLM is code (not a patch), right? Some of the datasets in Table 1 only contain the code, not the patch. How is the patch generated here?
Thanks in advance for your help!
Yes, the input of VulLLM does not include the patch. The patch is used for generating data for auxiliary tasks. The patch is a field included in the PatchDB dataset, which you can view at this link: https://huggingface.co/datasets/sunlab/patch_db. Please note that we only use PatchDB to generate data for auxiliary tasks. Since the pre-fix code does not include the added lines, we only consider the deleted lines in the patch as the vulnerable lines. Based on these vulnerable lines, we use JOERN to extract dependency lines and then prompt GPT-4.
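As a rough illustration of the "deleted lines as vulnerable lines" step, assuming the PatchDB patch is stored as a standard unified diff (the `patch` field name and this helper are assumptions, not the repository's actual code):

```python
# Minimal sketch: take the deleted lines of a unified diff as the vulnerable lines.
def deleted_lines(patch: str) -> list[str]:
    vul = []
    for line in patch.splitlines():
        # '-' marks a removed (pre-fix) line; skip the '---' file header of the diff.
        if line.startswith("-") and not line.startswith("---"):
            vul.append(line[1:].strip())
    return vul


sample = {
    "patch": (
        "--- a/foo.c\n+++ b/foo.c\n@@ -1,3 +1,3 @@\n"
        "-strcpy(buf, src);\n+strncpy(buf, src, sizeof(buf));\n"
    )
}
print(deleted_lines(sample["patch"]))  # ['strcpy(buf, src);']
# These vulnerable lines would then be passed to JOERN to extract dependency lines
# before prompting GPT-4, as described above.
```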
Thanks for your response. So you used the PatchDB data to instruction fine-tune an LLM; what is the instruction fine-tuned LLM used for? Since the input code doesn't contain the +/- lines, is the instruction fine-tuned LLM used to generate the +/- lines, so that it becomes a patch and you can start using JOERN?
I think if the PatchDB instruction fine-tuning code can be uploaded, it will help explain this more clearly. Thanks!
I don't quite understand what you mean. The instruction fine-tuned LLM is VulLLM itself, which is used to detect vulnerabilities.
Let me outline the entire process again. First, we use the PatchDB dataset to obtain vulnerable lines (based on patches) and dependency lines (based on the source code, the vulnerable lines, and JOERN). Then, we use GPT-4 to obtain vulnerability interpretations. Finally, we fine-tune open-source Code LLMs on three tasks. For the format of the instruction fine-tuning dataset, please refer to the released datasets or Appendix A.
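As a rough sketch of what one multi-task instruction fine-tuning record built from a PatchDB function could look like (the keys and instruction wording below are illustrative assumptions; the authoritative format is the released dataset and Appendix A of the paper):

```python
# Illustrative only: one sample covering the three tasks; not the released schema.
sample = {
    "detection": {
        "instruction": "Is the following function vulnerable? Answer yes or no.",
        "input": "<function source code>",
        "output": "yes",
    },
    "localization": {
        "instruction": "Identify the vulnerable lines in the following function.",
        "input": "<function source code>",
        "output": "<deleted lines from the PatchDB patch>",
    },
    "interpretation": {
        "instruction": "Explain why the identified lines are vulnerable.",
        "input": "<function source code + vulnerable lines + dependency lines>",
        "output": "<GPT-4 generated interpretation>",
    },
}
```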
Let's say it's a function on GitHub that does not have a patch yet (e.g., an example in the Devign dataset). How do you use the PatchDB dataset to obtain the patch? I know PatchDB has the patch, but in Devign the patch has not happened yet, since it's a 0-day, right?
Yes, other datasets do not contain ready-made patch information, and obtaining patches corresponding to vulnerabilities is also challenging. Therefore, we only perform vulnerability interpretation on PatchDB.
Thanks for your quick response.
For the Devign result in Table 1, do you use the framework in Figures 1 and 2 or not? It seems your framework relies on the patch, but if we already know the patch, it's not 0-day detection anymore. Therefore, we can't apply Figures 1 and 2 to Devign.
Could you help outline the process for Devign?
In our experiments, Devign is only used for the vulnerability detection task shown in Figure 1 (as described in Section 3.4). We apologize, but we did not extract auxiliary task data from Devign, so there is no related process for it.
I think I got it now: your paper is about collecting the training data to instruction fine-tune VulLLM. Then you use VulLLM for a second fine-tuning on the Devign data, and it doesn't do the interpretation at inference time.
Thank you so much for your patience and explanations!