Anurag-Swarnim-Yadav opened this issue 7 months ago
Pre-training is done on 650,499 unique function pairs; see Section 4.2.1 of the paper.
Thank you so much, Dr. Monperrus, for the clarification. Could you please ask the author to update the GitHub repository so we can retrain VRepair? At present, there are no instructions or commands to follow.
@chenzimin, which script is used for training?
@monperrus Dr. Monperrus, could you please point me to the processed pre-training dataset?
Hi @Anurag-Swarnim-Yadav, you can download and inspect the processed dataset as follows:
```shell
$ curl -LO "https://github.com/ASSERT-KTH/VRepair/releases/download/v20240223/BugFix.tar.bz2"
$ tar xvjf BugFix.tar.bz2
$ wc -l ./only_first_line_context3_more_parameters_models/data/BugFix_train_src.txt
# there are 534858 C functions in this file, one per line
# here is the first C function of the dataset
$ head -n 1 ./only_first_line_context3_more_parameters_models/data/BugFix_train_src.txt
CWE-000 static int alloc_long_term_buff ( struct ibmvnic_adapter * adapter , struct ibmvnic_long_term_buff * ltb , int size ) { struct device * dev = & adapter -> vdev -> dev ; ltb -> size = size ; ltb -> buff = dma_alloc_coherent ( dev , ltb -> size , & ltb -> addr , GFP_KERNEL ) ; if ( ! ltb -> buff ) { dev_err ( dev , "Couldn\'t<S2SV_blank>alloc<S2SV_blank>long<S2SV_blank>term<S2SV_blank>buffer\\n" ) ; return - ENOMEM ; } ltb -> map_id = adapter -> map_id ; adapter -> map_id ++ ; init_completion ( & adapter -> fw_done ) ; send_request_map ( adapter , ltb -> addr , ltb -> size , ltb -> map_id ) ; wait_for_completion ( & adapter -> fw_done ) ; <S2SV_StartBug> return 0 ; <S2SV_EndBug> }
```
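To explore the same file a bit further, here is a minimal sketch of two shell helpers. They are not part of the VRepair repository, and the names `cwe_histogram` and `bug_spans` are hypothetical. They assume the format suggested by the sample line: one function per line, a CWE label (e.g. `CWE-000`) as the first token, space-separated tokens, and the buggy region wrapped in `<S2SV_StartBug>` ... `<S2SV_EndBug>`.

```shell
#!/bin/sh
# Sketch only: helpers for inspecting a VRepair-style source file such as
# BugFix_train_src.txt. Format assumed from the sample line: one function
# per line, CWE label first, tokens separated by spaces, buggy region
# wrapped in <S2SV_StartBug> ... <S2SV_EndBug>.

# Count how many functions carry each CWE label, most frequent first.
cwe_histogram() {
  cut -d' ' -f1 "$1" | sort | uniq -c | sort -rn
}

# Print the tokens inside each <S2SV_StartBug> ... <S2SV_EndBug> pair,
# one marked span per output line (field 1, the CWE label, is skipped).
bug_spans() {
  awk '{
    inside = 0; span = ""
    for (i = 2; i <= NF; i++) {
      if ($i == "<S2SV_StartBug>") { inside = 1; span = ""; continue }
      if ($i == "<S2SV_EndBug>")   { inside = 0; print span; continue }
      if (inside) span = (span == "" ? $i : span " " $i)
    }
  }' "$1"
}
```

For example, `bug_spans ./only_first_line_context3_more_parameters_models/data/BugFix_train_src.txt | head -n 1` should print `return 0 ;`, the marked statement of the sample function, assuming the file matches the release above. Walking the tokens in `awk` rather than using a regular expression avoids mis-matching C operators such as `<` that also start with an angle bracket.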
@monperrus Hi Dr. Monperrus, thank you so much for your help.
Is it true that pre-training is only done on 23,607 C/C++ functions?