epicosy / devign

Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks
MIT License
189 stars · 68 forks

Whenever the input data is larger, the program gets stuck #19

Open faysalhossain2007 opened 1 year ago

faysalhossain2007 commented 1 year ago

Hi,

Whenever I use 400+ samples to train, test, and validate the model, it gets stuck and shows the following in the log:

```
Dataset 39 to cpg.
/opt/devign-master/data/cpg/0_cpg.bin
Compiling (synthetic)/ammonite/predef/interpBridge.sc
Compiling (synthetic)/ammonite/predef/replBridge.sc
Compiling (synthetic)/ammonite/predef/sourceBridge.sc
Compiling (synthetic)/ammonite/predef/frontEndBridge.sc
Compiling (synthetic)/ammonite/predef/DefaultPredef.sc
Compiling /opt/devign-master/(console)
[Joern ASCII-art banner, interleaved with the next progress lines]
/opt/devign-master/data/cpg/1_cpg.bin
/opt/devign-master/data/cpg/2_cpg.bin
Type `help` or `browse(help)` to begin
/opt/devign-master/data/cpg/3_cpg.bin
```

Is anyone able to use more than 400 samples to train the model?

hyz1433376288 commented 11 months ago

Same problem here. Have you fixed it?

Chris33Edwards commented 10 months ago

I have run into the same problem! I want to know if you have resolved it.

hyz1433376288 commented 10 months ago

1. This problem may result from an input vulnerable-code file that is too large. In main.py, line 44, the `select` call filters out large files; if you have commented it out, try adding it back.
2. Actually, I failed to solve the problem. In the end, I sliced the dataframe into chunks of 200 to generate the bin and json files, left the config file entirely unchanged, and gave up modifying it.
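A minimal sketch of that chunking workaround, assuming the dataset is held in a pandas DataFrame; the input path and chunk-file naming below are assumptions, not the project's actual code:

```python
import pandas as pd

CHUNK_SIZE = 200  # the slice size reported to avoid the hang

def iter_chunks(df: pd.DataFrame, chunk_size: int = CHUNK_SIZE):
    """Yield (index, slice) pairs of at most chunk_size rows each."""
    for i, start in enumerate(range(0, len(df), chunk_size)):
        yield i, df.iloc[start:start + chunk_size]

if __name__ == "__main__":
    # Hypothetical input path; point this at the real dataset file.
    df = pd.read_json("data/raw/dataset.json")
    for i, chunk in iter_chunks(df):
        # Persist each slice so the create step can be run on it separately.
        chunk.to_pickle(f"data/cpg/{i}_chunk.pkl")
```

Each pickle can then be fed through the create step on its own, so no single run has to process more than 200 functions at once.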

MilkteaBoy-code commented 4 months ago

Has anyone solved this problem? I am also running into it.

MilkteaBoy-code commented 4 months ago

> @hyz1433376288: In the end, I sliced the dataframe into chunks of 200 to generate the bin and json files, left the config file unchanged, and gave up modifying it.

Did you only modify the config.json file [screenshot attached: 微信截图_20240421235754], changing 100 to 200? And after that, this problem no longer happened? [screenshot attached]

hyz1433376288 commented 4 months ago

> @MilkteaBoy-code: Did you only modify the config.json file, changing 100 to 200?

No, I modified the `select` function at main.py line 44 so that it selects 200 items at a time.
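A hedged sketch of what such a modified `select` might look like, assuming it operates on a pandas DataFrame with a `func` column holding each function's source text; the column name, length threshold, and signature below are all assumptions:

```python
import pandas as pd

def select(df: pd.DataFrame, start: int = 0, count: int = 200,
           max_len: int = 1200) -> pd.DataFrame:
    """Drop overly long functions, then take a fixed-size slice.

    max_len and the 'func' column are assumptions; adjust them to the
    actual dataset schema and the filter used in main.py.
    """
    filtered = df[df["func"].str.len() < max_len]
    return filtered.iloc[start:start + count]
```

Calling this repeatedly with `start = 0, 200, 400, ...` reproduces the "process 200 at a time, over several runs" approach described above.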

MilkteaBoy-code commented 4 months ago

> @hyz1433376288: No, I modified the `select` function at main.py line 44 so that it selects 200 items at a time.

Do you select only 200 projects in total, or 200 projects each time as a batch? Would it be possible to take a look at your modified code? Thanks!

MilkteaBoy-code commented 4 months ago

> @MilkteaBoy-code: Do you select only 200 projects in total, or 200 projects each time as a batch?

To clarify: by "projects" I mean 200 functions.

YouNotWalkAlone commented 4 months ago

> @hyz1433376288: In the end, I sliced the dataframe into chunks of 200 to generate the bin and json files and gave up modifying `select` in main.py.

Hello, have you continued to work on this model? I used other datasets with this model and ran into a lot of strange problems.

MilkteaBoy-code commented 4 months ago

> @YouNotWalkAlone: Hello, have you continued to work on this model? I used other datasets with this model and ran into a lot of strange problems.

I am still working on this model, but I have not yet solved this issue. I noticed your other response in issue #22; I will reply to you on your issue page.

MilkteaBoy-code commented 4 months ago

Hello, have you solved this problem?

hyz1433376288 commented 4 months ago

> @MilkteaBoy-code: Hello, have you solved this problem?

I did not solve it; I avoided it instead. I modified the `select` function to process only 200 items at a time and ran it over several passes. See the screenshot below for details. [screenshot attached]

MilkteaBoy-code commented 4 months ago

> @hyz1433376288: I did not solve it; I avoided it instead. I modified the `select` function to process only 200 items at a time and ran it over several passes.

I did not get stuck at the dataset select stage, and all the .bin files were generated successfully. It hung while converting the .bin files to .json, when it reached 3_cpg.bin. Do you mean that the .bin files produced from the functions chosen at the select stage are too large, which makes the later .bin-to-.json conversion hang? So at the select stage (line 14 of your screenshot) you filtered by function length? And then each create run generates just cpg 0 and 1, with the .bin, cpg.json, and cpg.pkl files. (I tried it and that is what happens: no hang, but each run produces only two cpg files.)

YouNotWalkAlone commented 4 months ago

> @MilkteaBoy-code: So at the select stage you filtered by function length? And then each create run generates only two cpg files?

Yes, my approach is the same as user hyz's.

MilkteaBoy-code commented 4 months ago

> @YouNotWalkAlone: Yes, my approach is the same as user hyz's.

So do you mean the program is run with only two slices of data generated each time? Or do you generate two at a time, accumulating them until all 270 slices have been generated, and only then run the subsequent embedding and training? That is what I am confused about.


hyz1433376288 commented 4 months ago

> @MilkteaBoy-code: Do you run the program with only two slices generated each time, or do you generate them batch by batch until all 270 slices exist and only then run the embedding and training?

The latter.
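To make "the latter" concrete, here is a minimal sketch of that two-stage driver, again assuming a pandas DataFrame input; `create_slice` and `embed_and_train` are hypothetical stand-ins for the project's actual create, embed, and train steps:

```python
import pandas as pd

CHUNK = 200  # slice size the thread settled on

def create_slice(chunk: pd.DataFrame, i: int) -> None:
    """Hypothetical stand-in for the create step, which would write
    {i}_cpg.bin, {i}_cpg.json, and the corresponding .pkl for this chunk."""
    print(f"create slice {i}: {len(chunk)} functions")

def embed_and_train() -> None:
    """Hypothetical stand-in for the embedding and training steps."""
    print("embedding and training on all generated slices")

def run_pipeline(df: pd.DataFrame) -> None:
    # Stage 1 ("the latter"): generate every slice first, CHUNK rows at a time.
    for i, start in enumerate(range(0, len(df), CHUNK)):
        create_slice(df.iloc[start:start + CHUNK], i)
    # Stage 2: only after all slices exist, run embedding and training once.
    embed_and_train()
```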