Closed. VOID001 closed this issue 5 years ago.
Hello Jokeren, I am helping my friends use gbolt with a 5GB dataset, but it keeps giving a segmentation fault, and each time dmesg shows the crash in a different place. I am sorry, but I cannot provide the core file this time. I am not familiar with data mining, so I went through the issues and found that the program may crash if my dataset is in the wrong format. However, I verified the dataset with a Python script (attached below) and found no holes in the graph: all vertex IDs are continuous and the graph numbering is also consistent. The dataset is huge (5GB), so I don't think I can upload it here. Could you give me a hint on why this problem may happen? Some useful log entries:
https://cfp.vim-cn.com/cbfv6/bash#n-496 https://cfp.vim-cn.com/cbfv6/bash#n-455
The argument I use is -support 0.6. The server has 80 CPU threads, and I run with the default options. Thanks in advance.
```python
#!/usr/bin/env python
if __name__ == "__main__":
    f = open("/path/to/dataset.txt", "r")
    prev = 0
    prevline = ""
    dataSetStart = False
    for line in f:
        if line.startswith("t #"):
            dataSetStart = True
            print("Dataset {}".format(line))
            continue
        if line.startswith("v "):
            # It's a vertex line
            vertex_id = int(line.split(" ")[1])
            label = int(line.split(" ")[2])
            if label < 0 or vertex_id < 0:
                print("Found invalid data:\n{}\n{}".format(prevline, line))
                print("line={}\nVertex id = {} prev = {}".format(line, vertex_id, prev))
            if dataSetStart:
                dataSetStart = False
                prev = vertex_id
                prevline = line
                # raise Exception("WTF START prev = {}, vertex_id = {}".format(prev, vertex_id))
                continue
            if not dataSetStart:
                if prev + 1 != vertex_id:
                    print("Found invalid data:\n{}\n{}".format(prevline, line))
                prev = vertex_id
                prevline = line
        if line.startswith("e "):
            arr = line.split(" ")
            v1 = int(arr[1])
            v2 = int(arr[2])
            lbl = int(arr[3])
            if v1 < 0 or v2 < 0 or lbl < 0:
                print("Found invalid data:\n{}\n{}".format(prevline, line))
```
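One thing the script above does not check (this helper is my own suggestion, not part of the original thread): whether every edge's endpoints refer to vertex IDs that actually exist in the current graph. An out-of-range endpoint is another malformed-input case that can crash a gSpan-style miner. A minimal sketch:

```python
def check_edges(lines):
    """Return (graph_header, edge_line) pairs whose endpoints are out of range.

    Assumes the same input format as the script above: 't #' starts a graph,
    'v <id> <label>' lines use consecutive IDs starting at 0, and
    'e <v1> <v2> <label>' lines must reference existing vertex IDs.
    """
    bad = []
    graph_id, nvertices = None, 0
    for line in lines:
        if line.startswith("t #"):
            # New graph: remember its header and reset the vertex count.
            graph_id = line.strip()
            nvertices = 0
        elif line.startswith("v "):
            nvertices += 1
        elif line.startswith("e "):
            _, v1, v2, _ = line.split()
            if not (0 <= int(v1) < nvertices and 0 <= int(v2) < nvertices):
                bad.append((graph_id, line.strip()))
    return bad
```

This streams line by line, so it also handles a 5GB file without loading it into memory.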
Sometimes a memory error occurs not because of insufficient memory: it is actually a stack overflow caused by recursive function calls, since some data structures are allocated on the stack.
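For reference, the stack limit can be raised from the launching process before starting the binary; a sketch using Python's standard resource module (the gbolt invocation in the comment is illustrative, not a confirmed command line):

```python
import resource

# Deep recursion over stack-allocated data can overflow the default soft
# stack limit (commonly 8 MiB on Linux). Raise the soft limit to the hard
# limit, or to 512 MiB if the hard limit is unlimited. Child processes
# started afterwards inherit the new limit.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
new_soft = 512 * 1024 * 1024 if hard == resource.RLIM_INFINITY else hard
resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))

# Illustrative launch; adjust the path and flags to your setup:
# import subprocess
# subprocess.run(["./gbolt", "-input", "dataset.txt", "-support", "0.6"])
```

The shell equivalent is `ulimit -s unlimited` (or a concrete size in KiB) before running the program.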
By the way, I built the program on the master branch.
The amount of memory used depends on the data. In some cases, peak memory consumption can be more than 200x the original data size.
I am currently rolling out a new version that is faster and more memory efficient, but it won't be available until summer. As graph mining is not my research focus, I really do not have free time slots right now.
I am trying to schedule a phone call with your partner.
Thanks for the quick response! Yeah, I found that he got OOM on a 1TB machine when running the program (on a 40GB dataset :O). So maybe even a 5GB dataset can't fit within the memory requirements :P
Yeah, I will try to increase the default stack size, and first try to get a core file to look into. Thanks for your response :)
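To actually get a core file, the core-size limit must be nonzero in the process that launches the binary; many distributions default it to 0. This is the equivalent of `ulimit -c unlimited`, sketched in Python (the gbolt paths in the comments are illustrative):

```python
import resource

# Allow core dumps up to the hard limit. With a soft limit of 0 (a common
# default), no core file is written when the program segfaults.
_, hard = resource.getrlimit(resource.RLIMIT_CORE)
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

# Illustrative: run the binary from this process so it inherits the limit,
# then inspect the resulting dump with gdb, e.g.:
#   gdb ./gbolt core
# import subprocess
# subprocess.run(["./gbolt", "-input", "dataset.txt", "-support", "0.6"])
```

Note that where the core file lands is controlled by the kernel's `/proc/sys/kernel/core_pattern` setting, so check there if no `core` file appears in the working directory.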
Should I use -O0 to debug the stack footprint, or can I just use -O3 and read the stack backtrace? Will -O3 introduce possible flaws (from optimization, etc.)?
-O3 -g or -O2 -g or CMAKE_BUILD_TYPE Debug, whatever you want.
It's possible that compiler optimizations affect correctness, but I don't think that applies to my code.
It's probably better to set the number of threads to one while you are debugging.
If the data are not confidential, you guys can upload them somewhere on the internet so that I can test my new branch. I haven't tried any real-world data yet, though the Facebook guys told me gBolt ran correctly and efficiently in their setting. With the help of your data, we can work together to make gBolt a better tool.
I probably know where the problems are, and I know exactly how to solve them. But again, I do not have time to fix them now.
Sorry for troubling you. If you find it's really hard to debug, please try to use other tools. I will let you know after releasing gBolt 1.0 and getting correct results on your data.
Thank you. I am trying to debug it, for example by detecting the stack depth and trying to make an automatically growing stack, and I will contact him to see whether he can provide the dataset. If I cannot make it work, I might come back and ask for your help. I really appreciate your quick replies even though you are not currently working on this project. I also recommended that my friend have a Skype voice chat with you. I hope the project keeps getting better :D