Closed cryptomeme closed 11 years ago
Hi Damon,
thanks for the report. Which OS are you using? I am wondering whether your filesystem supports files that large.
Aapo
On Sep 19, 2013, at 10:30 AM, Damon Buckwalter notifications@github.com wrote:
It appears that GraphChi 0.2.3 has trouble creating shard files larger than 2 GiB. I have plenty of disk space for them, but it errors out. Are there any additional details I can provide?
Reducing the memory budget sufficiently (so that < 2 GiB shards are created) works around the issue.
Running GraphChi Connected Components program
INFO: conversions.hpp(convert_if_notexists:767): Did not find preprocessed shards for nodes_20131001
INFO: conversions.hpp(convert_if_notexists:768): (Edge-value size: 4)
INFO: conversions.hpp(convert_if_notexists:769): Will try create them now...
INFO: sharder.hpp(start_preprocessing:326): Starting preprocessing, shovel size: 1310720000
INFO: conversions.hpp(convert_edgelist:221): Reading in edge list format!
DEBUG: conversions.hpp(convert_edgelist:226): Read 10000000 lines, 180.039 MB
. . .
DEBUG: conversions.hpp(convert_edgelist:226): Read 1590000000 lines, 29900.5 MB
INFO: sharder.hpp(flush:152): Sorting shovel: nodes_201310014.1.shovel, max:1738496712
INFO: sharder.hpp(flush:154): Sort done.nodes_201310014.1.shovel
ERROR: ioutil.hpp(writea:129): Could not write 3435929520 bytes! error:Bad file descriptor
connectedcomponents_list: ./src/util/ioutil.hpp:130: void writea(int, T*, size_t) [with T = graphchi::edge_with_value]: Assertion `false' failed.
Aapo Kyrola Ph.D. student, http://www.cs.cmu.edu/~akyrola GraphChi: Big Data - small machine: http://graphchi.org twitter: @kyrpov
I'm using CentOS 6.4 and kernel 2.6.32-358.el6.x86_64
My input file is 31 GiB, so I would expect large files to be OK. And in fact, now that I look, GraphChi is emitting other files > 2 GiB, so I may have jumped to conclusions...
Let me make sure I can reproduce the problem and I will get back to you.
BTW, thanks for putting GraphChi out there! It's been a crucial tool for me to do connected components analysis. Even though the computational aspects are a bit 'magical' to me still, it does the job that other approaches can't with the scale of data I'm working with.
If you're ever in PDX, I owe you a few beers at least!
By the way, for connected components analysis, if you can fit O(V) data in memory, you should use the new unionfind_connectedcomponents. It is MUCH faster, as it requires only one pass. (Actually, you won't need GraphChi for union-find; a simple pass over the edges suffices.)
Aapo
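To illustrate the "simple pass over the edges" idea, here is a minimal union-find sketch in C++. It is not GraphChi's unionfind_connectedcomponents code. The names DisjointSets and components_from_edge_list are hypothetical, and it assumes a whitespace-separated "src dst" edge list with vertex IDs in the range 0..num_vertices-1 that fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <numeric>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Disjoint-set (union-find) structure over vertex IDs 0..n-1.
// Illustrative sketch only; memory use is O(V): two 32-bit arrays.
class DisjointSets {
public:
    explicit DisjointSets(std::size_t n)
        : parent_(n), size_(n, 1), components_(n) {
        // Every vertex starts as the root of its own singleton set.
        std::iota(parent_.begin(), parent_.end(), 0u);
    }

    // Returns the representative of v's set, shortening the path as it goes
    // (path halving).
    std::uint32_t find(std::uint32_t v) {
        assert(v < parent_.size());
        while (parent_[v] != v) {
            parent_[v] = parent_[parent_[v]];
            v = parent_[v];
        }
        return v;
    }

    // Merges the sets containing a and b (union by size).
    // Returns true if two distinct sets were merged.
    bool unite(std::uint32_t a, std::uint32_t b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        if (size_[a] < size_[b]) std::swap(a, b);
        parent_[b] = a;
        size_[a] += size_[b];
        --components_;
        return true;
    }

    // Number of sets, counting IDs that never appear in any edge.
    std::size_t components() const { return components_; }

private:
    std::vector<std::uint32_t> parent_;
    std::vector<std::uint32_t> size_;
    std::size_t components_;
};

// Single pass over an edge list: one "src dst" pair per line.
// Lines that do not parse, or that reference out-of-range IDs, are skipped.
DisjointSets components_from_edge_list(std::istream& in,
                                       std::size_t num_vertices) {
    DisjointSets sets(num_vertices);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::uint32_t src = 0, dst = 0;
        if (fields >> src >> dst && src < num_vertices && dst < num_vertices) {
            sets.unite(src, dst);
        }
    }
    return sets;
}
```

After the pass, two vertices are in the same connected component exactly when find returns the same representative for both. This layout costs 8 bytes per vertex ID. If the max: value in the log above is the largest vertex ID (an assumption), that comes to roughly 14 GB, so whether the O(V) condition holds depends on the machine.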