Closed by kckjn97 7 years ago
I think I know what the problem might be, but I can't reproduce the bug with the dataset I have. Can you send me the dataset, or point me to where I can download it? Also, can you show me your config.txt file?
I encountered the same problem after I pulled the new release. The dataset I used is soc-LiveJournal1, which can be downloaded from http://snap.stanford.edu/data/index.html.
This is very weird. I ran it on the same dataset and can't reproduce the bug on my machine. What hardware (e.g., SSDs, RAM) do you use? What does your config file (config.txt) look like? To fix the bug, I need to reproduce it first. Thanks.
el2fg ran normally until 12-28 (release: "[SAFS]: always count cache hits"). My machine has 1 SSD (Samsung 850 Pro) and 16GB RAM. I used soc-LiveJournal1 too, with the configuration below. Thanks.
# Parameters for SAFS
# How data blocks of a SAFS file are mapped to Linux filesystem.
RAID_mapping=RAID0
# The number of parallel I/Os allowed in a single thread.
io_depth=64
# The page cache size in SAFS
cache_size=32M
# The number of NUMA nodes
# num_nodes=1
# change FG_TOP to the path of the top directory of FlashGraph.
root_conf=/home/kckjn97/flashx-conf/data_ej.txt
# whether or not to use Linux huge pages
# huge_page=
# The number of I/O threads per NUMA node.
# num_io_threads=1
# Parameters for FlashGraph
# The number of worker threads to process a graph.
# threads=1
# The max number of vertices being processed in parallel in a worker thread
# max_processing_vertices=1000
# This defines the range size of a partition.
# part_range_size_log=10
# Keep adjacency lists of vertices in memory
# in_mem_graph=
# Run the vertex program on a vertex serially regardless of load balancing.
serial_run=
# The number of vertical partitions on the graph.
# num_vertical_parts=1
# The minimal degree that a vertex is partitioned.
# min_vpart_degree=1
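Since most lines in the config above are commented out, it can help to see which settings are actually active. The snippet below is only a rough sketch: `parse_config` is a hypothetical helper, not part of SAFS or FlashGraph, and it assumes the file is plain `key=value` lines with `#` comments, as it appears in this thread.

```python
def parse_config(text):
    """Return active key=value settings, skipping blank lines and '#' comments.

    Hypothetical helper for illustration; this is not how SAFS itself
    reads its configuration.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out setting
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        settings[key.strip()] = value.strip()
    return settings


# A shortened copy of the configuration posted above.
sample = """\
# Parameters for SAFS
RAID_mapping=RAID0
io_depth=64
cache_size=32M
# num_nodes=1
root_conf=/home/kckjn97/flashx-conf/data_ej.txt
# threads=1
serial_run=
"""

active = parse_config(sample)
for key, value in active.items():
    print(f"{key} = {value!r}")
```

Under these assumptions, only RAID_mapping, io_depth, cache_size, root_conf and serial_run (with an empty value) are set; everything else is commented out.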
Thank you very much for your feedback. I'll try to fix it as soon as possible.
I know what the problem is now. The source ID and the destination ID on each line of the input file are separated by a tab ("\t"), so we need to run el2fg as follows:
./el2fg -e config.txt livejournal.txt lv -d "\t"
This isn't a very convenient way of running the command. The newer version (in the dev branch) detects the delimiter automatically if it's not specified. I plan to merge the dev branch into the release branch very soon, so I don't plan to change el2fg in the release branch for now. I have updated the documentation to describe how to convert this graph.
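To illustrate the idea of detecting the delimiter automatically, here is a minimal sketch. It is not the dev-branch implementation: `detect_delimiter`, `parse_edges` and the candidate list are hypothetical, and the sample lines assume an edge list with `#` comment lines and tab-separated ID pairs.

```python
def detect_delimiter(lines, candidates=("\t", " ", ",")):
    """Guess the delimiter of an edge list.

    Returns the first candidate that splits every non-comment line into
    exactly two non-empty fields, or None if no candidate fits.
    Hypothetical helper; not FlashGraph's actual detection logic.
    """
    data = [ln.rstrip("\r\n") for ln in lines
            if ln.strip() and not ln.startswith("#")]
    if not data:
        return None
    for delim in candidates:
        fits = True
        for ln in data:
            parts = ln.split(delim)
            if len(parts) != 2 or not all(p.strip() for p in parts):
                fits = False
                break
        if fits:
            return delim
    return None


def parse_edges(lines, delim):
    """Parse (source, destination) integer pairs using the given delimiter."""
    edges = []
    for ln in lines:
        if not ln.strip() or ln.startswith("#"):
            continue  # skip blank and comment lines
        src, dst = ln.rstrip("\r\n").split(delim)
        edges.append((int(src), int(dst)))
    return edges


# Made-up sample in the assumed format (comment header, tab-separated IDs).
sample = ["# FromNodeId\tToNodeId\n", "0\t1\n", "0\t2\n", "1\t2\n"]

delim = detect_delimiter(sample)
edges = parse_edges(sample, delim)
print(repr(delim), edges)
```

Checking every sampled line, rather than only the first, avoids picking a delimiter that happens to fit one line but not the rest. A real converter would probably sample only the beginning of a large file instead of reading all of it.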
When I tried to convert the graph file to SAFS with el2fg, an error occurred. With the old version el2fg runs normally, but with the newest version it fails with the error below.
$ ./el2fg -e config.txt livejournal.txt lv
[Error Message]
el2fg: /home/kckjn97/Dropbox/flashx/matrix/data_frame.cpp:1098: virtual bool fm::{anonymous}::EM_df_groupby_dispatcher::issue_task(): Assertion `off < read_len' failed.
Aborted (core dumped)