Loading data graph with long time

RapidsAtHKUST / SubgraphMatching

In-Memory Subgraph Matching: An In-depth Study by Dr. Shixuan Sun and Prof. Qiong Luo

MIT License


Loading data graph with long time #9

Closed: ShunyangLi closed this issue 2 years ago

ShunyangLi commented 2 years ago

Loading my data graph takes more than 24 hours. The data graph has 6553594 vertices and 73141425 edges. I tried both the ceci and cfl algorithms.

Best, Shunyang

shixuansun commented 2 years ago

Thanks for your question.

When loading the graph, the program builds an NLF (neighborhood label frequency) filter to improve pruning power. You can set OPTIMIZED_LABELED_GRAPH to 0 in config.h to disable the NLF filter.

ShunyangLi commented 2 years ago

Hi Shixuan,

I set OPTIMIZED_LABELED_GRAPH to 0 and recompiled, but loading still does not finish. I ran it again, and after 400 minutes it is still loading.

Best, Shunyang

shixuansun commented 2 years ago

Could you please share the dataset via OneDrive so that I can reproduce the issue? Thanks.

ShunyangLi commented 2 years ago

Hi Shixuan,

Sorry for the late reply. I have just finished uploading the graph; the lab network is a bit slow. Here is the Google Drive link: https://drive.google.com/drive/folders/1w5CkDh7KrZuXu-AVn6edervrWBZTrugi?usp=sharing

Best, Shunyang

shixuansun commented 2 years ago

Hi Shunyang,

Well received. I will update you tomorrow. Thanks.

shixuansun commented 2 years ago

Hi Shunyang,

The issue is caused by the data graph format: the vertex IDs in bipartite.graph are not contiguous. For example, there is no vertex with ID "20448":

➜  unsw_dataset grep "v 20448 " bipartite.graph
➜  unsw_dataset grep "v 20447 " bipartite.graph
v 20447 26 15
➜  unsw_dataset grep "v 20449 " bipartite.graph
v 20449 11 6

You need to fill such holes in the data graph file so that the vertex IDs are contiguous.
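For anyone hitting the same problem, here is a minimal Python sketch of one way to find and close such gaps. It is not part of the repository. It assumes the file has a "t N M" header line, vertex lines of the form "v id label degree" (as in the grep output above), and edge lines of the form "e u v"; the function names find_missing_vertex_ids and renumber_vertices are hypothetical. Note that it closes the gaps by renumbering the existing vertices to 0..n-1 rather than by inserting placeholder vertices, so the IDs in the output differ from the original ones.

```python
def find_missing_vertex_ids(lines):
    """Return the sorted IDs in 0..max_id that have no 'v' line."""
    seen = set()
    for line in lines:
        parts = line.split()
        if parts and parts[0] == "v":
            seen.add(int(parts[1]))
    if not seen:
        return []
    return [i for i in range(max(seen) + 1) if i not in seen]


def renumber_vertices(lines):
    """Return new graph lines whose vertex IDs are 0..n-1 with no gaps.

    Assumed input format (not confirmed by this thread beyond 'v' lines):
      t N M / v id label degree / e u v
    An edge that refers to a vertex without a 'v' line raises KeyError.
    """
    vertices = []  # (old_id, remaining fields such as label and degree)
    edges = []     # (old_u, old_v, any remaining fields)
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append((int(parts[1]), parts[2:]))
        elif parts[0] == "e":
            edges.append((int(parts[1]), int(parts[2]), parts[3:]))
        # The old 't' header is dropped and rebuilt from the counts below.

    vertices.sort(key=lambda item: item[0])
    new_id = {old: new for new, (old, _) in enumerate(vertices)}

    out = [f"t {len(vertices)} {len(edges)}"]
    for old, rest in vertices:
        out.append(" ".join(["v", str(new_id[old]), *rest]))
    for u, v, rest in edges:
        out.append(" ".join(["e", str(new_id[u]), str(new_id[v]), *rest]))
    return out


if __name__ == "__main__":
    # Tiny in-memory example with a gap at ID 2.
    sample = ["t 3 2", "v 0 1 1", "v 1 2 2", "v 3 1 1", "e 0 1", "e 1 3"]
    print(find_missing_vertex_ids(sample))  # [2]
    for row in renumber_vertices(sample):
        print(row)
```

To run it on a real file, pass an open file object as lines, for example open("bipartite.graph"). The sketch keeps all vertices and edges in memory, so for a graph with tens of millions of edges a two-pass streaming version (first collect the vertex IDs, then rewrite line by line) would likely be more practical.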

Thanks.

ShunyangLi commented 2 years ago

Hi Shixuan,

Many thanks for solving the problem.

Best, Shunyang