Jokeren / gBolt

gBolt: a very fast implementation of the gSpan algorithm for data mining
BSD 2-Clause "Simplified" License

About the source code #19

Closed: theodore3131 closed this issue 6 years ago

theodore3131 commented 6 years ago

Hello, in the database I need to work with for my research, the edge labels are strings instead of integers. Could you give me some instructions on how to modify the source code so that I can use gBolt on my database? Thanks!

Jokeren commented 6 years ago

Two solutions:

  1. You actually do not need to modify a single line of the source code. Just map your strings to unique ids using a simple script. You should end up with a mapping like this:
v 1 0 dog -> v 1 0 1
v 1 1 cat -> v 1 1 2

Of course, you will also need another script to map the ids in the results back to strings.

  2. If you really want to modify the source code, take a look at database.cc and change the type of the label, which resides in the last column. If you use C++ std::string, I expect you only need to change a few lines to get it working quickly.
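
A minimal sketch of solution 1 in Python, assuming every "v"/"e" line carries its label in the last whitespace-separated column (as described above) and that ids are assigned from 1 in order of first appearance, which reproduces the dog -> 1, cat -> 2 mapping. The names encode_labels and decode_labels are hypothetical and not part of gBolt; decode_labels further assumes the result file uses the same v/e line layout with the label last, which you should check against your actual output.

```python
# Hypothetical helpers, not part of gBolt: map string labels in the last
# column of "v"/"e" lines to integer ids, and map them back afterwards.


def encode_labels(lines):
    """Replace the last field of every 'v'/'e' line with an integer id.

    Returns the rewritten lines and a {kind: {label: id}} table.
    Ids start at 1 per line kind, in order of first appearance.
    """
    table = {"v": {}, "e": {}}
    out = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] in table:
            ids = table[fields[0]]
            label = fields[-1]
            if label not in ids:
                ids[label] = len(ids) + 1  # next unused id for this kind
            fields[-1] = str(ids[label])
            out.append(" ".join(fields))
        else:
            out.append(line.rstrip("\n"))  # "t # ..." and other lines unchanged
    return out, table


def decode_labels(lines, table):
    """Inverse of encode_labels, for lines with the same v/e layout."""
    inverse = {kind: {str(i): s for s, i in ids.items()} for kind, ids in table.items()}
    out = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] in inverse and fields[-1] in inverse[fields[0]]:
            fields[-1] = inverse[fields[0]][fields[-1]]
            out.append(" ".join(fields))
        else:
            out.append(line.rstrip("\n"))
    return out


if __name__ == "__main__":
    sample = ["t # 0", "v 1 0 dog", "v 1 1 cat", "e 0 1 likes"]
    encoded, table = encode_labels(sample)
    print("\n".join(encoded))
    print("\n".join(decode_labels(encoded, table)))
```

Keeping separate tables for vertex and edge labels means the same string may get different ids as a vertex label and as an edge label. Save the table (for example as JSON) next to the converted input so the results can be decoded later.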

theodore3131 commented 6 years ago

Thank you so much for your solution, I will try it out.

theodore3131 commented 6 years ago

Hi, I have recently run into another problem. With an input graph file larger than 100 MB, gBolt crashes with a segmentation fault (core dumped). Can you give me any suggestions? Thanks again!

Jokeren commented 6 years ago

Hi, if it dumps core, please send me the core dump file and your binary; otherwise I can't figure out the cause.

You can also locate the position of the crash from the core dump and post it in our discussion thread.

theodore3131 commented 6 years ago

The terminal output reads as follows:

I0506 12:52:51.015228  3380 gbolt.cc:37] gbolt read input time: 1.61415
I0506 12:52:53.075754  3380 gbolt_execute.cc:27] gbolt construct graph time: 2.06035
I0506 12:52:53.077332  3380 gbolt_execute.cc:54] gbolt thread 0 create
Segmentation fault (core dumped)

Jokeren commented 6 years ago

I think you misunderstood my instructions.

Please send me your data and your arguments to run the program. I will try to figure it out.

theodore3131 commented 6 years ago

Oh, sorry. Here are my commands and the file:

export OMP_NUM_THREADS=1
./build/gbolt -input_file extern/data/rgraph -support 0.05 -output_file ./result0 -pattern=true

rgraph.zip

Jokeren commented 6 years ago

The bug is caused by your rgraph data.

In rgraph, the total number of graphs differs from the last graph id given by the "t #" indicator. Please check your data and make sure there are no "holes" (gaps) between consecutive graph ids.
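
A quick way to find such holes before running gBolt is to scan the "t # <id>" lines. The sketch below is a hypothetical check, not part of gBolt, and it assumes graph ids are meant to start at 0 and increase by 1.

```python
# Hypothetical pre-flight check, not part of gBolt: verifies that the graph
# ids on "t # <id>" lines run 0, 1, 2, ... with no gaps (assumes 0-based ids).


def check_graph_ids(lines):
    """Return a list of human-readable problems (empty if the ids look fine)."""
    problems = []
    expected = 0
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "t" and fields[1] == "#":
            gid = int(fields[2])
            if gid != expected:
                problems.append(
                    f"line {lineno}: expected graph id {expected}, found {gid}"
                )
            # Resynchronize so a single hole is reported only once.
            expected = gid + 1
    return problems
```

Call it on open(path).read().splitlines(); an empty list means no gaps were found. A "t # -1" line is reported as well, since -1 never matches the expected next id.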

Jokeren commented 6 years ago

I will also modify the code to make it more robust.

Jokeren commented 6 years ago

Problem fixed.

Also, please delete the trailing "t # -1" entry from rgraph.
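
If you prefer to clean the file programmatically, a small sketch like the one below drops a trailing "t # -1" entry (some gSpan-style datasets use it as an end-of-file marker). The function name is hypothetical and not part of gBolt.

```python
# Hypothetical helper, not part of gBolt: removes a trailing "t # -1" entry,
# ignoring any blank lines at the end of the file.


def strip_trailing_terminator(lines):
    """Return a copy of lines without a trailing 't # -1' entry."""
    kept = list(lines)
    while kept and not kept[-1].strip():
        kept.pop()  # ignore trailing blank lines
    if kept and kept[-1].split()[:3] == ["t", "#", "-1"]:
        kept.pop()  # drop the terminator entry
    return kept
```

For example, write "\n".join(strip_trailing_terminator(lines)) + "\n" to a new file and pass that file to -input_file.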

theodore3131 commented 6 years ago

This does work and saves me a lot of time. Thank you so much for your kind help!

Jokeren commented 6 years ago

You're welcome, please let me know if you have further questions.