darrenstrash / ReduVCC

MIT License
7 stars 1 forks source link

no output-file produced #2

Open willthbill opened 1 year ago

willthbill commented 1 year ago

I am having trouble producing an output file.

Compilation: scons program=vcc variant=optimized

Then when I run:

./optimized/vcc --output_filename=output.txt --preconfiguration=fsocial --k=2 --run_type="Redu" examples/p2p-Gnutella04.graph

It produced the following standard output:

examples/p2p-Gnutella04.graph, 10876, 79988, 0, 0.014961, 0

But, no output file is produced (output.txt). It also does not produce the standard output file (tmppartition...) when I run with no output_filename parameter.

Am I doing something wrong? Any chance you know what is going on?

willthbill commented 1 year ago

I tried modifying the app/vcc.cpp file, to write to a file like this:

graph_io::writePartition(G, partition_config.filename_output);

It does indeed produce an output-file, but it contains only zeros.

0
0
0
0
...
darrenstrash commented 1 year ago

Hi @willthbill, thank you for bringing this to my attention. The code was never set up to write the cover to a file, the command line option --output_filename was vestigial from a previous project.

I have implemented this enhancement under the branch WRITE_COVER for you to try. I removed all unused command-line options, and added writing the cover to a file. You will need to add the --output_cover_file= command line argument. e.g. :

./optimized/vcc --output_cover_file=output.txt --run_type="Redu" examples/p2p-Gnutella04.graph

All algorithms Redu|ReduVCC|Chalupa|bnr|edge_bnr support this option. Please let me know how it goes.

Finally, please note that the Redu algorithm is not guaranteed to solve the clique cover problem, it only applies reductions. If the reduced graph is empty, then it has found a minimum clique cover. I put a guard to only write to file when the Redu algorithm (or any other algorithm) actually finds a cover. Also note that the heuristic algorithms ReduVCC and Chalupa will write out the best cover found, which is not necessarily optimal. You can always check if the result was optimal from the printed key-value pair optimal=*

willthbill commented 1 year ago

Hi Darren. Thanks a lot for fixing this quickly, I really appreciate it. I am getting really good results using your algorithms on small graphs (I will test it on larger graphs soon).

I have a few comments on my experience so far:

Other than what I mentioned above, I will say that it works really well and it is fast. I am very impressed! Thanks for all the help.

darrenstrash commented 1 year ago

Hi @willthbill , I will investigate the overwriting, but for the other issues I need example graphs to reproduce the issues. In particular, can you give:

  1. A graph that causes ReduVCC to hang.
  2. A graph that, when keeping the seed the same, gives different results in different runs.

For 2: it should mostly give the same results, however there can be some small variation caused by the time limit and busyness of the system: For example if the system is overloaded then not as many operations are run in the same wall time limit, and so it may not reach the same result in the time limit.

As for reading standard output from C++: almost all statements (aside from some log output, which goes to standard error) are written to standard output.

willthbill commented 1 year ago

Hi Darren. The attached a graph that causes ReduVCC to hang when using the following command:

./optimized/vcc --seed=1 --output_cover_file=output.txt --run_type="ReduVCC" input.txt

I would say that at least every 10th time I run the above command it hangs. I did not have to look for a specific seed. It happened often also for other graphs. This is unfortunately the smallest one I could find, and I don't know how to find a minimal example, since I don't know what makes this particular graph hang.

With the attached graph using bnr instead (./out/optimized/vcc --seed=1 --output_cover_file=output.txt --run_type="bnr" input.txt). It sometimes can validate the cover, other times it cannot. input.txt

darrenstrash commented 1 year ago

Thank you for sending the instance. Apparently the search was not defaulting to 10s, it was defaulting to 0s, which was causing the bnr algorithm to not finish sometimes (this may have confused ReduVCC too, (but I was not able to reproduce the issue with it hanging).

I have committed a fix and made the default 10s. Please let me know if you continue to see the issue with ReduVCC or bnr.

willthbill commented 1 year ago

Thank you. It still seems to be hanging, however it does stop after 10seconds now, and it does give a good result. What I mean by hanging is that it prints the output I send previously within 1 second and then does not print anything for 9seconds. This happens every time for Chalupa and ReduVCC (I ran both of them many times).

I ran vcc with bnr, edge_bnr and Redu 1000 times and it did not happen at all with them.

I used input.txt as provided above.

darrenstrash commented 1 year ago

Hi @willthbill , Ah, I see. Based on your description, this "hanging" is expected in ReduVCC and Chalupa. Each line prints only when a smaller solution is found. The fact that it stops early in the 10 seconds, indicates that it quickly finds this solution and never improves it. It won't always find such a high quality solution quickly, but for small graphs it's probably likely.

willthbill commented 1 year ago

That makes sense. And before since the time limit didn't work, that made is hang "forever". I expect that for some graphs we might need a higher time limit. Do you think there is a way to detect that ReduVCC and Chalupa has "converged"? Maybe I can time bnr and use this to set a proper time limit. Thanks again for all the help.

darrenstrash commented 1 year ago

Any attempt to detect if ReduVCC or Chalupa has converged would be only heuristic. One heuristic to try, would be a check such as "if the solution doesn't improve in x seconds, then stop".

Note: If you run bnr, and it finishes, then it gives you the optimal solution and there's no need to run ReduVCC or Chalupa

willthbill commented 1 year ago

But I can't just stop the process while it is running right? Then it won't write to a file. I guess I can binary search the time taken to converge ;D

willthbill commented 1 year ago

So bnr and Redu only gives a result if it is optimal - but with different approaches? Does edge_bnr also only finish with a result if it is optimal?

darrenstrash commented 1 year ago

So bnr and Redu only gives a result if it is optimal - but with different approaches? Does edge_bnr also only finish with a result if it is optimal?

A breakdown:

Redu, bnr, edge_bnr: only write if find an (optimal) clique cover. If Redu doesn't produce an empty kernel, there is no cover, and If they time out then bnr or edge_bnr probably haven't found any cover at all.

ReduVCC and Chalupa only if finding some clique cover.

darrenstrash commented 1 year ago

But I can't just stop the process while it is running right? Then it won't write to a file. I guess I can binary search the time taken to converge ;D

This behavior can be easily integrated into ReduVCC or Chalupa in the code base. If you think this would be helpful, let me know and I can give pointers for how to implement it, or possibly even implement it myself quickly.

willthbill commented 1 year ago

This behavior can be easily integrated into ReduVCC or Chalupa in the code base. If you think this would be helpful, let me know and I can give pointers for how to implement it, or possibly even implement it myself quickly.

This would be very useful. However, you shouldn't spend to much time on it, you already helped us a lot!

darrenstrash commented 1 year ago

I have added a new command-line argument: --improve_time_limit= which accepts a double (a number of seconds) and affects the behavior of ReduVCC and Chalupa. If you give --improve_time_limit=0.1 then it will stop if the solution does not improve within 0.1 seconds. Let me know if you find this helpful!

willthbill commented 1 year ago

Very helpful! Thank you