Graph compression.
Note that this is a full reimplementation of PowerGrASP, taking advantage of ASP and Python lifting and simplifications. For the version published in 2017, see this repository.
python -m powergrasp mygraph.gml -o compressed.bbl
python -m powergrasp --help
import powergrasp
with open('compressed.bbl', 'w') as fd:
for line in powergrasp.compress_by_cc('mygraph.gml'):
fd.write(line + '\n')
Sorry, no technical doc for the moment.
The bubble file is the main output of PowerGrASP, and describes a power graph.
The bubble is a format handled by the CyOog plugin for Cytoscape 2, allowing one to load a bubble formatted file and to visualize the corresponding graph.
As a consequence, you have to install Cytoscape 2, and put the CyOog.jar
file into path/to/cytoscape/install/dir/plugins/
.
Another way to go is to convert the bubble to other formats handling hierarchical graphs, like gexf, dot or the API of cytoscape.js. This is a task implemented by bubbletools.
PowerGrASP has some configuration values,
that can be overwritten by a powergrasp.cfg
file placed in the working directory.
The configuration may be printed in stdout using:
python -m powergrasp --config
The config file may be either in ini or json format, as shown in test/powergrasp.oneshot.cfg
and test/powergrasp.manyoptions.cfg
.
Configuration allow user to define how much text will be outputed by powergrasp, and to tune the core compression and related optimizations.
See complete list in the options section.
pip install powergrasp
On random error, try adding --no-cache-dir
at the end of the command,
and check that you are using python 3.
You must have clingo
in your path. Depending on your OS, it might be done with a system installation,
or through downloading and (compilation and) manual installation.
breakable
or last
__version__
attribute in the packageA recipe is a description of the motifs to compress. For instance, the data/recipe-test.txt:
biclique c g h i
biclique a b d e
biclique a b f
biclique c d e
Using the --recipe
option in CLI, user can provide a recipe for its own graph,
allowing to specify prior motifs to search.
The order of second and third columns does not have any influence.
The first column usually contains the motif types separated by comma (altough it is not necessary), and allows to provide some options (separated by comma):
primer
: allows the program to extend the given motif if possiblebreakable
: allows the program to compress the line with multiple motifsoptional
: if not found, ignore instead of stoppinglast
: stop compression here (useful to avoid the compression process to take over)The recipe file will be read line by line.
Once all lines have been applied, the normal compression compression takes over
(unless last
option is used).
For a living example, see data/recipe-option-test.txt.
Description of all powergrasp options. All values example are given for INI format.
Run some integrity tests during runtime to ensure that compression is working well. May slow the compression a lot, and is mainly a debugging tool.
Default value:
test_integrity = true
Print in stdout the main steps of compression. Default value:
show_story = true
Print in stdout the motif transformation. Default value:
show_motif_handling = false
Maintain and print in stdout some timers. Default value:
timers = true
A file in which some statistics will be written in CSV format, giving among others size of compressed motifs, and their compression times (if timers option is enabled). Default value:
statistics_file = None
Example value:
statistics_file = statistics.csv
Filename in which the statistics about each connected component will be written. Data will contain one row for each connected component, with the following data:
Default value stands for no file:
cc_statistic_file = None
Example value:
cc_statistic_file = statistics-cc.csv
When enabled, statistics and metrics computation on the overall graph will be performed at the end of compression. Default value:
global_statistics = True
When enabled, output bubble will contain statistics about each connected component and the overall graph (if global_statistics is enabled). Default value:
bubble_with_statistics = True
Generate and save a bubble representation of the graph at each step. Mainly used for debugging. Default value:
bubble_for_each_step = false
Prefix to add to all (power)nodes names in output. Default value:
output_node_prefix = ''
Show full trace of the compression. Useful for debugging. Default value:
show_debug = False
Recover covered edges from ASP. If falsy, will ask motif searcher to compute the edges, which may be quicker. Default value:
covered_edges_from_asp = False
Nodes and sets are optional in the output bubble. Default value:
bubble_with_nodes = True
bubble_with_sets = True
Edges in bubble are associated to a factor. Default value:
bubble_poweredge_factor = '1.0'
bubble_edge_factor = '1.0'
If enabled, put each connected component in a dedicated powernode. Default value:
bubble_embeds_cc = no
When possible, delete the quotes around identifiers in the bubble. May lead to node name collision. Default value:
bubble_simplify_quotes = True
If disabled, will discard simple (i.e. non-power) edges of output. Default value:
bubble_with_simple_edges = True
Load options found in given config file, if it exists. Default value:
config_file = 'powergrasp.cfg'
Search for multiple motif in a single search. Accelerate the solving for graph with lots of equivalent motifs. It is generally a good option to enable. Default value:
multishot_motif_search = True
Optimization on biclique lowerbound computation. Can be costly. Deactivate with 2. With value at n, up to n neighbors are considered. Default value:
biclique_lowerbound_maxnei = 2
Arbitrary parameters to give to clingo (note that some, like multithreading or optmode, may already be set by other options). Default value:
clingo_options = {}
Give a particular configuration for bicliques search:
clingo_options = {'no-star-biclique': '--configuration=handy'}
Number of CPU available to clingo, or 0 for autodetect number of CPU. Default value:
clingo_multithreading = 1
Use as many thread there are available CPU:
clingo_multithreading = 0
Specify a competing search between 4 threads:
clingo_multithreading = '4,compete'
To use to compress connected components in different processes. Default value (optimized in memory):
parallel_cc_compression = 1
Compress with 4 processes :
parallel_cc_compression = 4
Compress with one process for each connected component :
parallel_cc_compression = 0
Two different motifs for stars and bicliques, so the search space for bicliques is smaller. Yields good performance improvements on big graphs. Default value:
use_star_motif = True
When possible, prefer memory over CPU. Default value:
optimize_for_memory = False
Ignore edges dynamically determined as impossible to compress. Default value:
graph_filtering = True
If a node is found to be alone in its connected component, it will be kept nonetheless. Default value:
keep_single_nodes = True
Use multithreading to search for all motifs in the same time, instead of sequentially. Interestingly, this yields very poor results, and it's even worse with multiprocessing.
parallel_motif_search = False
Define in which order the motifs are searched, e.g. cliques, then bicliques then stars.
motif_type_order = star,clique,non-star-biclique,biclique
Instead of expliciting the exact motif names and order, it is also possible to specify a bound-based order:
motif_type_order = greatest-upperbound-first
Or any variation with worst
, lowerbound
and last
.