Aluriak / PowerGrASP

Compress graphs
12 stars 1 forks source link
answer-set-programming graph-compression python python-library

PowerGrASP

Graph compression.

Note that this is a full reimplementation of PowerGrASP, taking advantage of ASP and Python lifting and simplifications. For the version published in 2017, see this repository.

CLI

python -m powergrasp mygraph.gml -o compressed.bbl

help !

python -m powergrasp --help

API

import powergrasp
with open('compressed.bbl', 'w') as fd:
    for line in powergrasp.compress_by_cc('mygraph.gml'):
        fd.write(line + '\n')

help !

Sorry, no technical doc for the moment.

Bubble files usage

The bubble file is the main output of PowerGrASP, and describes a power graph. The bubble is a format handled by the CyOog plugin for Cytoscape 2, allowing one to load a bubble formatted file and to visualize the corresponding graph. As a consequence, you have to install Cytoscape 2, and put the CyOog.jar file into path/to/cytoscape/install/dir/plugins/.

Another way to go is to convert the bubble to other formats handling hierarchical graphs, like gexf, dot or the API of cytoscape.js. This is a task implemented by bubbletools.

Configuration

PowerGrASP has some configuration values, that can be overwritten by a powergrasp.cfg file placed in the working directory.

The configuration may be printed in stdout using:

python -m powergrasp --config

The config file may be either in ini or json format, as shown in test/powergrasp.oneshot.cfg and test/powergrasp.manyoptions.cfg.

Configuration allow user to define how much text will be outputed by powergrasp, and to tune the core compression and related optimizations.

See complete list in the options section.

installation

pip install powergrasp

On random error, try adding --no-cache-dir at the end of the command, and check that you are using python 3.

You must have clingo in your path. Depending on your OS, it might be done with a system installation, or through downloading and (compilation and) manual installation.

Changelog

Recipes

A recipe is a description of the motifs to compress. For instance, the data/recipe-test.txt:

biclique    c   g h i
biclique    a b d e
biclique    a b f
biclique    c   d e

Using the --recipe option in CLI, user can provide a recipe for its own graph, allowing to specify prior motifs to search. The order of second and third columns does not have any influence.

The first column usually contains the motif types separated by comma (altough it is not necessary), and allows to provide some options (separated by comma):

The recipe file will be read line by line. Once all lines have been applied, the normal compression compression takes over (unless last option is used).

For a living example, see data/recipe-option-test.txt.

Options

Description of all powergrasp options. All values example are given for INI format.

test integrity

Run some integrity tests during runtime to ensure that compression is working well. May slow the compression a lot, and is mainly a debugging tool.

Default value:

test_integrity = true

show story

Print in stdout the main steps of compression. Default value:

show_story = true

show motif handling

Print in stdout the motif transformation. Default value:

show_motif_handling = false

timers

Maintain and print in stdout some timers. Default value:

timers = true

statistics file

A file in which some statistics will be written in CSV format, giving among others size of compressed motifs, and their compression times (if timers option is enabled). Default value:

statistics_file = None

Example value:

statistics_file = statistics.csv

cc statistic file

Filename in which the statistics about each connected component will be written. Data will contain one row for each connected component, with the following data:

Default value stands for no file:

cc_statistic_file = None

Example value:

cc_statistic_file = statistics-cc.csv

global statistics

When enabled, statistics and metrics computation on the overall graph will be performed at the end of compression. Default value:

global_statistics = True

bubble with statistics

When enabled, output bubble will contain statistics about each connected component and the overall graph (if global_statistics is enabled). Default value:

bubble_with_statistics = True

bubble for each step

Generate and save a bubble representation of the graph at each step. Mainly used for debugging. Default value:

bubble_for_each_step = false

output node prefix

Prefix to add to all (power)nodes names in output. Default value:

output_node_prefix = ''

show debug

Show full trace of the compression. Useful for debugging. Default value:

show_debug = False

covered edges from ASP

Recover covered edges from ASP. If falsy, will ask motif searcher to compute the edges, which may be quicker. Default value:

covered_edges_from_asp = False

bubble with nodes

Nodes and sets are optional in the output bubble. Default value:

bubble_with_nodes = True
bubble_with_sets = True

bubble poweredge factor

Edges in bubble are associated to a factor. Default value:

bubble_poweredge_factor = '1.0'
bubble_edge_factor = '1.0'

bubble embeds cc

If enabled, put each connected component in a dedicated powernode. Default value:

bubble_embeds_cc = no

bubble simplify quotes

When possible, delete the quotes around identifiers in the bubble. May lead to node name collision. Default value:

bubble_simplify_quotes = True

bubble with simple edges

If disabled, will discard simple (i.e. non-power) edges of output. Default value:

bubble_with_simple_edges = True

config file

Load options found in given config file, if it exists. Default value:

config_file = 'powergrasp.cfg'

multishot motif search

Search for multiple motif in a single search. Accelerate the solving for graph with lots of equivalent motifs. It is generally a good option to enable. Default value:

multishot_motif_search = True

biclique lowerbound maxnei

Optimization on biclique lowerbound computation. Can be costly. Deactivate with 2. With value at n, up to n neighbors are considered. Default value:

biclique_lowerbound_maxnei = 2

clingo options

Arbitrary parameters to give to clingo (note that some, like multithreading or optmode, may already be set by other options). Default value:

clingo_options = {}

Give a particular configuration for bicliques search:

clingo_options = {'no-star-biclique': '--configuration=handy'}

clingo multithreading

Number of CPU available to clingo, or 0 for autodetect number of CPU. Default value:

clingo_multithreading = 1

Use as many thread there are available CPU:

clingo_multithreading = 0

Specify a competing search between 4 threads:

clingo_multithreading = '4,compete'

parallel cc compression

To use to compress connected components in different processes. Default value (optimized in memory):

parallel_cc_compression = 1

Compress with 4 processes :

parallel_cc_compression = 4

Compress with one process for each connected component :

parallel_cc_compression = 0

use star motif

Two different motifs for stars and bicliques, so the search space for bicliques is smaller. Yields good performance improvements on big graphs. Default value:

use_star_motif = True

optimize for memory

When possible, prefer memory over CPU. Default value:

optimize_for_memory = False

graph filtering

Ignore edges dynamically determined as impossible to compress. Default value:

graph_filtering = True

keep single nodes

If a node is found to be alone in its connected component, it will be kept nonetheless. Default value:

keep_single_nodes = True

parallel motif search

Use multithreading to search for all motifs in the same time, instead of sequentially. Interestingly, this yields very poor results, and it's even worse with multiprocessing.

parallel_motif_search = False

motif type order

Define in which order the motifs are searched, e.g. cliques, then bicliques then stars.

motif_type_order = star,clique,non-star-biclique,biclique

Instead of expliciting the exact motif names and order, it is also possible to specify a bound-based order:

motif_type_order = greatest-upperbound-first

Or any variation with worst, lowerbound and last.