TheoryInPractice / CONCUSS

Combatting Network Complexity Using Structural Sparsity
Other
8 stars 5 forks source link

CONCUSS: Combatting Network Complexity Using Structural Sparsity

CONCUSS is a software tool for large scale graph analytics. The efficiency and scalability of CONCUSS come from exploiting the underlying structure and sparsity of the data. It allows users to count the number of occurrences of a specific pattern within a graph (i.e. subgraph isomorphism counting). These counts are a building block for more complicated scientific analysis, such as motif counting and graphlet degree signature creation.

Command line usage

./concuss.py [OPTIONS] graph pattern [config]

Count the number of subgraphs of graph which are isomorphic to pattern.

Positional arguments:

Optional arguments:

Example command:

Using pattern file:

./concuss.py testing/graphs/karate.txt testing/graphs/motifs/K3.txt

Example output:

Number of occurrences of H in G: 270

This means that there are 270 isomorphisms of K3 (a triangle) in the karate network. Since the triangle has 3 automorphisms, the karate network has 90 distinct triangles.

Basic patterns:

CONCUSS can be run by providing a basic pattern instead of a pattern file for the pattern argument in the execution command. All other positional and optional arguments remain unchanged.

Basic patterns provide a faster and simpler way of specifying patterns to search for in the graph.

We provide the following commonly used basic patterns:

1-partite basic patterns:

clique wheel cycle path star

Usage:

.concuss.py {path_to_graph}/graph basicpattern|int

| represents concatenation. The int specifies the number of vertices in pattern.

Example command:

./concuss.py testing/graphs/karate.txt star4

Example output:

Number of occurrences of H in G: 6588

Bi-partite basic patterns:

biclique

Usage:

./concuss.py {path_to_graph}/graph basicpattern|int,int

| represents concatenation. The ints specify the number of vertices in the first and second sets of the bi-partite pattern.

Example command:

./concuss.py testing/graphs/karate.txt biclique2,2

Example output:

Number of occurrences of H in G: 288

Counting multiple patterns in a single CONCUSS run

CONCUSS supports counting multiple patterns in a single pipeline run. In order to use the multiple pattern pipeline, CONCUSS needs to be run using the command line argument -m [FILENAME] where [FILENAME] is a file containing descriptions of patterns either as file paths or basic patterns. For the positional argument pattern, the keyword multi must be used to specify that we are counting multiple patterns.

The format for the multiple pattern file is as follows. Specify each pattern as a basic pattern or a filename on a separate line.

basic_pattern_1
basic_pattern_2
path/to/pattern/file3
...
basic_pattern_n

Example file: multi_size3.txt

star3
path3
testing/graphs/motifs/K3.txt

Example command:

./concuss.py testing/graphs/karate.txt multi -m testing/graphs/motifs/multi_size3.txt

Example output:

Number of occurrences of star3 in G: 786
Number of occurrences of path3 in G: 786
Number of occurrences of testing/graphs/motifs/K3.txt in G: 270

Install and Software Requirements

CONCUSS requires a 64-bit operating system and Python 2.7.x interpreter (e.g. CPython or PyPy). Running the pipeline with PyPy typically decreases runtimes due to native function inlining.

Extra Dependencies

The main computations in CONCUSS do not require any additional libraries.

To import graph data in GraphML or GEXF format, the Beautiful Soup package is required.

pip install beautifulsoup

For developers wishing to contribute to CONCUSS, the testing suite (see below) uses the NetworkX python package to verify the correctness of the counts.

pip install networkx

File Format Compatibilities

Graphs

CONCUSS supports reading data files for the host and pattern graphs in the following formats:

Colorings

p-centered colorings are stored in a file that lists the number of colors used on the first line and each subsequent line is of the form

[VERTEX ID]: [COLOR]

where [VERTEX ID] and [COLOR] are integers ranging from 0 to the number of vertices minus one and the number of colors minus one, respectively.

BEAVr

As of version 2.0, CONCUSS supports visualization with BEAVr. By passing the -e option to CONCUSS, a ZIP archive is created containing:

This ZIP archive can be loaded into BEAVr to create an interactive visualization of the execution of CONCUSS. See the BEAVr documentation for more information.

CONCUSS can currently only create visualization output with certain configuration options for the Compute and Combine stages. The Compute stage must use the KPattern class, and must be set to not reuse dynamic programming table entries. The Combine stage must use the InclusionExclusion method of correcting overcounting. For convenience, a configuration file containing the required options is provided as config/beavr.cfg.

Configuration

The algorithmic workflow in this tool is highly modular, which allows users to swap out subroutines throughout the stages of the pipeline. A default configuration file is provided in config/default.cfg; this is recommended as the "best practice" for efficiency. Users who want to experiment with algorithmic variations can find details on how to create custom config files in the documentation.

Contribution and Testing (for Developers)

We welcome contributions to CONCUSS, in the form of Pull Requests, where you "fork" our repository and then request that we "pull" your changes into the main branch. You must have a Github account to make a contribution.

Whenever possible, please follow these guidelines for contributions:

A series of tests are included for developers to ensure their changes do not introduce new bugs. See the testing documentation for additional instructions.

Citation and License

Important: CONCUSS is research software, so you should cite us when you use it in scientific publications! Please see the CITATION file for citation information. DOI

CONCUSS is released under the BSD license; see the LICENSE file. Distribution, modification and redistribution, and incorporation into other software is allowed.

Getting Help, Feature Requests, and Bug Reports

We are using Github Issues for communication related to CONCUSS. Feel free to submit feature requests, ask for help (after reading the documentation), or notify us of a potential bug. "Watch" our repository to keep abreast of progress and new releases.

Acknowledgements

Development of the CONCUSS software package was funded in part by: