CONCUSS is a software tool for large scale graph analytics. The efficiency and scalability of CONCUSS come from exploiting the underlying structure and sparsity of the data. It allows users to count the number of occurrences of a specific pattern within a graph (i.e. subgraph isomorphism counting). These counts are a building block for more complicated scientific analysis, such as motif counting and graphlet degree signature creation.
./concuss.py [OPTIONS] graph pattern [config]
Count the number of subgraphs of graph
which are isomorphic to pattern
.
Positional arguments:
graph
- filename of the host graphpattern
- filename, basic pattern or multi
. See basic patterns and multiple-patterns sections below.config
- filename of the configuration settings (defaults to config/default.cfg
)Optional arguments:
-h, --help
- show help message and exit-o [OUTPUT], --output [OUTPUT]
- filename of the result-v, --verbose
- verbose output-p, --profile
- profile function calls-c [COLORING], --coloring [COLORING]
- filename of existing p-centered
coloring. When this option is not selected, CONCUSS will find a coloring itself-C, --coloring-no-verify
- same as -c but do not verify correctness of existing coloring-m [MULTI_PAT_FILE], --multi_pat_file [MULTI_PAT_FILE]
- file containing multiple pattern descriptions (as file paths and/or basic patterns)-e [EXECUTION_DATA], --execution-data [EXECUTION_DATA]
- create ZIP archive EXECUTION_DATA
for visualization with BEAVrExample command:
Using pattern file:
./concuss.py testing/graphs/karate.txt testing/graphs/motifs/K3.txt
Example output:
Number of occurrences of H in G: 270
This means that there are 270 isomorphisms of K3 (a triangle) in the karate network. Since the triangle has 3 automorphisms, the karate network has 90 distinct triangles.
CONCUSS can be run by providing a basic pattern instead of a pattern file for the pattern
argument in the execution command. All other positional and optional arguments remain unchanged.
Basic patterns provide a faster and simpler way of specifying patterns to search for in the graph.
We provide the following commonly used basic patterns:
1-partite basic patterns:
clique wheel cycle path star
Usage:
.concuss.py {path_to_graph}/graph basicpattern|int
|
represents concatenation. The int
specifies the number of vertices in pattern.
Example command:
./concuss.py testing/graphs/karate.txt star4
Example output:
Number of occurrences of H in G: 6588
Bi-partite basic patterns:
biclique
Usage:
./concuss.py {path_to_graph}/graph basicpattern|int,int
|
represents concatenation. The ints
specify the number of vertices in the first and second sets of the bi-partite pattern.
Example command:
./concuss.py testing/graphs/karate.txt biclique2,2
Example output:
Number of occurrences of H in G: 288
CONCUSS supports counting multiple patterns in a single pipeline run. In order to use the multiple pattern pipeline, CONCUSS needs to be run using the command line argument -m [FILENAME]
where [FILENAME]
is a file containing descriptions of patterns either as file paths or basic patterns.
For the positional argument pattern
, the keyword multi
must be used to specify that we are counting multiple patterns.
The format for the multiple pattern file is as follows. Specify each pattern as a basic pattern or a filename on a separate line.
basic_pattern_1
basic_pattern_2
path/to/pattern/file3
...
basic_pattern_n
Example file: multi_size3.txt
star3
path3
testing/graphs/motifs/K3.txt
Example command:
./concuss.py testing/graphs/karate.txt multi -m testing/graphs/motifs/multi_size3.txt
Example output:
Number of occurrences of star3 in G: 786
Number of occurrences of path3 in G: 786
Number of occurrences of testing/graphs/motifs/K3.txt in G: 270
CONCUSS requires a 64-bit operating system and Python 2.7.x interpreter (e.g. CPython or PyPy). Running the pipeline with PyPy typically decreases runtimes due to native function inlining.
The main computations in CONCUSS do not require any additional libraries.
To import graph data in GraphML or GEXF format, the Beautiful Soup package is required.
pip install beautifulsoup
For developers wishing to contribute to CONCUSS, the testing suite (see below) uses the NetworkX python package to verify the correctness of the counts.
pip install networkx
CONCUSS supports reading data files for the host and pattern graphs in the following formats:
p-centered colorings are stored in a file that lists the number of colors used on the first line and each subsequent line is of the form
[VERTEX ID]: [COLOR]
where [VERTEX ID]
and [COLOR]
are integers ranging from 0 to the number of vertices minus one and the number of colors minus one, respectively.
As of version 2.0, CONCUSS supports visualization with
BEAVr. By passing the -e
option
to CONCUSS, a ZIP archive is created containing:
This ZIP archive can be loaded into BEAVr to create an interactive visualization of the execution of CONCUSS. See the BEAVr documentation for more information.
CONCUSS can currently only create visualization output with certain
configuration options for the Compute and Combine stages. The Compute stage
must use the KPattern class, and must be set to not reuse dynamic programming
table entries. The Combine stage must use the InclusionExclusion method of
correcting overcounting. For convenience, a configuration file containing the
required options is provided as config/beavr.cfg
.
The algorithmic workflow in this tool is highly modular, which allows users to swap out subroutines throughout the stages of the pipeline. A default configuration file is provided in config/default.cfg
; this is recommended as the "best practice" for efficiency. Users who want to experiment with algorithmic variations can find details on how to create custom config files in the documentation.
We welcome contributions to CONCUSS, in the form of Pull Requests, where you "fork" our repository and then request that we "pull" your changes into the main branch. You must have a Github account to make a contribution.
Whenever possible, please follow these guidelines for contributions:
A series of tests are included for developers to ensure their changes do not introduce new bugs. See the testing documentation for additional instructions.
Important: CONCUSS is research software, so you should cite us when you use it in scientific publications! Please see the CITATION file for citation information.
CONCUSS is released under the BSD license; see the LICENSE file. Distribution, modification and redistribution, and incorporation into other software is allowed.
We are using Github Issues for communication related to CONCUSS. Feel free to submit feature requests, ask for help (after reading the documentation), or notify us of a potential bug. "Watch" our repository to keep abreast of progress and new releases.
Development of the CONCUSS software package was funded in part by: