GFAKluge is a C++ parser/writer and a set of command line utilities for manipulating GFA files. It parses GFA to a set of data structures that represent the encoded graph. You can use these components and their fields/members to build up your own graph representation. You can also convert between GFA 0.1 <-> 1.0 <-> 2.0 to glue programs that use different GFA versions together.
Homepage: https://github.com/edawson/gfakluge
License: MIT
A C++11 compliant compiler (we recommend GCC or clang)
OpenMP (via GCC or clang)
NB: GFAKluge cannot be compiled with Apple clang, as it does not include OpenMP.
When make is run, the gfak binary is built in the top level directory. It offers the following subcommands:
For CLI usage, run any of the above (including gfak with no subcommand) with no arguments or with -h. To change the specification version, most commands take the -S flag and a single double (floating-point) argument.
Examples of various commands are included in the examples.md file.
Examples of the C++ API are included in the interface.md file.
The gfak utilities are available via Homebrew: brew install brewsci/bio/gfakluge
Building GFAKluge from source requires OpenMP. This should be supported on Linux by default. On Apple Mac OS X, we recommend installing gcc:
brew install gcc@8
make CXX=g++-8
or
sudo port install gcc8
make
You can then build libgfakluge and the command line gfak utilities by typing make in the repo.
To use GFAKluge in your program, you'll need to add a few lines to your code. First, add the necessary include line to your C++ code:
#include "gfakluge.hpp"
Next, make sure the library is on your compiler's search paths and add it to the compile line:
g++ -o my_exe my_exe.cpp -L/path/to/gfakluge/ -lgfakluge
You should then be able to parse and manipulate GFA from your program:
GFAKluge gg;
gg.parse_gfa_file(my_gfa_file);
cout << gg << endl;
Internally, lines of GFA are represented as structs with member variables that correspond to their defined fields. Here's the definition for a sequence line, for example:
struct sequence_elem{
    std::string sequence;
    std::string name;
    std::map<std::string, std::string> opt_fields;
    long id;
};
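To make the mapping from a GFA line to such a struct concrete, here is a minimal sketch that parses a tab-separated GFA1 segment (S) line by hand. It is not GFAKluge's parser: demo_sequence_elem, split_tabs, and parse_segment_line are hypothetical names used only for illustration, and optional fields are stored simply as tag -> "TYPE:VALUE".

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the struct above; NOT the library's own definition.
struct demo_sequence_elem {
    std::string name;
    std::string sequence;
    std::map<std::string, std::string> opt_fields;  // tag -> "TYPE:VALUE"
};

// Split a line on tab characters.
std::vector<std::string> split_tabs(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}

// Parse a GFA1 segment line: S <name> <sequence> [TAG:TYPE:VALUE ...]
// Returns false if the line is not an S line with at least three fields.
bool parse_segment_line(const std::string& line, demo_sequence_elem& out) {
    const std::vector<std::string> fields = split_tabs(line);
    if (fields.size() < 3 || fields[0] != "S") {
        return false;
    }
    out.name = fields[1];
    out.sequence = fields[2];
    out.opt_fields.clear();
    for (std::size_t i = 3; i < fields.size(); ++i) {
        // Optional fields look like LN:i:7; key them on the two-letter tag.
        if (fields[i].size() >= 5 && fields[i][2] == ':') {
            out.opt_fields[fields[i].substr(0, 2)] = fields[i].substr(3);
        }
    }
    return true;
}
```

Calling parse_segment_line on "S", "seq1", "GATTACA", "LN:i:7" joined by tabs fills name with seq1, sequence with GATTACA, and stores the LN tag with the value i:7.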
The structs for contained elements, link elements, and alignment elements are very similar. These individual structs are then wrapped in a set of standard containers for easy access:
std::map<std::string, std::string> header;
std::map<std::string, sequence_elem> name_to_seq;
std::map<std::string, std::vector<contained_elem> > seq_to_contained;
std::map<std::string, std::vector<link_elem> > seq_to_link;
std::map<std::string, std::vector<alignment_elem> > seq_to_alignment;
All of these structures can be accessed using the get_<Thing> method, where <Thing> is the name of the map you would like to retrieve. They reside in gfakluge.hpp.
GFAKluge now supports GFA2! This brings with it four new structs: edge_elem, gap_elem, fragment_elem, and group_elem. They're contained in maps much like those for the GFA1 structs.
A few caveats apply:
Tags we specifically do not (i.e. cannot) support in GFA2 -> GFA1 conversion: G - gap, U - unordered group, F - fragment. Links and containments should get converted to edges correctly. Sequence elements should get converted, but watch out for the length field if you hit issues.
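If you want to know ahead of time which records a GFA2 -> GFA1 conversion would lose, a small pre-check like the sketch below may help. It only inspects the first character of each line against the three record types listed above; has_no_gfa1_equivalent and lines_without_gfa1_equivalent are hypothetical helpers, not part of GFAKluge.

```cpp
#include <string>
#include <vector>

// True for the GFA2 record types listed above as unsupported in
// GFA2 -> GFA1 conversion: G (gap), U (unordered group), F (fragment).
bool has_no_gfa1_equivalent(char record_type) {
    return record_type == 'G' || record_type == 'U' || record_type == 'F';
}

// Collect the lines whose record type would be lost in a conversion.
std::vector<std::string> lines_without_gfa1_equivalent(
        const std::vector<std::string>& gfa2_lines) {
    std::vector<std::string> dropped;
    for (const std::string& line : gfa2_lines) {
        if (!line.empty() && has_no_gfa1_equivalent(line[0])) {
            dropped.push_back(line);
        }
    }
    return dropped;
}
```

Running this over a file's lines before converting lets a program warn about gaps, fragments, and unordered groups instead of losing them silently.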
GFAKluge is fully compliant with reading GFA2 and GFA0.1 <-> GFA1.0 -> GFA2.0 conversion as of September 2017.
GFAKluge gg;
gg.parse_gfa_file("my_gfa.gfa");
You can then iterate over the aforementioned maps/structs and build out your own graph representation.
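As one possible way to build your own representation, the sketch below turns a seq_to_link-style map into a plain adjacency list. To stay self-contained it uses a stand-in struct, demo_link_elem, that carries only the source and sink fields used elsewhere in this README; build_adjacency and the LinkMap alias are hypothetical names, and with the real library you would pass the map returned by the corresponding get_<Thing> method instead.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for link_elem with only the fields this sketch needs.
struct demo_link_elem {
    std::string source;
    std::string sink;
    bool source_orientation_forward;
    bool sink_orientation_forward;
};

typedef std::map<std::string, std::vector<demo_link_elem> > LinkMap;
typedef std::map<std::string, std::vector<std::string> > Adjacency;

// Build node -> list of successor node names from a seq_to_link-style map.
// Orientation is ignored here to keep the sketch short.
Adjacency build_adjacency(const LinkMap& seq_to_link) {
    Adjacency adj;
    for (LinkMap::const_iterator it = seq_to_link.begin();
         it != seq_to_link.end(); ++it) {
        const std::vector<demo_link_elem>& links = it->second;
        for (std::size_t i = 0; i < links.size(); ++i) {
            adj[links[i].source].push_back(links[i].sink);
            // Make sure the sink is present as a node even with no out-edges.
            adj.insert(std::make_pair(links[i].sink,
                                      std::vector<std::string>()));
        }
    }
    return adj;
}
```

A graph that needs orientations would store a (name, orientation) pair per edge rather than just the sink name.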
I'm working on a low-memory API for reading lines / emitting structs but it won't be this pretty.
GFAKluge og;
sequence_elem s;
s.sequence = "GATTACA";
s.name = "seq1";
og.add_sequence(s);
sequence_elem t;
t.sequence = "AATTGN";
t.name = "seq2";
og.add_sequence(t);
link_elem l;
l.source = s.name;
l.sink = t.name;
l.source_orientation_forward = true;
l.sink_orientation_forward = true;
l.pos = 0;
l.cigar = "";
og.add_link(l.source, l);
cout << og << endl;
ofstream f("my_file.gfa");
// Write GFA1
f << og;
// To convert to GFA2:
og.set_version(2.0);
f << og;
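For reference, a GFA1 link line has the tab-separated form L, from, from-orientation, to, to-orientation, overlap. The sketch below formats one from a stand-in struct; it illustrates the line format only and is not GFAKluge's writer. demo_link_record and to_gfa1_link_line are hypothetical names, and an empty CIGAR is written as * on the assumption that the overlap is unspecified.

```cpp
#include <string>

// Stand-in for link_elem with only the fields this sketch needs.
struct demo_link_record {
    std::string source;
    std::string sink;
    bool source_orientation_forward;
    bool sink_orientation_forward;
    std::string cigar;
};

// Format a GFA1 link line: L <from> <+|-> <to> <+|-> <overlap>
std::string to_gfa1_link_line(const demo_link_record& l) {
    std::string line = "L\t" + l.source + "\t";
    line += l.source_orientation_forward ? "+" : "-";
    line += "\t" + l.sink + "\t";
    line += l.sink_orientation_forward ? "+" : "-";
    line += "\t";
    if (l.cigar.empty()) {
        line += "*";  // unspecified overlap
    } else {
        line += l.cigar;
    }
    return line;
}
```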
Eric T Dawson
github: edawson
Please post an issue for help.
GFAKluge is open-source and community contributions are welcome and appreciated! Please keep the following in mind when contributing to the repo:
Keep gfakluge.hpp header-only and update the build process if a modification alters it.
Use fully qualified names (e.g., std::stream in place of just stream).