Closed BlackSlipper closed 10 months ago
@BlackSlipper, I'm not sure whether the same functionality is available in vg, but in odgi I've implemented ways to get the non-reference node IDs and the non-reference ranges.
odgi paths -i graph.gfa --non-reference-nodes reference_paths.txt > non-reference-node-ids.txt
odgi paths -i graph.gfa --non-reference-ranges reference_paths.txt > non-reference-ranges.bed
In reference_paths.txt you have to put the names of the paths that constitute your reference, one name per line (for example, "grch38#1#chr1" if you have a human chromosome 1 pangenome graph and you use grch38 as the reference).
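As a rough sketch of how reference_paths.txt could be prepared, the snippet below filters a list of path names by a reference prefix and writes one name per line. The helper names and the non-reference path names are hypothetical; in practice the full list would come from something like `odgi paths -i graph.gfa -L`.

```python
def select_reference_paths(path_names, prefix):
    """Return the path names that start with the reference prefix, in input order."""
    return [name for name in path_names if name.startswith(prefix)]


def write_reference_paths(path_names, out_file):
    """Write one path name per line, the layout odgi expects for the reference list."""
    with open(out_file, "w") as handle:
        for name in path_names:
            handle.write(name + "\n")


# Hypothetical path names; only the grch38 one comes from the example above.
all_paths = ["grch38#1#chr1", "chm13#1#chr1", "sampleA#1#contig1"]
reference = select_reference_paths(all_paths, "grch38#")
write_reference_paths(reference, "reference_paths.txt")
```

Matching on the "grch38#" prefix assumes PanSN-style path names; adjust the prefix if your graph names its paths differently.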
Thank you @AndreaGuarracino for a quick reply.
I am currently using cactus v2.7.0 (currently the latest version) through docker.
However, the odgi paths command in the docker image doesn't seem to have the "--non-reference-nodes" and "--non-reference-ranges" options available.
`odgi paths {OPTIONS}
Interrogate the embedded paths of a graph. Does not print anything to stdout
by default!
OPTIONS:
[ MANDATORY ARGUMENTS ]
-i[FILE], --idx=[FILE] Load the succinct variation graph in
ODGI format from this *FILE*. The file
name usually ends with *.og*. It also
accepts GFAv1, but the on-the-fly
conversion to the ODGI format requires
additional time!
[ Path Investigation Options ]
-O[FILE], --overlaps=[FILE] Read in the path grouping *FILE* to
generate the overlap statistics from.
The file must be tab-delimited. The
first column lists a grouping and the
second the path itself. Each line has
one path entry. For each group the
pairwise overlap statistics for each
pairing will be calculated and printed
to stdout.
-L, --list-paths Print the paths in the graph to
stdout. Each path is printed in its
own line.
-l, --list-path-start-end If -L,--list-paths was specified, this
additionally prints the start and end
positions of each path in additional,
tab-delimited coloumns.
-f, --fasta Print paths in FASTA format to stdout.
One line for the FASTA header, another
line for the whole sequence.
-H, --haplotypes Print to stdout the paths in a path
coverage haplotype matrix based on the
graph’s sort order. The output is
tab-delimited: *path.name*,
*path.length*, *path.step.count*,
*node.1*, *node.2*, *node.n*. Each
path entry is printed in its own line.
-N, --scale-by-node-len Scale the haplotype matrix cells by
node length.
-D[CHAR], --delim=[CHAR] The part of each path name before this
delimiter CHAR is a group identifier.
For use with -H, --haplotypes**: it
prints an additional, first column
**group.name** to stdout.
-p[N], --delim-pos=[N] Consider the N-th occurrence of the
delimiter specified with **-D,
--delim** to obtain the group
identifier. Specify 1 for the 1st
occurrence (default).
[ Path Modification Options ]
-K[FILE], --keep-paths=[FILE] Keep paths listed (by line) in *FILE*.
-X[FILE], --drop-paths=[FILE] Drop paths listed (by line) in *FILE*.
-o[FILE], --out=[FILE] Write the dynamic succinct variation
graph to this file (e.g. *.og*).
[ Threading ]
-t[N], --threads=[N] Number of threads to use for parallel
operations.
[ Processing Information ]
-P, --progress Write the current progress to stderr.
[ Program Information ]
-h, --help Print a help message for odgi paths.`
Is there an option that I can use with the odgi in the current docker image?
Thanks @AndreaGuarracino !! This looks like useful functionality that at least a few people have been after lately.
The Cactus docker contains the latest odgi release. I guess I can switch this to the current master. @AndreaGuarracino any plans on making a new ODGI release soon? It looks like 0.8.3 may be pretty stale at this point?
@BlackSlipper In the meantime you can probably find a newer odgi in the pggb docker. One thing to be careful of is that, by default, MC will make ODGI versions of the .full (unclipped) graphs, which will contain unaligned centromeres. These may throw off your numbers unless you explicitly account for them.
You can make odgi versions of the clipped graphs using --odgi clip or --chrom-og clip but keep in mind:
But again, it should still be possible to use odgi on the full graph, then postprocess the results to mask out clipped regions. This could be an interesting tutorial to add to the MC documentation.
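The masking step could be sketched as a simple interval subtraction. This assumes (the thread does not confirm it) that both the non-reference ranges and the clipped regions are available as half-open (path, start, end) intervals in the same coordinate system; the function name and the example intervals are made up for illustration.

```python
from collections import defaultdict


def subtract_intervals(ranges, masks):
    """Remove masked stretches from half-open (path, start, end) ranges."""
    masks_by_path = defaultdict(list)
    for path, start, end in masks:
        masks_by_path[path].append((start, end))
    for intervals in masks_by_path.values():
        intervals.sort()

    kept = []
    for path, start, end in ranges:
        cursor = start  # first position of this range not yet emitted or masked
        for m_start, m_end in masks_by_path.get(path, []):
            if m_end <= cursor:
                continue  # mask lies entirely before the remaining part
            if m_start >= end:
                break  # masks are sorted, so nothing further can overlap
            if m_start > cursor:
                kept.append((path, cursor, m_start))
            cursor = max(cursor, m_end)
            if cursor >= end:
                break
        if cursor < end:
            kept.append((path, cursor, end))
    return kept


# Hypothetical intervals: three non-reference ranges and two clipped regions.
ranges = [("chr1", 0, 100), ("chr1", 150, 200), ("chr2", 0, 50)]
masks = [("chr1", 40, 60), ("chr1", 90, 160)]
masked = subtract_intervals(ranges, masks)
```

Summing `end - start` over the result then gives a non-reference length that excludes the clipped regions.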
Oops! I've just made a new ODGI release! https://github.com/pangenome/odgi/releases/tag/v0.8.4
I want to reproduce Supplementary Figure 15 of the Minigraph-Cactus paper.
However, I couldn't find any details on how to count the non-reference nodes in a Minigraph-Cactus pangenome.
Did you count the nodes using the vg format or the GFA format?
I tried using cactus-hal2maf to convert to MAF, but the HAL file resulting from the MC pipeline only allowed me to find nodes that include the reference.
I was wondering how you were able to make Fig. 4a and Supplementary Fig. 15.
It would be very kind of you to explain the methods for this.
Thank you in advance!
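For what it's worth, one way to count non-reference nodes directly from a GFA file might look like the sketch below. It is not the method used in the paper, just an illustration: it collects all segment IDs from S-lines and subtracts those visited by P-lines or W-lines whose name starts with a reference prefix. Building W-line names as sample#haplotype#sequence is an assumption (PanSN-style naming), and the tiny graph is invented.

```python
import re


def non_reference_nodes(gfa_lines, reference_prefix):
    """Return IDs of segments not visited by any reference path or walk."""
    all_nodes = set()
    reference_nodes = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            all_nodes.add(fields[1])
        elif fields[0] == "P":
            # P <name> <seg+,seg-,...> <overlaps>: strip the trailing orientation.
            if fields[1].startswith(reference_prefix):
                reference_nodes.update(step[:-1] for step in fields[2].split(","))
        elif fields[0] == "W":
            # W <sample> <hap> <seq_id> <start> <end> <walk>, walk like >1<2>3.
            # Assumption: name the walk sample#hap#seq_id (PanSN-style).
            name = "#".join(fields[1:4])
            if name.startswith(reference_prefix):
                reference_nodes.update(re.findall(r"[<>]([^<>]+)", fields[6]))
    return all_nodes - reference_nodes


# Invented toy graph: the reference path visits nodes 1 and 3 only.
gfa = [
    "H\tVN:Z:1.1",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tTT",
    "S\t4\tC",
    "P\tgrch38#1#chr1\t1+,3+\t*",
    "W\tsampleA\t1\tcontig1\t0\t7\t>1>2>3",
    "L\t1\t+\t3\t+\t0M",
]
non_ref = sorted(non_reference_nodes(gfa, "grch38#"))
```

For a real graph you would pass an open file handle instead of the list, since the function only iterates over lines.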