ComparativeGenomicsToolkit / hal

Hierarchical Alignment Format

Plotting HAL graphs #47

Open cooketho opened 8 years ago

cooketho commented 8 years ago

Is there an easy way to plot a HAL graph, as in Fig. 1 of Hickey et al. 2013? Unless I'm missing something, I think the graph can be completely specified by a flat text file in which each row represents one edge of the graph. Each edge is defined by its child segment (child genome ID, start coordinate, end coordinate), its parent segment (parent genome ID, start coordinate, end coordinate), and a boolean indicating whether the edge is an inversion. Given such a file, it shouldn't be too hard to read it into R and use something like ggplot2 or ggnetwork to make the actual plot of the graph. I guess what I'm asking is: is it necessary for me to learn the HAL API to extract that information, or does something already exist to do that? Thanks!
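As a rough illustration of the flat file described above, here is a minimal Python sketch that parses a tab-separated edge list with the seven fields named in the question. The column order, the `0`/`1` encoding of the inversion flag, the `#` comment convention, and the genome names (`human`, `chimp`, `anc1`) are all assumptions made for this example; HAL itself does not define such a format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One row of the hypothetical edge file: child segment, parent segment, inversion flag."""
    child_genome: str
    child_start: int
    child_end: int
    parent_genome: str
    parent_start: int
    parent_end: int
    inverted: bool


def read_edges(handle):
    """Parse tab-separated edge rows, skipping blank lines and '#' comment lines."""
    edges = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        cg, cs, ce, pg, ps, pe, inv = row
        edges.append(Edge(cg, int(cs), int(ce), pg, int(ps), int(pe), inv == "1"))
    return edges


# Made-up example data, not taken from any real HAL file.
example = io.StringIO(
    "#child\tc_start\tc_end\tparent\tp_start\tp_end\tinverted\n"
    "human\t0\t100\tanc1\t0\t100\t0\n"
    "chimp\t50\t150\tanc1\t0\t100\t1\n"
)
edges = read_edges(example)
```

A file in this shape could equally be read into R with `read.delim` for plotting with ggplot2 or ggnetwork, as the question suggests.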

glennhickey commented 8 years ago

That's an interesting idea, but unfortunately we don't have any text exporters, so you'd need to go into the C++ API to get at the topology. Depending on your data, the graph may be too big and busy to visualize well, but it'd be pretty cool to try.

We have done some work on visualization, though. Most of it's described here (and included in the HAL repo):

http://bioinformatics.oxfordjournals.org/content/30/23/3293.short

Basically, if you can make your HAL file accessible by URL, you can throw it on the UCSC Genome Browser via the hal2assemblyhub pipeline. This will show you rearrangements between one species and a set of other species, but not a direct view of the HAL graph.


cooketho commented 8 years ago

OK thanks! I'll give it a shot, although I don't know much C++. In Python-like pseudocode, what I'd like to do is something like this:

    edgeList = []
    for genome in genomes:
        for segment in genome.bottomSegments:
            for edge in segment.edges:
                edgeList.append(edge)

Maybe there's a better way, but that's what I'm going to try and implement. Please let me know if you have any better ideas. Thanks!
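For anyone who wants to try the traversal logic before touching the C++ API, the pseudocode above can be sketched as runnable Python over a small in-memory mock. The `Genome`, `BottomSegment`, and `Segment` classes and the `collect_edges` function below are hypothetical stand-ins invented for this example; they are not the HAL API, and the real implementation would need the corresponding C++ iterators instead.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A stretch of a genome (hypothetical stand-in for a child-side segment)."""
    genome: str
    start: int
    end: int


@dataclass
class BottomSegment(Segment):
    """A parent-side segment plus its edges to child segments.

    Each entry of `children` is a (child Segment, inverted flag) pair.
    """
    children: list = field(default_factory=list)


@dataclass
class Genome:
    name: str
    bottom_segments: list = field(default_factory=list)


def collect_edges(genomes):
    """Flatten every parent-to-child edge into one tuple per edge."""
    edge_list = []
    for genome in genomes:
        for bottom in genome.bottom_segments:
            for child, inverted in bottom.children:
                edge_list.append((
                    child.genome, child.start, child.end,
                    bottom.genome, bottom.start, bottom.end,
                    inverted,
                ))
    return edge_list


# Made-up data: one ancestor with two children; the leaf genome has no bottom segments.
anc = Genome("anc1", [
    BottomSegment("anc1", 0, 100, [
        (Segment("human", 0, 100), False),
        (Segment("chimp", 50, 150), True),
    ]),
])
leaf = Genome("human")
edges = collect_edges([anc, leaf])
```

Each tuple follows the field order proposed in the original question (child segment, parent segment, inversion flag), so the result could be written out one edge per row.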