deeptools / pyGenomeTracks

python module to plot beautiful and highly customizable genome browser tracks
GNU General Public License v3.0
772 stars 111 forks source link

Track for loops #47

Closed Phlya closed 4 years ago

Phlya commented 6 years ago

Hi, would be great to have a new track type for showing loops on Hi-C data. For example, a bedpe file could be used to plot rectangles (and this could include plotting TADs, if the coordinate pairs are the same). Something like this: image

http://higlass.io/app/?config=AJQIPRFlREeqK2a1pciz_g

joachimwolff commented 6 years ago

This feature is already in development and part of the branch 'longrangecontacts' of HiCExplorer. It is implemented as a --loop option for hicPlotMatrix. However, it is just a red circle and not a nice square. This branch is changing a lot, therefore I don't recommend it to use it. Please be a bit more patient, I hope to finish it this year and than I will bring it to pyGenomeTracks too.

Phlya commented 6 years ago

Of course, no worries, thank you! I was just thinking of adding a new option for the links track myself, actually, but I suppose I'll just wait for when you add it then.

Phlya commented 6 years ago

BTW a little rough around the edges, but I actually made a custom track for now, and it seems to work fine.

class LoopTrack(tracksClass.GenomeTrack):
    SUPPORTED_ENDINGS = ['.bedpe']  # this is used by make_tracks_file to guess the type of track based on file name
    TRACK_TYPE = 'loops'
    OPTIONS_TXT = """
height = 3
title = Loops
color = blue
line width = 1
"""
    def __init__(self, *args, **kwarg):
        super(LoopTrack, self).__init__(*args, **kwarg)
        if 'line width' not in self.properties:
            self.properties['line width'] = 1
        if 'line style' not in self.properties:
            self.properties['line style'] = 'solid'
        self.max_height = None
        valid_intervals = 0
        intervals = []
        line_number = 0
        with open(self.properties['file'], 'r') as file_h:
            for line in file_h.readlines():
                line_number += 1
                if line.startswith(('browser', 'track', '#', 'chrom1')):
                    continue
                try:
                    chrom1, start1, end1, chrom2, start2, end2 = line.strip().split('\t')[:6]
                except Exception as detail:
                    self.log.error('File not valid. The format is chrom1 start1, end1, '
                                   'chrom2, start2, end2\nError: {}\n in line\n {}'.format(detail, line))
                    exit(1)

                try:
                    start1 = int(start1)
                    end1 = int(end1)
                    start2 = int(start2)
                    end2 = int(end2)
                except ValueError as detail:
                    self.log.error("Error reading line: {}. One of the fields is not "
                                   "an integer.\nError message: {}".format(line_number, detail))
                    exit(1)

                assert start1 <= end1, "Error in line #{}, end1 smaller than start1 in {}".format(line_number, line)
                assert start2 <= end2, "Error in line #{}, end2 smaller than start2 in {}".format(line_number, line)

                if chrom1 != chrom2:
                    self.log.warn("Only loops in same chromosome are used. Skipping line\n{}\n".format(line))
                    continue

                if start2 < start1:
                    start1, start2 = start2, start1
                    end1, end2 = end2, end1

                # each interval spans from the smallest start to the largest end
                intervals.append([chrom1, start1, end1, start2, end2])
                valid_intervals += 1

        if valid_intervals == 0:
            self.log.warn("No valid intervals were found in file {}".format(self.properties['file']))
        intervals = pd.DataFrame(intervals, columns=['chrom', 'start1', 'end1', 'start2', 'end2'])
        file_h.close()
        self.loops = intervals

        if 'color' not in self.properties:
            self.properties['color'] = 'blue'

        if 'alpha' not in self.properties:
            self.properties['alpha'] = 0.8

    def plot_y_axis(self, ax, plot_ax):
        pass

    def plot_loops(self, ax, loop):
        from matplotlib.patches import Polygon
        width1 = loop[1] - loop[0]
        width2 = loop[3] - loop[2]
        x0 = (loop[1]+loop[2])/2
        y0 = loop[2] - loop[1]

        x1 = x0 + width2/2
        y1 = y0 + width2#/2

        x2 = (loop[1]+loop[2])/2
        y2 = loop[3] - loop[0]

        x3 = x0 - width1/2
        y3 = y0 + width1#/2

        rectangle = Polygon(np.array([[x0, y0], [x1, y1], [x2, y2], [x3, y3]]),
                           facecolor='none', edgecolor=self.properties['color'],
                           linewidth=self.line_width,
                           ls=self.properties['line style'])
        ax.add_artist(rectangle)

    def plot(self, ax, chrom_region, region_start, region_end):
        """
        Makes a rectangle highlighting an interaction between two regions on a
        linear scale representing interactions between Hi-C bins.
        :param ax: matplotlib axis
        :param label_ax: matplotlib axis for labels
        """
        self.max_height = 0
        count = 0

        loops_in_region = self.loops.query("(chrom=='%s')&(end1>%s)&(end1<%s)&(start2>%s)&(start2<%s)" %\
                                           (chrom_region, region_start, region_end, region_start, region_end))
        loops_in_region = loops_in_region[['start1', 'end1', 'start2', 'end2']]

        for idx, interval in loops_in_region.iterrows():
            if 'line width' in self.properties:
                self.line_width = float(self.properties['line width'])
            else:
                self.line_width = 0.5 * np.sqrt(interval.data)

            self.plot_loops(ax, interval)
            count += 1

        self.log.debug("{} loops were plotted".format(count))
        self.log.debug('title is {}'.format(self.properties['title']))
rikrdo89 commented 5 years ago

Hi, I was also trying to load loop tracks overlaid onto HIC maps... has support for bedpe files been implemented in the latest release of pyGenomeTracks?

lldelisle commented 5 years ago

Hi, Unfortunately, not for the moment, but I am preparing a new release in which I had planned to integrate the loops. @Phlya , would you mind, adding it to one of your branch so I can pull it and give you credit? For the bedpe, you would like which type of plot (arrow for each interval, linked?). Thanks,

Phlya commented 5 years ago

Hi, I am happy to submit a PR, but haven't used it for ages... And as it turns out, pyGenomeTracks now requires python>=3.6 and I am running 3.5, so would have to solve that to test it. So if you want to check that it actually works yourself and fix any potential problems, I am happy to contribute.

This implementations draws rectangles corresponding to each loop.

lldelisle commented 5 years ago

@Phlya , no problem. It is just that I prefer you appear as a contributor. Please, just copy paste it in a file in the tracks folder in a branch of your repo and I will pull it and make it work. Thanks

Phlya commented 5 years ago

It's in my master branch, if that's OK.

https://github.com/Phlya/pyGenomeTracks/commit/1d166a0c23d6db32eb693e5df10dc050417d2b75

lldelisle commented 5 years ago

Pefect I am working on it.