YosefLab / Cassiopeia

A Package for Cas9-Enabled Single Cell Lineage Tracing Tree Reconstruction
https://cassiopeia-lineage.readthedocs.io/en/latest/
MIT License
77 stars 24 forks source link

Colors in `cassiopeia.pl.plot_matplotlib` not reproducible #230

Closed Marius1311 closed 7 months ago

Marius1311 commented 1 year ago

When calling cas.pl.plot_matplotlib(cas_tree, add_root=True, allele_table=clone_allele_table, orient='right', random_state=0, ), colors are not reproducible, i.e. different every time I call the function, even though I set the random state explicitly.

tzeitim commented 1 year ago

Hi Marius, I think that you can use indel_colors parameter, I usually provide an allele:color mapping object (e.g. dictionary) to determine the exact color I want. An easy way of getting a unique color for each allele is by using the colorhash package.

You can see an example here from my pipeline.

Edit: This is a very simple solution but you could additionally play with the saturation levels, for example, depending on the frequency of the barcode. I think somewhere in cassiopeia this was implemented, or at least used in one of the publications.

Marius1311 commented 1 year ago

Hi @tzeitim, amazing, thanks for this workaround! I will play around with that.

Marius1311 commented 1 year ago

The saturation level can be controlled by passing the indel_priors argument, but then again colors change every time I call the function.

tzeitim commented 1 year ago

I would go about it the same way and manually do some sort of color arithmetic at the hsv level using the priors, as you point out.

mattjones315 commented 1 year ago

Hi @Marius1311 and @tzeitim,

Thank you for this question and proposed solution. Indeed it appears that the random allele coloring assignment is not tied properly to the random state, I will look into this.

@tzeitim proposes the same solution I would (and has a nice code block as well - thanks!) whereby you can pass in specific color assignments via indel_colors (not indel_priors as you were proposing to use). To maintain functionality with conveying prior probability with color saturation, I would propose you actually use this function (pointing to cas.pl.utilities.get_indel_colors) for generating the color map.

I will keep you posted about how I fix the random state issue you originally raised.

Marius1311 commented 1 year ago

Thanks @mattjones315! Does the utility function you proposed take care of unedited cut sites already, i.e. making sure that these get a defined color like grey? The solution from @tzeitim did not work for me, I think because it only considered indels from r1, so I quickly adapted it to take care of all cut sites, and to assign defined colors for unedited and missing cut sites:

def map_indel_to_color(clone_allele_table, kind = 'hsv', missing_color = "#808080", none_color = "#404040"):
    '''
    Utility function to create a set of colors for allele table plotting alongside the Cassiopeia lineage tree

    Inspired by https://github.com/tzeitim/ogtk/blob/294ce1d4ec9c165284b10fcca43210471b9725ec/ogtk/ltr/shltr/pipeline.py#L101 
    '''

    # get the unique set of indels across intBCs and cut sites
    indels = [clone_allele_table[cut_site].values for cut_site in ['r1', 'r2', 'r3']]
    all_indels = np.unique(np.concatenate(indels))

    # initialize a DataFrame to hold the information
    color_lookup = pd.DataFrame(all_indels, columns=['indel'])

    # using the different types of color definitions, create color palettes
    color_lookup['hex'] = color_lookup['indel'].map(lambda x: ColorHash(x).hex)
    color_lookup['hsl'] = color_lookup['indel'].map(lambda x: ColorHash(x).hsl)
    color_lookup['rgb'] = color_lookup['indel'].map(lambda x: tuple(np.array(ColorHash(x).rgb)/255))

    # transform rgb to hsv
    color_lookup['hsv'] = color_lookup['rgb'].map(lambda x: colorsys.rgb_to_hsv(*x))

    # reset the index
    color_lookup.set_index('indel', inplace=True)

    # subset to the color kind requested, and rename
    color_lookup = color_lookup.rename(columns={kind:'color'}).loc[:, ['color']]

    # take care of special colors for missing data
    if kind == 'hex':
        missing_color_ = missing_color
        none_color_ = none_color
    elif kind in ['hsl', 'rgb', 'hsv']:
        missing_color_rgb = mcolors.to_rgba(missing_color)[:3]
        none_color_rgb = mcolors.to_rgba(none_color)[:3]

        if kind == 'rgb':
            missing_color_  = missing_color_rgb
            none_color_ = none_color_rgb
        elif kind == 'hsv':
            missing_color_ = colorsys.rgb_to_hsv(*missing_color_rgb)
            none_color_ = colorsys.rgb_to_hsv(*none_color_rgb)
        elif kind == 'hsl':
            missing_color_ = colorsys.rgb_to_hls(*missing_color_rgb)
            none_color_ = colorsys.rgb_to_hls(*none_color_rgb)
    else:
        raise ValueError(f"Unknown color kind {kind}")

    color_lookup.loc['Missing']['color'] = missing_color_
    color_lookup.loc['None']['color'] = none_color_

    return color_lookup
tzeitim commented 1 year ago

Hi @Marius1311, I forgot to mention that my cassette harbours only one cut-site per integration since it's tuned for self-homing guide RNAs. Glad to hear that you could generalise the function for your needs.