Closed Marius1311 closed 7 months ago
Hi Marius, I think that you can use indel_colors
parameter, I usually provide an allele:color mapping object (e.g. dictionary) to determine the exact color I want. An easy way of getting a unique color for each allele is by using the colorhash package.
You can see an example here from my pipeline.
Edit: This is a very simple solution but you could additionally play with the saturation levels, for example, depending on the frequency of the barcode. I think somewhere in cassiopeia this was implemented, or at least used in one of the publications.
Hi @tzeitim, amazing, thanks for this workaround! I will play around with that.
The saturation level can be controlled by passing the indel_priors
argument, but then again colors change every time I call the function.
I would go about it the same way and manually do some sort of color arithmetic at the hsv level using the priors, as you point out.
Hi @Marius1311 and @tzeitim,
Thank you for this question and proposed solution. Indeed it appears that the random allele coloring assignment is not tied properly to the random state, I will look into this.
@tzeitim proposes the same solution I would (and has a nice code block as well - thanks!) whereby you can pass in specific color assignments via indel_colors
(not indel_priors
as you were proposing to use). To maintain functionality with conveying prior probability with color saturation, I would propose you actually use this function (pointing to cas.pl.utilities.get_indel_colors
) for generating the color map.
I will keep you posted about how I fix the random state issue you originally raised.
Thanks @mattjones315! Does the utility function you proposed take care of unedited cut sites already, i.e. making sure that these get a defined color like grey? The solution from @tzeitim did not work for me, I think because it only considered indels from r1
, so I quickly adapted it to take care of all cut sites, and to assign defined colors for unedited and missing cut sites:
def map_indel_to_color(clone_allele_table, kind = 'hsv', missing_color = "#808080", none_color = "#404040"):
'''
Utility function to create a set of colors for allele table plotting alongside the Cassiopeia lineage tree
Inspired by https://github.com/tzeitim/ogtk/blob/294ce1d4ec9c165284b10fcca43210471b9725ec/ogtk/ltr/shltr/pipeline.py#L101
'''
# get the unique set of indels across intBCs and cut sites
indels = [clone_allele_table[cut_site].values for cut_site in ['r1', 'r2', 'r3']]
all_indels = np.unique(np.concatenate(indels))
# initialize a DataFrame to hold the information
color_lookup = pd.DataFrame(all_indels, columns=['indel'])
# using the different types of color definitions, create color palettes
color_lookup['hex'] = color_lookup['indel'].map(lambda x: ColorHash(x).hex)
color_lookup['hsl'] = color_lookup['indel'].map(lambda x: ColorHash(x).hsl)
color_lookup['rgb'] = color_lookup['indel'].map(lambda x: tuple(np.array(ColorHash(x).rgb)/255))
# transform rgb to hsv
color_lookup['hsv'] = color_lookup['rgb'].map(lambda x: colorsys.rgb_to_hsv(*x))
# reset the index
color_lookup.set_index('indel', inplace=True)
# subset to the color kind requested, and rename
color_lookup = color_lookup.rename(columns={kind:'color'}).loc[:, ['color']]
# take care of special colors for missing data
if kind == 'hex':
missing_color_ = missing_color
none_color_ = none_color
elif kind in ['hsl', 'rgb', 'hsv']:
missing_color_rgb = mcolors.to_rgba(missing_color)[:3]
none_color_rgb = mcolors.to_rgba(none_color)[:3]
if kind == 'rgb':
missing_color_ = missing_color_rgb
none_color_ = none_color_rgb
elif kind == 'hsv':
missing_color_ = colorsys.rgb_to_hsv(*missing_color_rgb)
none_color_ = colorsys.rgb_to_hsv(*none_color_rgb)
elif kind == 'hsl':
missing_color_ = colorsys.rgb_to_hls(*missing_color_rgb)
none_color_ = colorsys.rgb_to_hls(*none_color_rgb)
else:
raise ValueError(f"Unknown color kind {kind}")
color_lookup.loc['Missing']['color'] = missing_color_
color_lookup.loc['None']['color'] = none_color_
return color_lookup
Hi @Marius1311, I forgot to mention that my cassette harbours only one cut-site per integration since it's tuned for self-homing guide RNAs. Glad to hear that you could generalise the function for your needs.
When calling
cas.pl.plot_matplotlib(cas_tree, add_root=True, allele_table=clone_allele_table, orient='right', random_state=0, )
, colors are not reproducible, i.e. different every time I call the function, even though I set the random state explicitly.