Closed carlocolantuoni closed 1 year ago
Currently have this working for all projections, but need to modify it so it works with "pca" projections only.
I'd also like to add a 2-ended color scale to this approach if possible - see image
@carlocolantuoni are you looking for a "diverging color scale" like one of these in this chart -> https://matplotlib.org/cheatsheets/_images/cheatsheets-2.png ?
Since your use-case is currently the only use case for this, I'll let you pick the color-scheme (from the png linked).
Ya, i think seismic would be cool, thnx!
@carlocolantuoni should the scale color divergence center at 0? I was looking at this data https://nemoanalytics.org/projection.html?multipattern_plots=0&layout_id=Micali2023&projection_source=95f5868e&projection_algo=pca and it seems that 0 was the minimum for these projections (which I know you did as NMF but I wanted to use as a PCA example).
Centering at the median value is optimal
Here is a screenshot with the vcenter at median value. It does skew the colorbar so that the center of the color is in the center of the colorbar.
@carlocolantuoni in this case is it still appropriate to sort plotting order by the absolute value (further away from 0 is in the forefront), or should i change that to "further away from median value" as well?
Nice to see both ends this way.
Are we limited to the scales on the png?
Having black at the center and colors on the ends would prevent the many cells in the middle range from disappearing into the background - it's good to have the many mid-range cells visible as a backdrop against which to see the extremes.
Out of interest - what gene cart is being projected here?
This is the permalink I was using. https://nemoanalytics.org//projection.html?layout_id=Micali2023&gene_symbol_exact_match=1&projection_source=95f5868e&multipattern_plots=0&projection_algo=pca
We can modify the colorscale to be custom using the Matplotlib colormap gradient data structure, but I would have to read a tutorial as I am only vaguely familiar with the process. So I would have to choose "blue" as a 0 value, "black" as a 127 value, and "red" as a 255 value to build the custom colormap.
To clarify earlier because I may have misread it: you are fine with the colorscale centering on the median weight value, but want the points to be sorted by absolute value away from 0 before drawing?
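For reference, a minimal sketch of the blue/black/red idea described above, assuming Matplotlib's `LinearSegmentedColormap.from_list` with three evenly spaced colors. The name `blue_black_red` is only illustrative and is not taken from the gEAR codebase.

```python
from matplotlib.colors import LinearSegmentedColormap

# Three evenly spaced anchor colors: position 0.0 -> blue, 0.5 -> black, 1.0 -> red.
# N=256 gives a 256-entry lookup table (indices 0..255).
blue_black_red = LinearSegmentedColormap.from_list(
    "blue_black_red", ["blue", "black", "red"], N=256
)

# Calling the colormap with a float in [0, 1] returns an RGBA tuple.
low_rgba = blue_black_red(0.0)   # blue end
mid_rgba = blue_black_red(0.5)   # (almost exactly) black
high_rgba = blue_black_red(1.0)  # red end
```

Passing `(position, color)` pairs instead of a plain list would let the black band be widened or narrowed.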
Example with a black center area. I can add more colors and "tighten" the black range so it doesn't cover so much of the colorbar if that is helpful. Currently the gradient is about a third each of blue, black, then red
Example where I add "darkblue" and "darkred" in the middle, and have the "dark" range cover from about 40% - 60% of the way into the colorbar
this looks great Shaun!
i really like your efforts to optimize the colorscale to show what is going on in the data - it makes a huge difference in what we can see! - complication is that each different dataset will have a different distribution of projected values, so it's going to be hard to have a globally optimized color scale, and we currently have nowhere for the user to control this (unless we want to add an element in the display curation that would only be used for PCA projection, but i think that's outside our).
i'm also looking at the white-centered divergent color scale you have running in production right now, and although i do think the center is better as black, that divergent scale has the advantage that each side of the colorscale goes from light red (or blue) to dark red (or blue) - i think adding a bright/vibrant red (or blue) to each end of your last example (the one where you added the darkblue and darkred) would get the best of both worlds - can u give that a try? if that squishes the center black too much, we could try the same thing with the 1/3 red, 1/3 black, 1/3 blue colorscale.
also - glad you asked if this will work: "colorscale centering on the median weight value, but the points to be sorted by absolute value away from 0 before drawing" - is it possible to sort/plot by distance from the median instead? i think it is important to do both coloring and sorting by median.
really do appreciate your attention to detail here - really makes a difference in what we can see, carlo
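A small sketch of what "sort/plot by distance from the median" could look like, assuming plain NumPy and made-up weights. The values are illustrative only and are not from any dataset in this thread.

```python
import numpy as np

# Toy projection weights for five cells (illustrative values only).
weights = np.array([1.1, -1.0, 1.0, 2.5, 0.7])

median = np.median(weights)  # 1.0 for these values

# argsort is ascending, so cells closest to the median come first (drawn first,
# at the back) and the most extreme cells on either side come last (on top).
draw_order = np.argsort(np.abs(weights - median))
ordered_weights = weights[draw_order]
```

With these numbers the last cell drawn is the one with weight -1.0, whereas sorting by absolute distance from 0 would put the 2.5 cell on top. That is the difference between the two sort options discussed here.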
    def create_projection_pca_colorscale():
        """Create a diverging colorscale but with black in the middle range."""
        from matplotlib.colors import LinearSegmentedColormap
        # Src: https://matplotlib.org/stable/tutorials/colors/colormap-manipulation.html#directly-creating-a-segmented-colormap-from-a-list
        # Each node is the position (0-1) along the colorbar of the matching color
        nodes = [0.0, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0]
        colors = ["lightblue", "blue", "darkblue", "black", "darkred", "red", "lightcoral"]
        return LinearSegmentedColormap.from_list("projection_pca", list(zip(nodes, colors)))

    # Order cells so that those furthest from the median weight are drawn last (on top)
    median = np.median(adata[:, gene_symbol].X.squeeze())
    sort_order = np.argsort(np.abs(median - adata[:, gene_symbol].X.squeeze()))
    ordered_obs = adata.obs.iloc[sort_order].index
    adata = adata[ordered_obs, :]
    plot_sort_order = False # scanpy auto-sorts by highest value by default so we need to override that
    plot_vcenter = median
    expression_color = "cividis_r" if colorblind_mode else create_projection_pca_colorscale()
hey shaun, cool - it's close! don't want you to spend too much more time on the exact colorscale, so let's try these few variations of the colors argument and then decide, from the different versions that we've tried, which we think will be best in most cases:
colors = ["violet", "darkviolet", "blue", "darkblue", "black", "black", "black", "darkred", "red", "darkorange", "yellow"]
colors = ["violet", "darkviolet", "blue", "darkblue", "black", "darkred", "red", "darkorange", "yellow"]
colors = ["violet", "blue", "darkblue", "black", "darkred", "red", "yellow"]
and if its easy, one like the "seismic" divergent scale, but with "lightgray" in the center (if its not easy to alter the preset seismic, dont sweat it)
thnx!
for the alternate versions of the "seismic" we could try these:
colors = ["violet", "darkviolet", "darkblue", "blue", "lightblue", "lightgray", "lightcoral", "red", "darkred", "darkorange", "yellow"]
or
colors = ["darkviolet", "darkblue", "blue", "lightblue", "lightgray", "lightcoral", "red", "darkred", "darkorange"]
or
colors = ["darkviolet", "darkblue", "blue", "lightgray", "red", "darkred", "darkorange"]
or
colors = ["violet", "darkviolet", "darkblue", "blue", "lightgray", "red", "darkred", "darkorange", "yellow"]
or
colors = ["blue", "darkblue", "darkviolet", "violet", "lightgray", "yellow", "darkorange", "darkred", "red"]
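One way to compare candidate lists like these side by side, sketched under the assumption that the colors are spaced evenly (no explicit `nodes`), which is what `LinearSegmentedColormap.from_list` does when given a plain list of color names. The dictionary keys are illustrative labels, not names used in gEAR.

```python
from matplotlib.colors import LinearSegmentedColormap, to_rgba

# Two of the suggested lists; with an odd number of colors the middle entry
# ("lightgray") lands at the center of the colorbar.
candidates = {
    "gray_center_9": ["darkviolet", "darkblue", "blue", "lightblue", "lightgray",
                      "lightcoral", "red", "darkred", "darkorange"],
    "gray_center_7": ["darkviolet", "darkblue", "blue", "lightgray",
                      "red", "darkred", "darkorange"],
}

# Build one colormap per candidate list, colors evenly spaced from 0 to 1.
colormaps = {
    name: LinearSegmentedColormap.from_list(name, colors)
    for name, colors in candidates.items()
}

# Sample the center of each colormap to confirm it is (close to) light gray.
center_colors = {name: cmap(0.5) for name, cmap in colormaps.items()}
light_gray = to_rgba("lightgray")
```

Any of the resulting colormap objects could then be passed wherever a Matplotlib colormap is accepted.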
sorry slightly obsessed with the visuals - promise we'll stop and pick after this iteration
shaun got this to a really good place - last comments have been about optimizing the color scale for possible scenarios - we can leave this as low priority for now. looks good
FWIW I had to revert the adjustments I made in nemo-production. Seems that when plotting a tSNE with a projection and a categorical annotation like celltype, the scanpy `sc.pl.embedding` function would throw an error: `ValueError: To copy an AnnData object in backed mode, pass a filename: .copy(filename='myfilename.h5ad'). To load the object into memory, use '.to_memory()'.` I am guessing scanpy is trying to create another AnnData object beyond the object that I have passed in, so I need to research this error for potential solutions.
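One possible workaround, following the hint in the error message itself, is to load a backed object into memory before handing it to the plotting call. The helper below is a hypothetical sketch (`ensure_in_memory` is not a scanpy or gEAR function). It relies only on the `isbacked` attribute and the `to_memory()` method that AnnData objects provide, and loading into memory may be costly for large datasets.

```python
def ensure_in_memory(adata):
    """Return an in-memory object if `adata` is file-backed, else return it unchanged."""
    # AnnData exposes `isbacked` and `to_memory()`; using getattr keeps this
    # sketch usable with any object that follows the same convention.
    if getattr(adata, "isbacked", False):
        return adata.to_memory()
    return adata
```

The result could then be passed to `sc.pl.embedding` in place of the backed object.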
oh thats a pain - they looked great!
Related issue at https://github.com/scverse/scanpy/issues/2401
Use colorscheme from picture in first comment
Not sure why the zebrafish colorscale starts at orange. I guess that the "0" range contains 50% of the data so it condenses the orchid-indigo-black colorway. These were created after I added code to explicitly copy the annotation data object before passing to scanpy.
@carlocolantuoni thoughts?
ya the zebrafish one looks like it is missing one entire side of the color scale
one more detail is that you could add a dark blue between the purple and black at the low end
maybe even a dark blue and then a normal blue before the purple
Updated with "blue-ish" colors, and spaced all the colors out evenly. I think I need to set the "vcenter" property of the colorscale equal to the median of the data to correct the colorscale of the second plot. Am guessing that if a lot of the data has 0 weight and is skewed as such, the 0's go black instead of violet. Hopefully setting the "vcenter" property will at least make the colorbar more accurately reflect the values.
EDIT: I just checked and the colorbar is already set to the "vcenter" being the median value. Setting "vcenter=None" did nothing as well.
Equivalent NMF projections on the same two datasets. Any "gray" cell has a value of 0 (originally designed where a gene had no expression in the cell, but in this case the pattern has no weight). I am guessing that for the gray cells, none of the 20 genes in the pattern I tested with have expression, so no weight would occur... which would mean that they would have 0 weight in PCA as well.
It would seem that in this case we would want to avoid putting those cells in the forefront (and in the NMF projection, those cells are further back in the plot figure). But there is the concern of figuring out a real 0 vs a "no expression 0"... maybe we keep the sorting formula as is, but move all 0-weighted things in the back. Thoughts?
Same datasets with PCA but with "vcenter=None" turned off. Everything skews to the lower values
Same dataset with PCA and "vcenter=median" but I sort the 0-value weights to be drawn in the background (all other weights are drawn respective of how far they are from the median). You can see this in the first dataset, but it does nothing for the second dataset. So I don't think this is a viable solution.
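For illustration, the "zeros in the background, everything else by distance from the median" ordering could be expressed with `np.lexsort`. This is a sketch with made-up weights, not the code that was tested here.

```python
import numpy as np

# Toy weights: two exact zeros plus a spread of non-zero values (illustrative only).
weights = np.array([0.0, 0.9, 1.0, -1.2, 1.1, 2.5, 0.0])

median = np.median(weights)            # 0.9 for these values
distance = np.abs(weights - median)    # how far each cell is from the median
is_nonzero = (weights != 0).astype(int)

# np.lexsort sorts by the LAST key first: zeros (0) come before non-zeros (1),
# then cells are ordered by increasing distance from the median within each group.
draw_order = np.lexsort((distance, is_nonzero))
```

Here the two zero-weight cells are drawn first even though they sit further from the median than several non-zero cells. When the median itself is 0, the zeros are already at the back, which would be consistent with this change having no visible effect on the second dataset.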
Committed
just checked out some projections - it looks great!
did you end up using the version where you commented "Same datasets with PCA but with "vcenter=None" turned off. Everything skews to the lower values"? that looked perfect
zero's (of any kind) plotted as 0 in the dynamic color scale is what we should do for (PCA or NMF), i.e. we should set 0's to gray
The projections on nemoanalytics use vcenter = median, but I am happy to set vcenter=None if you feel it is better.
I can set 0 to grey for all projections, but I will have to rework the code to put them back in the background.
sorry sorry - that was supposed to be "we should NOT set 0's to gray"
doesn't the "vcenter=median" screw up the utricle dataset by dropping the bottom half of the color scale for some reason? the image where you said "Same datasets with PCA but with "vcenter=None" turned off. " looked like it fixed all that
Yea, but I was worried about the majority of datapoints being in the "blue-purple" range instead of being in the "black" range for those situations. I actually calculated the median in those datasets and the median weight was 0.0. But if you are comfortable with the way they look, I'll make the "vcenter" change.
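A rough illustration of why a median of 0.0 can hide half of the colorbar. The function below is a simplified stand-in for a two-slope (centered) normalization, written for this example only. It is not scanpy's or Matplotlib's implementation, and `two_slope_positions` is a hypothetical name.

```python
import numpy as np

def two_slope_positions(values, vcenter):
    """Map values to colorbar positions in [0, 1], with `vcenter` pinned at 0.5."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    positions = np.full(values.shape, 0.5)
    below = values < vcenter
    above = values > vcenter
    if vcenter > vmin:  # stretch [vmin, vcenter] over the lower half
        positions[below] = 0.5 * (values[below] - vmin) / (vcenter - vmin)
    if vmax > vcenter:  # stretch [vcenter, vmax] over the upper half
        positions[above] = 0.5 + 0.5 * (values[above] - vcenter) / (vmax - vcenter)
    return positions

# Symmetric data: both halves of the colorbar are used.
symmetric = two_slope_positions([-2, -1, 0, 1, 2], vcenter=0)

# Zero-heavy data: the median is 0, which is also the minimum, so every
# cell lands at 0.5 or above and the lower half of the colormap is never shown.
skewed_weights = np.array([0, 0, 0, 0, 1, 2, 4])
skewed = two_slope_positions(skewed_weights, vcenter=np.median(skewed_weights))
```

Under this simplified model, a dataset where at least half of the weights are 0 uses only the upper half of the scale, which would match the colorbar starting partway through the colorway.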
ya - we can't use vcenter=median if it drops half the color scale. but you said in the comment where the image looked good that: " "vcenter=None" turned off " so does that mean you simply did not include the vcenter argument?
Basically, the `scanpy.pl.embedding` plot sets it to None by default if it is not provided (or if None is explicitly provided), so I just had a switch that would set it to either the median value or None. Since this was the only code that warranted a toggle, I can remove my "vcenter" toggle entirely
ok
because PCs are 2-ended, the extreme high values and extreme low values are both equally important. However, because they were built for displaying individual genes, where low values are not of as much interest, many of the display options in gEAR/NeMO plot points in increasing order so that the highest values are on top and can be best seen. when displaying the results of a PCA projection, I'd like to be able to see both ends of the value distribution - as seen in this example i generated with custom code in R.
In this plot, without the altered plotting order and color scale, all the pink/purple would simply be black. the inset density plot in this image shows the distribution of values represented by colors in the plot. the vertical line indicates where plotting order begins. note that it is not at either end, but rather in the center of the value distribution, and as values become more extreme in either direction, they are plotted later, such that extreme values at BOTH ends of the distribution end up on top and hence can be seen.