cytoscape / py4cytoscape

Python library for calling Cytoscape Automation via CyREST
https://Py4Cytoscape.readthedocs.io

gen_node_color_map does not set the correct MAX value for a continuous mapping #138

Open rpillich opened 1 month ago

rpillich commented 1 month ago

I have an NDEx network where the node color is assigned using a continuous mapping based on the values in the Log2FC column.

In my network, the Log2FC values range from -4.331 to 1.919. However, the mapping values are set incorrectly (-0.3840 >> 1.044) because I styled the network using a style template (also sourced from NDEx).

My goal is to reset the node color mappings using the actual values in the Log2FC column. To do so, I use:

p4c.set_node_color_mapping(**p4c.gen_node_color_map('Log2FC untreated IPF v control', p4c.palette_color_brewer_d_RdBu(), style_name=current_style))

then layout and save the network back to NDEx.

Upon re-opening the network in Cytoscape, I see that the value range for the continuous mapping is set between -4.331 and 4.331 instead of -4.331 and 1.919.

It seems that only the MIN value is set correctly, while the MAX is just the opposite of the MIN.

To reproduce the issue, you can use this network in NDEx: https://www.ndexbio.org/viewer/networks/9f0ae06a-715a-11ef-87cb-005056ae3c32

Environment: Mac OS Ventura 13.6.9 Python 3.11 Cytoscape 3.10.1 p4c 1.9

AlexanderPico commented 1 month ago

Hi Rudi. This approach is using convenience methods that follow best practices, including choices in colors and data ranges. The Brewer palettes are designed to be used symmetrically, i.e., the intensity at -4.331 will be the same as that at 4.331, just a different color. So, when visualizing data it is always recommended to identify the absolute maximum in the range and then use that to define the most intense colors at either end of the range. This way a blue color representing -4.331, for example, will be more intense than the red color representing a value of 1 or even 1.9.
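As a rough illustration of the symmetric rule described above, the range can be derived from the largest absolute value in the column. This is a minimal sketch of the idea, not the source of `gen_node_color_map`; the helper name `symmetric_breakpoints` is hypothetical.

```python
def symmetric_breakpoints(values):
    """Return [-m, 0.0, m], where m is the largest absolute value in `values`."""
    m = max(abs(v) for v in values)
    return [-m, 0.0, m]


# The Log2FC extremes reported in this issue.
print(symmetric_breakpoints([-4.331, 1.919]))  # [-4.331, 0.0, 4.331]
```

This reproduces the -4.331 to 4.331 range reported in the issue: the maximum is the mirror image of the minimum because the minimum has the larger magnitude.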

If for some reason your use case requires that you break with this recommendation, then you can always "manually" define the colors and data range values explicitly and skip using the convenience function.

Let me know if that's what you actually want to do and we can help figure out that more complicated syntax.

rpillich commented 1 month ago

Hi @AlexanderPico, thanks for your reply.

So in your example above, 2 genes that are activated (with log2FC of 1.1 and 1.8 respectively) will both appear similar, colored with a pale red tone, when actually their over-expression is substantially different. In that same case, if the whole color space (white to red) was used only for the "real" value range and the midpoint manually set to "zero", those 2 genes would appear more intensely colored and their color difference clearly noticeable.

Bottom line, yes, I would like to use the actual value range rather than the absolute method you explained.

Can py4cytoscape handle such a case?

AlexanderPico commented 1 month ago

Right. The key is the symmetry. You can definitely choose to use something other than the absolute maximum, but whatever value you choose, it should be the same boundary for positive and negative ranges. So, if you want to highlight the gradient within log2FC of 2, then simply set the max range to 2 and the min range to -2. Then you will have the gradient over the range you want, while still maintaining intensity balance.
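A fixed, balanced range such as -2 to 2 could be passed to `set_node_color_mapping` roughly as follows. This is a sketch under assumptions: the helper `fixed_symmetric_mapping` is hypothetical, the three hex colors are illustrative blue/white/red choices rather than values taken from the thread, and the keyword names should be checked against the method docs linked below.

```python
def fixed_symmetric_mapping(column, limit=2.0, style_name=None):
    """Build keyword arguments for a continuous mapping over [-limit, 0, +limit]."""
    kwargs = {
        'table_column': column,
        'table_column_values': [-limit, 0.0, limit],
        # Illustrative colors (assumption): blue, near-white, red.
        'colors': ['#2166AC', '#F7F7F7', '#B2182B'],
        'mapping_type': 'c',  # continuous
    }
    if style_name is not None:
        kwargs['style_name'] = style_name
    return kwargs


mapping = fixed_symmetric_mapping('Log2FC untreated IPF v control')
print(mapping['table_column_values'])  # [-2.0, 0.0, 2.0]

# With Cytoscape running and the network loaded:
#   import py4cytoscape as p4c
#   p4c.set_node_color_mapping(**mapping)
```

Values outside the chosen limits would simply take the end colors, so the gradient is spent entirely on the -2 to 2 interval.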

We added the automated methods later on as a convenience. So, py4cytoscape can definitely handle the manual approach. See the first two examples in the method docs:

https://py4cytoscape.readthedocs.io/en/latest/reference/generated/py4cytoscape.style_mappings.set_node_color_mapping.html#py4cytoscape.style_mappings.set_node_color_mapping

AlexanderPico commented 1 month ago

And just to be clear, if you manually set it to -4 and +2, then you'll have the same problem you described for the positive case, but only for the negative case, i.e., two genes with -1.1 and -1.8 are the same shade of light blue. This is the problem with asymmetric ranges.
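The effect of an asymmetric range can be checked with a little arithmetic. The sketch below assumes the color intensity is interpolated linearly from the midpoint (0) out to each limit; the `intensity` helper is hypothetical and only models that assumption.

```python
def intensity(value, lower, upper):
    """Fraction of full color intensity for `value`, assuming linear
    interpolation from 0 at the midpoint to 1 at the nearer-signed limit."""
    limit = upper if value >= 0 else lower
    return value / limit


# Asymmetric range of -4 to +2, as in the comment above.
for v in (-1.1, -1.8, 1.1, 1.8):
    print(v, round(intensity(v, -4.0, 2.0), 3))
# -1.1 0.275
# -1.8 0.45
# 1.1 0.55
# 1.8 0.9
```

Under that assumption, the same magnitude is drawn at half the intensity on the negative side, which is the imbalance being described.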

rpillich commented 1 month ago

@AlexanderPico The first example reads:

set_node_color_mapping('AverageShortestPathLength', [1.0, 16.36], ['#FBE723', '#440256'], style_name='galFiltered Style')

I am assuming this is a continuous mapping as the mapping type is not specified and "c" is the default. The numbers in [ ] are the lower and upper values of the range while the hex codes in [ ] are the lower and upper color codes.

I have 120 networks to process, the value ranges are all different and I don't know them... So I need those values extracted from the data of each network that I am processing, set as lower and upper limit, and the midpoint set to 'zero'.

Can this be done? In case the answer is yes, what is the syntax that I need to use?

AlexanderPico commented 1 month ago

Here are two ways to approach this based upon our discussion so far:

  1. You want a tight range of balanced, gradient mapping so that the changes between 0 and 2 are clearly illustrated. In this case, you don't have to know the values at all! Just set the range numbers to -2 and 2 for every network. Added bonus: a single legend applies to all cases and they are directly comparable, visually.
  2. Alternatively, you want a balanced gradient set to capture all the data in each network. Then you can use the convenience function p4c.gen_node_color_map(), which determines the range for you per network.
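For completeness, the per-network range with a zero midpoint that was asked about earlier in the thread could be read from the node table and passed to the manual form of `set_node_color_mapping`. Note that this departs from the symmetric recommendation above. It is a sketch only: `observed_breakpoints` and `restyle_current_network` are hypothetical helpers, the colors are illustrative, and it assumes Cytoscape is running, the network of interest is the current one, and the column is numeric with values on both sides of zero.

```python
import math


def observed_breakpoints(values):
    """Return [min, 0.0, max] of the non-missing values, which must span zero."""
    nums = [float(v) for v in values
            if v is not None and not math.isnan(float(v))]
    lo, hi = min(nums), max(nums)
    if not lo < 0.0 < hi:
        raise ValueError('values must span zero to use 0 as the midpoint')
    return [lo, 0.0, hi]


def restyle_current_network(column, style_name,
                            colors=('#2166AC', '#F7F7F7', '#B2182B')):
    """Apply a continuous node color mapping over the observed data range."""
    import py4cytoscape as p4c  # imported here so the helper above runs without it

    table = p4c.get_table_columns(table='node', columns=[column])
    breakpoints = observed_breakpoints(table[column])
    p4c.set_node_color_mapping(column, breakpoints, list(colors),
                               mapping_type='c', style_name=style_name)
    return breakpoints


print(observed_breakpoints([-4.331, 0.5, None, 1.919]))  # [-4.331, 0.0, 1.919]
```

Calling `restyle_current_network` once per loaded network would cover a batch of networks, at the cost of each network getting its own legend.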