DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0

Regression Visualizer Improvements for Users with Red-Green Colorblindness #1134

Closed guindo closed 3 years ago

guindo commented 3 years ago

How can I change the markers in the residuals plot and the prediction error plot so that the training set and the test set use different markers? For users with daltonism, different markers would be a better option than different colors. Also, is there a way to give the best fit and identity lines in the prediction error plot different colors, instead of the same color with different opacity?

[Image: residuals plot]

rebeccabilbro commented 3 years ago

Hello @guindo and thank you for checking out Yellowbrick, and thank you especially for reaching out with this request regarding accessibility! I have to admit that, while I am familiar with colorblindness, I had not heard the term daltonism before! From some preliminary research, it appears that this describes red/green color blindness and generally makes it difficult to differentiate two colors that differ primarily in their amount of red; is that correct?

We want to make Yellowbrick as accessible as we can. To that end, would you point me to some visualizations that you feel do a good job of supporting readers with red-green colorblindness, so that I can see some good examples? Or, if there are not many good examples out there, might you be willing to team up with us to work on this? We are a small team of unpaid volunteers and would definitely appreciate the assistance!

In the meantime, to assist users with colorblindness, our Yellowbrick Visualizers do support color customization, so if you want to change the default colors for ResidualsPlot, you can do so with the train_color and test_color parameters, e.g.:

from sklearn.linear_model import Ridge
from yellowbrick.datasets import load_concrete
from yellowbrick.regressor import ResidualsPlot
from sklearn.model_selection import train_test_split

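# Load the concrete dataset and hold out 20% of it as a test set
X, y = load_concrete()
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=38
)

# Override the default train and test point colors
model = Ridge()
visualizer = ResidualsPlot(
    model,
    train_color="goldenrod",
    test_color="darkblue"
)

visualizer.fit(X_train, y_train)   # fit the model on the training data
visualizer.score(X_test, y_test)   # evaluate the model on the test data
visualizer.show()                  # finalize and render the figure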

[Image: residuals plot with goldenrod training points and dark blue test points]

Do you know of a pair of colors that are particularly easy to differentiate (such as the above? or perhaps these?) for those with red-green colorblindness? Your suggestion about changing the markers is also interesting; could you direct me to some examples of scatterplots that use markers in a way that you find particularly helpful for differentiation? I expect that the density of points might also be somewhat important; can you think of any other considerations?
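One rough way to screen a candidate color pair is to compare how bright the two colors are, since a pair that differs strongly in brightness tends to stay distinguishable even when hue is hard to perceive or the figure is printed in grayscale. The sketch below uses the WCAG relative luminance and contrast ratio formulas for this. It is only a heuristic, not a simulation of red-green colorblindness, and the helper names `relative_luminance` and `contrast_ratio` are illustrative, not part of Yellowbrick.

```python
# Heuristic only: compares brightness, not a colorblindness simulation.

def relative_luminance(hex_color):
    """Return the WCAG relative luminance (0 = black, 1 = white) of '#RRGGBB'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve to get linear-light values.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Return the WCAG contrast ratio between two colors (1.0 to 21.0)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# Hex values of the named colors "goldenrod" and "darkblue".
GOLDENROD = "#DAA520"
DARKBLUE = "#00008B"

print(f"goldenrod vs darkblue: {contrast_ratio(GOLDENROD, DARKBLUE):.2f}")
```

A higher ratio means the two colors differ more in brightness; a ratio near 1 means they would look almost identical in a black-and-white printout.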

It is also possible to change the color of the identity and best_fit lines in the PredictionError plot, using the line_color argument:

from sklearn.linear_model import Lasso
from yellowbrick.datasets import load_concrete
from yellowbrick.regressor import PredictionError
from sklearn.model_selection import train_test_split

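# Load the concrete dataset and hold out 20% of it as a test set
X, y = load_concrete()
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=38
)

# line_color sets the color of both the identity and best fit lines
model = Lasso()
visualizer = PredictionError(
    model,
    line_color="midnightblue"
)

visualizer.fit(X_train, y_train)   # fit the model on the training data
visualizer.score(X_test, y_test)   # evaluate the model on the test data
visualizer.show()                  # finalize and render the figure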

[Image: prediction error plot with midnightblue lines]

Currently, the line_color parameter changes both identity and best_fit, but it would not be too tricky to change that code by updating the visual arguments and how they are consumed. Can you suggest a pair of colors that are easy to tell apart for people with red-green colorblindness and that you feel might make better default values?
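To illustrate what separately styled lines could look like, here is a minimal sketch in plain Matplotlib and NumPy. It does not use or modify Yellowbrick's PredictionError; the data is synthetic, and the colors (taken from the Okabe-Ito palette, which is often recommended as colorblind-friendly) and line styles are assumptions chosen for illustration, not proposed defaults.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic actual and predicted values, for illustration only.
rng = np.random.default_rng(38)
y_true = rng.uniform(0, 80, size=100)
y_pred = 0.8 * y_true + 8 + rng.normal(0, 6, size=100)

fig, ax = plt.subplots()
ax.scatter(y_true, y_pred, color="#56B4E9", alpha=0.7, label="predictions")

# Identity line (y = x): where perfect predictions would fall.
lims = np.array([
    min(y_true.min(), y_pred.min()),
    max(y_true.max(), y_pred.max()),
])
ax.plot(lims, lims, color="#000000", linestyle="--", label="identity")

# Best fit line: least-squares fit of predicted on actual values.
slope, intercept = np.polyfit(y_true, y_pred, deg=1)
ax.plot(lims, slope * lims + intercept, color="#D55E00", linestyle="-",
        label="best fit")

ax.set_xlabel("actual")
ax.set_ylabel("predicted")
ax.legend()
fig.savefig("prediction_error_sketch.png")
```

Using a different line style as well as a different color means the two lines stay distinguishable even if the colors are hard to tell apart.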

Thank you in advance for your help and for raising this important accessibility issue!

guindo commented 3 years ago

Thank you for your feedback. Daltonism and colorblindness are the same thing: a genetic condition that makes it difficult to distinguish colors. Many people have some degree of difficulty distinguishing colors without knowing it. My suggestion is to let users change the markers when visualizing. For example, the attached figure was made with Matplotlib; as you can see, it uses different colors but also different markers. There are two reasons for using markers. The first is colorblindness. The second is printing: if I download a scientific paper and print it in black and white, I will have difficulty understanding a figure that relies on color alone. Thank you.

[Image: thumbnail of the attached Matplotlib figure]
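A minimal sketch of the kind of figure described above, written in plain Matplotlib with synthetic data. It is not the code behind the attached figure, and it does not use Yellowbrick; the marker shapes and colors are assumptions chosen for illustration. The training and test points differ in both marker shape and color, so they remain distinguishable in a grayscale printout.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic predicted values and residuals, for illustration only.
rng = np.random.default_rng(0)
pred_train = rng.uniform(0, 80, size=120)
res_train = rng.normal(0, 8, size=120)
pred_test = rng.uniform(0, 80, size=30)
res_test = rng.normal(0, 10, size=30)

fig, ax = plt.subplots()

# Training points: hollow circles.
ax.scatter(pred_train, res_train, marker="o", facecolors="none",
           edgecolors="#0072B2", label="train")

# Test points: filled triangles.
ax.scatter(pred_test, res_test, marker="^", color="#E69F00", label="test")

# Zero-residual reference line.
ax.axhline(0, color="black", linewidth=1)

ax.set_xlabel("predicted value")
ax.set_ylabel("residual")
ax.legend()
fig.savefig("residuals_markers_sketch.png")
```

Combining a hollow shape with a filled one adds a third cue besides shape and color, which helps when points overlap densely.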