Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings
57 stars 16 forks source link

feat: Colored characters for AAStrings #96

Closed ahl27 closed 1 year ago

ahl27 commented 1 year ago

Small PR, just adds colored characters for AAStrings. Using a full 20 color palette is possible, but the result is relatively messy--difficult to choose colors that are simultaneously discernable, preserve readability of the characters, and don't overwhelm users.

Compromise chosen was to color bases based on physiochemical properties: https://www.jalview.org/help/html/colourSchemes/zappo.html

Color scheme chosen was a 7 color pastel to reduce the loudness of the color set. Result seems to be fairly easy to read while still being discernable. Color set is not colorblind friendly.

I'm open to adapting the color scheme to something else, it's written to be simple to change. I'm also happy to add in additional documentation to AAString-class.Rd describing how the bases are colored.

All letters: Screen Shot 2023-03-27 at 13 16 25

Example of an alignment: Screen Shot 2023-03-27 at 13 16 17

Example of random characters: Screen Shot 2023-03-27 at 13 16 33

ahl27 commented 1 year ago

I'm realizing I misremembered this as being an issue when it actually isn't...but I still think it's good functionality. Note that lowercase letters are not capitalized--I'm planning to look at #84 and #10 next, which should resolve that.

hpages commented 1 year ago

Hi Aidan,

Thanks for the PR. This looks good to me. Let me take a quick look at the code.

H.

ahl27 commented 1 year ago

Here's another possible color palette using the published color-blind safe palette in https://www.nature.com/articles/nmeth.1618 (see this link for appearance under various colorblind types)

cp <- c("#e69f00", "#56b4e9", "#009e73",
          "#f0e442", "#0072b2", "#d55e00", "#cc79a7")

Screen Shot 2023-03-27 at 14 16 36 Screen Shot 2023-03-27 at 14 17 56

The letters are a little harder to read in this palette, but not too much.

I've also visualized the current palette here for reference.

hpages commented 1 year ago

Hmm.. the color-blind safe palette hardly looks as good as the pastel palette, unfortunately.

Just tried the pastel palette in a terminal with a light background and I like it:

image

It's very rainbow-ish but I don't have problems with that :smile:

Some minor formatting stuff about the code:

Also please add something about this in AAString-class.Rd and XStringSet-class.Rd, like it was done in DNAString-class.Rd and XStringSet-class.Rd for the coloring of DNAString/DNAStringSet and RNAString/RNAStringSet objects.

Thanks!

ahl27 commented 1 year ago

Thanks for the comment!

Re: your points, I'll make those changes.

Maybe it would make sense to let users change the color palette in some way? The DNA/RNA alphabet coloring also has some colorblind issues (see here); it shouldn't be too hard to add a toggle to swap over the coloring scheme if users wish. I can look into that for a future addition.

hpages commented 1 year ago

The DNA/RNA alphabet coloring also has some colorblind issues

Yep. This was pointed out to me when I implemented this :disappointed:

Maybe it would make sense to let users change the color palette in some way? ... I can look into that for a future addition.

Absolutely. Providing more than one palette and providing some way for the user to choose their preferred palettes (e.g. via some global option) would be the way to go. Thanks for the offer!

H.

ahl27 commented 1 year ago

I think I've fixed all the issues--my editors are having trouble showing trailing whitespace, which may explain why there was so much...

I'll plan to look at colorblind support as well next; I have the color palettes, so it shouldn't be a very difficult addition. I can add it into this PR or make a new one.

hpages commented 1 year ago

Thanks, I'll take a 2nd look after lunch.

Will be best to add colorblind support in a separate PR. Personally I don't consider it super important, sorry if that's not a politically correct thing to say. I mean, the colors are a nice addition but it's not like they are mission critical. The most important thing is that we can still read the letters. Look what things look like with the saturation set ot 0 (i.e. black & white vision only):

With the pastel palette: Screenshot from 2023-03-27 11-17-44

With the color-blind safe palette: 228030637-d6b46161-3006-4153-86ba-615490adaa4e

In my opinion the pastel palette is less invasive i.e. the letters are easier to read than with the color-blind safe palette, even if you have black & white vision only. But maybe that's just me...

ahl27 commented 1 year ago

That’s a fair point—I think the fact that the letters are readable and colors are able to be disabled does leave sufficient accessibility(although it’s always nice to have more!). I’m also more inclined towards the pastels, I think it’s a good middle ground between having colors and having them being quiet enough that the letters themselves are still the focal point for the user.

I’m happy to work on adding in the colorblind option next in a separate submission; small low priority PRs like these are great for me to start getting familiar with the codebase.

hpages commented 1 year ago

Also putting the black & white version of your random character example using the pastel palette, for a better comparison with the black & white version of the same example using the color-blind safe palette: 228018292-52200a23-d5fe-461b-bc9a-4235b67dd767

hpages commented 1 year ago

Bumped version and finally added all contributors to Authors@R (a long due change!)

See commit 5d403d1224cb2cd36899d2e29d1ecacdfe218228