FlukeAndFeather / jese4sci-MOD

Modular Architecture track of the jese4sci short course

MOD101 #8

Closed by raecv 2 years ago

raecv commented 2 years ago

This is an example of code I wrote to generate a volcano plot from a .xls file containing CRISPR screen results analyzed by casTLE:

library(tidyverse)
library(readxl)
library(ggrepel)

# USER INPUT: filepath to casTLE results .xls/.xlsm
path_to_file <- "/Users/raelinecv/Desktop/Bassik_lab/casTLE/210720_analyzeNucKO_HP1andMGA/UnbVsBound_HP1_reps.xlsx"
# USER INPUT: will double as the plot title
experiment_name <- "HP1a"

# read in the data to plot, replace spaces
castleResults <- read_excel(path_to_file)
names(castleResults) <- str_replace_all(names(castleResults), c(" " = "."))
# create ggplot of data
p <- ggplot(data=castleResults,
       aes(x = Combo.casTLE.Effect,
           y = Combo.casTLE.Score)) +
  geom_point(alpha = 0.2) +
  theme_minimal() +
  labs(x = "casTLE Effect",
       y = "casTLE Score",
       title = experiment_name) +
  theme(plot.title = element_text(hjust = 0.5, size = 11))

# filter sig hits with positive and negative effect score
sig_neg_hits <- castleResults %>%
  filter(FDR < 0.05, Combo.casTLE.Effect < 0)
sig_pos_hits <- castleResults %>%
  filter(FDR < 0.05, Combo.casTLE.Effect > 0)
# color based on significant hits
p2 <- p +
  geom_point(data = sig_neg_hits,
             aes(x = Combo.casTLE.Effect,
                 y = Combo.casTLE.Score),
             color = 'blue',
             alpha = 0.3) +
  geom_point(data = sig_pos_hits,
             aes(x = Combo.casTLE.Effect,
                 y = Combo.casTLE.Score),
             color = 'red',
             alpha = 0.5) + 
  geom_text_repel(
    data = subset(castleResults, FDR < 0.05),
    aes(label = Symbol),
    size = 3,
    max.overlaps = 20,
    min.segment.length = 0
  )

# generate the plot
p2
  1. Find a piece of spaghetti code you wrote. What are one or two ways you could de-spaghettify it? You don't need to re-write the code, just describe how you would restructure it.

I tried to roughly de-spaghettify this code by adding "USER INPUT" sections so that I could share it with my labmates and they could fill in the minimum needed. That still means creating a new file that copies all of the code for each volcano plot you want (each .R file generates only one volcano). To de-spaghettify it, I could write functions that generate the base volcano (p) and the labels (p2), which would let someone generate multiple volcanoes in one document. This would be great because I could also add options for changing the default colors, font sizes, etc. I could also try writing modules for my sig_hits filters (and similar steps) in a less verbose way.
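As a rough sketch of what that restructuring could look like (not a definitive rewrite), the script might be split into two functions. The names `volcano_base()` and `volcano_highlight()` and all of their arguments are hypothetical, made up for illustration. The column names are the ones the original script uses after replacing spaces with dots.

```r
library(ggplot2)
library(dplyr)
library(ggrepel)

# Hypothetical helper: build the unlabelled base volcano (the original `p`).
volcano_base <- function(results, title, point_alpha = 0.2, title_size = 11) {
  ggplot(results, aes(x = Combo.casTLE.Effect, y = Combo.casTLE.Score)) +
    geom_point(alpha = point_alpha) +
    theme_minimal() +
    labs(x = "casTLE Effect", y = "casTLE Score", title = title) +
    theme(plot.title = element_text(hjust = 0.5, size = title_size))
}

# Hypothetical helper: color and label the significant hits (the original `p2`).
# The FDR filter is applied once and reused for both colors and the labels.
volcano_highlight <- function(p, results, fdr_cutoff = 0.05,
                              neg_color = "blue", pos_color = "red",
                              label_size = 3) {
  sig_hits <- filter(results, FDR < fdr_cutoff)
  p +
    geom_point(data = filter(sig_hits, Combo.casTLE.Effect < 0),
               color = neg_color, alpha = 0.3) +
    geom_point(data = filter(sig_hits, Combo.casTLE.Effect > 0),
               color = pos_color, alpha = 0.5) +
    geom_text_repel(data = sig_hits, aes(label = Symbol),
                    size = label_size, max.overlaps = 20,
                    min.segment.length = 0)
}
```

With these in place, each volcano becomes a single call such as `volcano_highlight(volcano_base(castleResults, "HP1a"), castleResults)`, so several plots can live in one document, and the colors or font sizes can be overridden per plot.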

  2. Reading the same piece of code, describe one example each of intrinsic, germane, and extrinsic load. These are subjective categories, so explain your answers.

mcgoodman commented 2 years ago

Great work!

I find it interesting that you've approached the second question from the perspective of someone using this code rather than your own perspective as someone writing it. For example, while the intrinsic load for someone just running this might be relatively low, anyone who wanted to understand it or write something similar needs to know much more than how to specify a file path. The intrinsic load is in what the script is actually doing, more or less.

Totally agree about the germane load - as someone very used to tidyverse / ggplot2, this to me is well written and easy to understand, but if someone is unfamiliar with these packages it can be daunting.

One thing I might add in terms of extrinsic load: because you've hard-coded an absolute file path, even someone who wants to run this script on your same data has to do extra work to change the path for their computer. A relative file path would fix that!

Again, great job, and feel free to close the issue once you've read this!

raecv commented 2 years ago

Thanks!

Adding a relative file path is a good idea and I can definitely implement that.
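A minimal sketch of the relative-path idea, assuming the script is run from the project root and the results file has been copied into a `data/` subfolder. Both the folder layout and the helper name `castle_path()` are assumptions for illustration, not something from the original script.

```r
# Hypothetical helper: build a path relative to the working directory
# instead of hard-coding "/Users/<name>/Desktop/...".
# Assumes the data sit in a `data/` folder next to the script.
castle_path <- function(file_name, data_dir = "data") {
  file.path(data_dir, file_name)
}

# USER INPUT: only the file name changes from run to run
path_to_file <- castle_path("UnbVsBound_HP1_reps.xlsx")
```

`file.path()` joins its arguments with `/` on every platform, so the same line works on a labmate's machine as long as they keep the same folder layout. The `here` package is another common way to anchor paths at the project root.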