MOD101 - GitHub Issues

raecv commented 2 years ago

This is an example of code I wrote to generate a volcano plot from an Excel file (.xls/.xlsx) containing CRISPR screen results analyzed by casTLE:

```r
library(tidyverse)  # dplyr, stringr, ggplot2, ...
library(readxl)     # read_excel()
library(ggrepel)    # geom_text_repel(): gene labels that avoid overlapping

# USER INPUT: filepath to casTLE results .xls/.xlsx
path_to_file <- "/Users/raelinecv/Desktop/Bassik_lab/casTLE/210720_analyzeNucKO_HP1andMGA/UnbVsBound_HP1_reps.xlsx"
# USER INPUT: will double as the plot title
experiment_name <- "HP1a"

# read in the data to plot, then replace spaces in the column names with "."
# so they can be used as bare names inside aes() and filter()
castleResults <- read_excel(path_to_file)
names(castleResults) <- str_replace_all(names(castleResults), c(" " = "."))

# create the base ggplot: every gene, casTLE effect vs. casTLE score
p <- ggplot(data = castleResults,
            aes(x = Combo.casTLE.Effect,
                y = Combo.casTLE.Score)) +
  geom_point(alpha = 0.2) +
  theme_minimal() +
  labs(x = "casTLE Effect",
       y = "casTLE Score",
       title = experiment_name) +
  theme(plot.title = element_text(hjust = 0.5, size = 11))

# filter significant hits (FDR < 0.05) with negative and positive effect sizes
# (the 0.05 cutoff appears three times in this script; change all three together)
sig_neg_hits <- castleResults %>%
  filter(FDR < 0.05, Combo.casTLE.Effect < 0)
sig_pos_hits <- castleResults %>%
  filter(FDR < 0.05, Combo.casTLE.Effect > 0)

# overlay significant hits in color (negative effect = blue, positive = red)
p2 <- p +
  geom_point(data = sig_neg_hits,
             aes(x = Combo.casTLE.Effect,
                 y = Combo.casTLE.Score),
             color = "blue",
             alpha = 0.3) +
  geom_point(data = sig_pos_hits,
             aes(x = Combo.casTLE.Effect,
                 y = Combo.casTLE.Score),
             color = "red",
             alpha = 0.5) +
  # label every significant gene with its Symbol;
  # max.overlaps: drop labels that would overlap more than 20 other elements;
  # min.segment.length = 0: always draw the line from a label to its point
  geom_text_repel(
    data = subset(castleResults, FDR < 0.05),
    aes(label = Symbol),
    size = 3,
    max.overlaps = 20,
    min.segment.length = 0
  )

# generate the plot
p2
```

Find a piece of spaghetti code you wrote. What are one or two ways you could de-spaghettify it? You don't need to rewrite the code, just describe how you would restructure it.

I tried to roughly de-spaghettify this code by adding "USER INPUT" sections so I could share it with my labmates and they could fill in the minimum needed. However, that still means creating a new file that essentially copies all the code for each volcano plot you want, because each .R file only generates one volcano plot. To de-spaghettify it further, I could write functions that generate the base volcano plot (p) and add the colored hits and labels (p2), which would let someone generate multiple volcano plots in one document. This would be great because I could also add options that let them change the default colors, font sizes, etc. I could also rewrite my sig_hits filters (and similar steps) as reusable pieces so they are less verbose.
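A minimal sketch of that restructuring, assuming the column names the script uses after the space-to-dot renaming (Combo.casTLE.Effect, Combo.casTLE.Score, FDR, Symbol). The helper names `classify_hits()` and `plot_volcano()` are hypothetical, and coloring all points in one layer with a single alpha is a simplification of the separate blue/red overlays in the script above. The toy data frame uses made-up values only to show the call.

```r
library(dplyr)
library(ggplot2)
library(ggrepel)

# Hypothetical helper: tag every gene once, replacing the two separate
# filter() calls. Genes with a missing FDR fall through to "not significant".
classify_hits <- function(df, fdr_cutoff = 0.05) {
  df %>%
    mutate(hit = case_when(
      FDR < fdr_cutoff & Combo.casTLE.Effect < 0 ~ "negative",
      FDR < fdr_cutoff & Combo.casTLE.Effect > 0 ~ "positive",
      TRUE ~ "not significant"
    ))
}

# Hypothetical helper: one volcano plot per call, with the FDR cutoff,
# colors, and sizes exposed as arguments instead of hard-coded values.
plot_volcano <- function(df, title = "", fdr_cutoff = 0.05,
                         colors = c("negative" = "blue",
                                    "positive" = "red",
                                    "not significant" = "grey60"),
                         label_size = 3, title_size = 11,
                         max_overlaps = 20) {
  df <- classify_hits(df, fdr_cutoff)
  ggplot(df, aes(x = Combo.casTLE.Effect, y = Combo.casTLE.Score)) +
    geom_point(aes(color = hit), alpha = 0.4) +
    scale_color_manual(values = colors) +
    # label only significant genes; x/y are inherited from ggplot()
    geom_text_repel(data = filter(df, hit != "not significant"),
                    aes(label = Symbol),
                    size = label_size,
                    max.overlaps = max_overlaps,
                    min.segment.length = 0) +
    theme_minimal() +
    labs(x = "casTLE Effect", y = "casTLE Score", title = title) +
    theme(plot.title = element_text(hjust = 0.5, size = title_size))
}

# Toy data with made-up values, only to demonstrate the call.
toy <- data.frame(
  Symbol = c("GENE1", "GENE2", "GENE3"),
  Combo.casTLE.Effect = c(-2.1, 0.4, 3.0),
  Combo.casTLE.Score = c(8.5, 0.7, 11.2),
  FDR = c(0.01, 0.30, 0.04)
)
p_hp1 <- plot_volcano(toy, title = "HP1a", fdr_cutoff = 0.05)
```

Because the cutoff, colors, and sizes are arguments, several experiments can be plotted from one file (one `plot_volcano()` call per results table), and changing the FDR threshold becomes a single edit instead of three.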

Reading the same piece of code, describe one example each of intrinsic, germane, and extrinsic load. These are subjective categories, so explain your answers.

intrinsic: The most important part of this script is setting the path_to_file variable correctly and knowing which variables to change if the user wants to (like setting the FDR cutoff higher or lower). This is intrinsic because it requires attention every time the user runs the script.
germane: If someone is unfamiliar with the packages I used (like ggrepel), it might take them some time to learn why I picked the options I did and how to change them if the plot misbehaves for them. This is germane because it gets easier the more familiar they become with these packages.
extrinsic: Not much comes to mind for extrinsic load, but the step that renames all the headers so they are compatible with R is definitely clunky. I could instead specify that the headers be formatted that way before the user loads the file. I could also add more comments explaining some of my choices, like the values I passed to the plotting options.
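Another way to drop the separate renaming step is to repair the column names while reading the file. A minimal sketch, assuming readxl's `.name_repair` argument (which accepts a function) and base R's `make.names()`; it runs on an example workbook bundled with readxl rather than on the casTLE results.

```r
library(readxl)

# make.names() applies base R's rules for syntactic names; for headers that
# only contain spaces, this matches the str_replace_all(" " -> ".") step.
cleaned <- make.names(c("Combo casTLE Effect", "Combo casTLE Score", "FDR"))
print(cleaned)

# read_excel() can apply the same function while reading, so no separate
# renaming line is needed. deaths.xlsx ships with readxl and has headers
# containing spaces; for the casTLE file this would be
# read_excel(path_to_file, .name_repair = make.names).
deaths <- read_excel(readxl_example("deaths.xlsx"),
                     range = "arts!A5:F15",
                     .name_repair = make.names)
print(names(deaths))
```

One caveat: `make.names()` also turns other characters that are not valid in R names (such as "-" or parentheses) into "." and prefixes names that start with a digit with "X", so for such headers the result can differ from the original replacement step.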

mcgoodman commented 2 years ago

Great work!

I find it interesting that you've approached the second question from the perspective of someone using this code, as opposed to your own perspective as someone writing it - e.g., while the intrinsic load for someone just running this might be relatively low, anyone who wanted to understand it or write something similar needs to know much more than just how to specify a file path - the intrinsic load is in what the script is actually doing, more or less.

Totally agree about the germane load - as someone very used to tidyverse / ggplot2, this to me is well written and easy to understand, but if someone is unfamiliar with these packages it can be daunting.

One thing I might add in terms of extrinsic load - because you've hard-coded your file path, even if someone wants to take your same data and run this script on it, they have to do extra work to change the path for their computer. A relative file path would fix that!

Again, great job, and feel free to close the issue once you've read this!

raecv commented 2 years ago

Thanks!

Adding a relative file path is a good idea and I can definitely implement that.
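A minimal sketch of the relative-path version, assuming a hypothetical project layout in which the script is run from a project folder containing a `data/` subfolder with the results file. `check_input_file()` is a hypothetical helper, not part of the original script.

```r
# Hypothetical layout (an assumption, not the original project's):
#   volcano_project/
#     volcano.R
#     data/UnbVsBound_HP1_reps.xlsx
# Run volcano.R with volcano_project/ as the working directory.

# USER INPUT: path to casTLE results, relative to the project folder
path_to_file <- file.path("data", "UnbVsBound_HP1_reps.xlsx")

# Hypothetical helper: fail early with a readable message if the file is not
# where the relative path expects it, instead of a less obvious read error.
check_input_file <- function(path) {
  if (!file.exists(path)) {
    stop("Cannot find '", path, "' relative to ", getwd(), call. = FALSE)
  }
  invisible(path)
}
```

Calling `check_input_file(path_to_file)` right before `read_excel(path_to_file)` keeps the error close to the line a labmate would need to edit. The `here` package's `here::here("data", "UnbVsBound_HP1_reps.xlsx")` is an alternative that builds the path from the project root (for example, the folder with the .Rproj file) regardless of the current working directory.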

FlukeAndFeather / jese4sci-MOD

MOD101 #8