fbattel1 / qbb2024-answers

Week 2 Assignment #8

Open jonathanfischer97 opened 1 month ago

jonathanfischer97 commented 1 month ago

Grading Assessment for QB Week 2

Part 1: Bash Script for Bedtools Commands (2.5 pts)

Your script correctly implements the necessary bedtools commands for sorting, merging, and subtracting the various feature files. You used sortBed and mergeBed to process the gene, exon, and cCRE files, and correctly created the intron and "other" bed files using subtractBed. One small improvement would be to clean up temporary files (e.g., deleting intermediate files once they are no longer needed), but this was not necessary for full credit.
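One possible way to handle the intermediate files mentioned above is a scratch directory that is removed automatically when the script exits. This is only a minimal sketch: the file names and the `sort` step are illustrative stand-ins, not the commands from your script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory for intermediate files.
tmpdir=$(mktemp -d)

# Remove the scratch directory when the script exits, even after an error.
trap 'rm -rf "$tmpdir"' EXIT

# Stand-in for an intermediate step (e.g. a sorted BED file); the data are made up.
printf 'chr1\t50\t100\nchr1\t10\t20\n' | sort -k1,1 -k2,2n > "$tmpdir/sorted.bed"

# Only the final result is copied out of the scratch directory.
cp "$tmpdir/sorted.bed" final_sorted.bed
```

With this pattern there is no need for explicit `rm` calls after each step, since everything under the scratch directory is removed at exit.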

Score: 2.5/2.5

Part 2: SNP Enrichment Calculation and Analysis (7.5 pts)

2.1 Shell Script for Calculating Enrichments (4.5 pts)

Your script is well-written and correctly loops through the MAF and feature files using bedtools coverage to calculate the enrichment values. You successfully used awk and bc to compute SNP density and enrichment values. The results were appropriately written to the output file snp_counts.txt.
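For reference, the arithmetic behind a single enrichment value can be sketched as follows. This assumes enrichment is defined as the SNP density within a feature divided by the background SNP density. All counts and the MAF and feature labels are hypothetical placeholders; in a real run they would be summed from the bedtools coverage output.

```shell
# Hypothetical counts (not taken from the actual data).
feature_snps=120          # SNPs overlapping the feature
feature_bases=40000       # total bases covered by the feature
background_snps=1000      # SNPs across the whole region
background_bases=1000000  # total bases in the whole region

# Enrichment = (feature SNP density) / (background SNP density).
enrichment=$(awk -v fsnp="$feature_snps" -v fbp="$feature_bases" \
                 -v bsnp="$background_snps" -v bbp="$background_bases" \
                 'BEGIN { printf "%.4f", (fsnp / fbp) / (bsnp / bbp) }')

# One tab-separated output row: MAF, feature, enrichment (labels are placeholders).
printf '%s\t%s\t%s\n' "0.1" "exons" "$enrichment"
```

Doing the division in a single awk call avoids having to set a `scale` in bc, although either tool works.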

Score: 4.5/4.5

2.2 Text File with SNP Enrichments (0.5 pts)

The file snp_counts.txt contains all the necessary combinations of MAF and feature values, with calculated enrichments in a well-organized, tab-separated format. Everything was done correctly.

Score: 0.5/0.5

2.3 Plot from Step 2.4 (1.5 pts)

Your plot is clear, with appropriate axis labels, legends, and distinct lines for each genomic feature. However, your log2 transformation of the y-axis was slightly incorrect. It looks like you applied it in the plot itself with scale_y_continuous(trans = "log2"). This only rescales the y-axis: the data, and the tick labels, stay in raw Enrichment units rather than log2 values. It can also cause issues when plotting if there are zero or near-zero values in the data (since log2 of zero is undefined).

To correctly transform and display the log2 enrichment values, you should apply the log2 transformation directly to the data before plotting. Here's what you should have done:

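library(dplyr)
library(ggplot2)

# Transform the data itself so the plotted values are log2 enrichments
df2 <- df2 %>%
  dplyr::mutate(Log2_enrichment = log2(Enrichment))

# Map the transformed column, not the raw Enrichment column, to the y-axis
ggplot(data = df2, aes(x = MAF, y = Log2_enrichment, color = Feature, group = Feature)) +
  geom_point() +
  geom_line() +
  ggtitle("MAF Log2 Enrichment by Feature")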

Score: 1.0/1.5

2.4 Answers to Questions in README.md (1.0 pts)

You correctly identify key points in your answers, such as exons being under purifying selection. However, your explanations lack detail. For instance, in Question 1, mentioning the log2 enrichment values and their relationship to purifying selection would have strengthened your response. In Question 2, you touched on the idea of natural selection affecting allele frequencies but did not fully explain it. More detail overall would better demonstrate your understanding.

Score: 0.5/1.0

Total Score: 9.0/10

jonathanfischer97 commented 1 month ago

Week 2 Regrade:

Better answers in your README, but you forgot to actually plot the Log2 Enrichment after you made the transformation! You were still plotting "Enrichment" on your y-axis. Stupid mistake, I assume, so I'll split the difference and give you an extra 0.25 on the plot.

Total Score: 9.75/10

Good job overall!