alperyilmaz / dav-exercises

Exercise questions submitted by Data Analysis and Visualization with R course students at YTU
GNU General Public License v3.0

The Most Long Gene #41

Open efeaybuke opened 6 years ago

efeaybuke commented 6 years ago

Question

There are many chromosomes, and each chromosome has a negative and a positive strand. These strands contain genes. How long are these genes, and which genes are longer than the others? Compute the gene sizes and plot the 10 longest genes on each strand.

Here is the expected result: (image: aybus)

Here is the human gene data set:

download.file("https://s3-us-west-2.amazonaws.com/veri-analizi/hg19v2_clean.csv","hg19v2_clean.csv")
human_genes <- readr::read_csv("hg19v2_clean.csv")

library(readr)
library(dplyr)
library(ggplot2)

download.file("https://s3-us-west-2.amazonaws.com/veri-analizi/hg19v2_clean.csv","hg19v2_clean.csv")
gene_data <- read_csv("hg19v2_clean.csv") 

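# Gene size is the distance between the end and start coordinates.
long_gene <- gene_data %>% 
  mutate(size = end - start) %>% 
  arrange(desc(size))

# Ten longest genes on the negative strand; name the ranking column explicitly.
negative_strand <- long_gene %>% 
  filter(strand == "-1") %>% 
  top_n(10, size)

# Ten longest genes on the positive strand.
positive_strand <- long_gene %>% 
  filter(strand == "1") %>% 
  top_n(10, size)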

strands <- bind_rows(negative_strand, positive_strand)

# One panel per strand, with bars ordered by gene size.
strands %>% 
  mutate(gene_name = reorder(gene_name, size)) %>%
  ggplot(aes(x = gene_name, y = size, fill = as.factor(strand))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~strand, scales = "free") +
  coord_flip()

alperyilmaz commented 6 years ago

This can actually be done in a single pipe. top_n and the rank functions operate within the groups defined by a preceding group_by call.

human_genes %>% 
  mutate(length=end-start) %>% 
  group_by(strand) %>% 
  top_n(10,length) %>% 
  ggplot(aes(x=reorder(gene_name, length), y=length, fill=as.factor(strand))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~strand , scales = "free")+
  coord_flip()
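
To see the per-group behaviour in isolation, here is a minimal sketch on a small made-up data frame. The `toy_genes` table and its values are hypothetical and only mirror the column names used above. It uses `slice_max`, which newer dplyr versions (1.0.0 and later) offer as the replacement for the superseded `top_n`; this is a swapped-in alternative, not the approach used in the thread.

```r
library(dplyr)

# Hypothetical toy data: three genes per strand, same column names as above.
toy_genes <- data.frame(
  gene_name = c("A", "B", "C", "D", "E", "F"),
  strand    = c(1, 1, 1, -1, -1, -1),
  start     = c(100, 200, 300, 100, 200, 300),
  end       = c(150, 900, 400, 700, 250, 1300)
)

# Keep the 2 longest genes within each strand, not the 2 longest overall.
longest <- toy_genes %>%
  mutate(length = end - start) %>%
  group_by(strand) %>%
  slice_max(length, n = 2) %>%
  ungroup()

print(longest)
```

Because the selection runs after `group_by(strand)`, each strand contributes its own top rows, which is why the separate filter steps per strand are not needed.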