Closed (cansavvy closed this 6 years ago)
I think I've addressed all your comments in one way or another. So, whenever you have a chance, if you wanna take another look at this and see what you think, that'd be great.
I am about to go back through this line by line @cansav09. There are a few things that jump out at me that I think should be addressed:
Use file.path wherever you are specifying a path that includes a folder. It looks like you made most of the changes where I specifically pointed this out.
[...] plots/ [...] 1b-human_rnaseq_gene_coverage.R [...] that I don't think have been answered.
This is also probably a good spot to start working on your ggplot2 skills, but I think that's beyond the scope of this PR.
Do you only/mainly want file.path to be used when we are reading something in? Or do we also want to be using it when we are writing something out?
Both. The idea is to make it portable across all systems, though the way you have it will work for the vast majority of cases. You also might want to define directories at the top of scripts. For instance,
library(xxxxx)
# directory set up
results.dir <- "results"
plots.dir <- "plots"
Then you'd be able to do things like jpeg(file.path(plots.dir, "new_plot.jpg")) and you'd only have to change this once at the top of the script instead of every time you saved something to plots.
An even better approach (though not necessary for this, just want to point out good practice) is to have the plots directory and results directories supplied as arguments, that way it's not hardcoded at all.
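For what it's worth, here is a minimal sketch of what supplying the directories as arguments could look like, using base R's commandArgs. The script name make_plots.R and the argument order are assumptions made for illustration, not something taken from this PR; the fallback values mirror the results.dir and plots.dir defaults from the snippet above.

```r
# Sketch only: read output directories from the command line instead of
# hardcoding them. Assumed (hypothetical) invocation:
#   Rscript make_plots.R <results.dir> <plots.dir>
args <- commandArgs(trailingOnly = TRUE)

# Fall back to the hardcoded defaults when an argument is not supplied
results.dir <- if (length(args) >= 1) args[1] else "results"
plots.dir <- if (length(args) >= 2) args[2] else "plots"

# Build output paths from the supplied directories
plot.file <- file.path(plots.dir, "new_plot.jpg")
table.file <- file.path(results.dir, "new_table.csv")
```

With this, running the script with no arguments behaves like the hardcoded version, and passing two directory names redirects all output without editing the script.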
Okay. I think I addressed those other changes and everything is working.
Alrighty, let me know if you are okay with me merging this PR, and then later I will make a different PR to make it more generally applicable to all species. I can make an issue for that.
Arg. I don't know why half the time my .gitignore files don't work. I added those to it before.
If they're already committed, .gitignore won't apply to them. You need to delete them (or untrack them with git rm --cached) and then commit again.
The final table with all the info is results/human_master_platform.csv. If you want to see some of the data in graph form, that is also in the results folder. Basically, you might want to check that the data you are looking for are here and in the form you are looking for. If that is not the case, let me know.
The RNA-seq numbers are based on genes that are detectable in more than 70% of cases (genes that had zeroes in 30% or more of cases were excluded).
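To make the cutoff concrete, here is a rough sketch of that kind of filter on a toy matrix. The counts, gene names, and sample names are made up for illustration, and this is not necessarily how the scripts in this PR implement it.

```r
# Toy count matrix (made-up values): rows are genes, columns are samples
counts <- matrix(
  c(5, 0, 3, 8, 2, 1, 4, 6, 9, 7,   # gene_a: 1 zero in 10 samples (10%), kept
    0, 0, 0, 0, 5, 1, 3, 4, 6, 2,   # gene_b: 4 zeroes (40%), excluded
    0, 0, 1, 2, 3, 4, 5, 6, 7, 8),  # gene_c: 2 zeroes (20%), kept
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), paste0("sample_", 1:10))
)

# Fraction of samples in which each gene has a zero count
zero.fraction <- rowMeans(counts == 0)

# Keep genes with zeroes in fewer than 30% of samples
detectable <- counts[zero.fraction < 0.3, , drop = FALSE]
rownames(detectable)
```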
I've tried to make it so the whole thing doesn't need to be rerun if you just want to change some things about the results table, which is why things are saved as Rdata at the end of the scripts.
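As a small illustration of that save-then-reload pattern (the object name, file name, and values here are hypothetical, and a temporary directory stands in for the real results folder):

```r
# Stand-in for an object built by an earlier, slower script (made-up values)
coverage.table <- data.frame(platform = c("platform_1", "platform_2"),
                             n.genes = c(100, 200))

# Save at the end of the upstream script; tempdir() is used only for this sketch
results.dir <- tempdir()
rdata.file <- file.path(results.dir, "coverage_table.RData")
save(coverage.table, file = rdata.file)

# Later, in the script that builds the results table, reload the saved
# object instead of rerunning the upstream script
rm(coverage.table)
load(rdata.file)
head(coverage.table)
```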
The jankiest part of the script is at the very end of combine_data.R, where I match the illumina chips to their respective annotation versions. There's probably a more elegant way to do this, but I went with a way that works.
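One possibly tidier option for that matching step is a small lookup table plus match(). This is only a sketch with hypothetical chip names, versions, and column names, since it isn't based on what combine_data.R actually contains.

```r
# Hypothetical lookup table: one row per chip, with its annotation version
chip.annotation <- data.frame(
  chip = c("chip_A", "chip_B", "chip_C"),
  annotation.version = c("v1", "v2", "v3"),
  stringsAsFactors = FALSE
)

# Hypothetical platform table that needs the annotation version attached
platforms <- data.frame(
  platform = c("platform_1", "platform_2", "platform_3"),
  chip = c("chip_B", "chip_A", "chip_D"),
  stringsAsFactors = FALSE
)

# match() gives the row of chip.annotation for each chip (NA if not listed)
idx <- match(platforms$chip, chip.annotation$chip)
platforms$annotation.version <- chip.annotation$annotation.version[idx]
platforms
```

Chips that are missing from the lookup table come back as NA, which makes unmatched platforms easy to spot.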