AlexsLemonade / compendium-processing

A series of analyses related to refine.bio species compendia
BSD 3-Clause "New" or "Revised" License

cansav09/missingness #6

Closed cansavvy closed 6 years ago

cansavvy commented 6 years ago

The final table with all the info is results/human_master_platform.csv. If you want to see some of the data in graph form, that is also in the results section. Basically, you might want to check that the data you are looking for are there and in the form you need. If that is not the case, let me know.

The RNA-seq numbers are based on genes that are detectable in more than 70% of cases (I excluded genes that had zeroes in 30% or more of cases).
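As a rough illustration of that kind of filter (a sketch, not the code from this PR), the snippet below drops genes with zeroes in 30% or more of samples. The matrix `expr.mat`, its gene and sample names, and the genes-by-samples layout are all assumptions made up for the example.

```r
# Hypothetical toy matrix: rows are genes, columns are samples
expr.mat <- matrix(
  c(5, 0, 3, 2, 7, 1, 4, 6, 2, 9,   # geneA: 1 zero out of 10 samples
    0, 0, 0, 1, 2, 0, 3, 4, 5, 6,   # geneB: 4 zeroes out of 10 samples
    0, 0, 0, 8, 1, 2, 3, 4, 5, 6),  # geneC: 3 zeroes out of 10 samples
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("sample", 1:10))
)

# Fraction of samples in which each gene has a zero
zero.frac <- rowSums(expr.mat == 0) / ncol(expr.mat)

# Keep only genes with zeroes in fewer than 30% of samples
filtered.mat <- expr.mat[zero.frac < 0.3, , drop = FALSE]
```

Here `drop = FALSE` keeps the result a matrix even if only one gene passes the filter.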

I've tried to make it so the whole thing doesn't need to be run if you just want to change some things about the results table, hence why things are saved as Rdata at the end of the scripts.
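For readers unfamiliar with this pattern, a minimal sketch of saving an intermediate object at the end of one script and restoring it in another might look like the following. The object `platform.summary` and the file name are hypothetical, and a temporary directory is used only so the example runs anywhere.

```r
# Hypothetical stand-in for an expensive intermediate result
platform.summary <- data.frame(platform = c("A", "B"),
                               n.genes = c(100, 200))

# Temporary location, assumed for this sketch only
rdata.file <- file.path(tempdir(), "platform_summary.RData")

# End of the upstream script: save the object to disk
save(platform.summary, file = rdata.file)

# Start of the downstream script: restore it without rerunning anything
rm(platform.summary)
load(rdata.file)
```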

The most janky part of the script is at the very end of combine_data.R, where I match the Illumina chips to their respective annotation versions. There's probably a more elegant way to do this; I just went with a way that works.

cansavvy commented 6 years ago

I think I've addressed all your comments in one way or another. So, whenever you have a chance, if you wanna take another look at this and see what you think, that'd be great.

jaclyn-taroni commented 6 years ago

I am about to go back through this line by line @cansav09. There are a few things that jump out at me that I think should be addressed:

This is also probably a good spot to start working on your ggplot2 skills, but I think that's beyond the scope of this PR.

cansavvy commented 6 years ago

Do you only/mainly want file.path to be used when we are reading something in? Or do we also want to be using it when we are writing something out?

jaclyn-taroni commented 6 years ago

> Do you only/mainly want file.path to be used when we are reading something in? Or do we also want to be using it when we are writing something out?

Both. The idea is to make it portable across all systems, though the way you have it will work for the vast majority of cases. You also might want to define directories at the top of scripts. For instance,

library(xxxxx)

# directory set up
results.dir <- "results"
plots.dir <- "plots"

Then you'd be able to do things like jpeg(file.path(plots.dir, "new_plot.jpg")) and you'd only have to change the directory once at the top of the script instead of every time you saved something to plots.
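To make the "both" answer concrete, here is a small sketch that uses file.path for writing a file out and for reading it back in. The data frame, the file name, and the use of a temporary directory are assumptions for illustration; in a real script results.dir would simply be "results".

```r
# directory set up (rooted in a temporary directory so the sketch is
# safe to run anywhere)
results.dir <- file.path(tempdir(), "results")
dir.create(results.dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical table and file name, for illustration only
example.df <- data.frame(platform = c("A", "B"), n.samples = c(10, 20))
example.file <- file.path(results.dir, "example_table.csv")

# writing something out
write.csv(example.df, example.file, row.names = FALSE)

# reading something in
reread.df <- read.csv(example.file)
```

Because the path is built once with file.path, the same variable serves both the write and the read, and the separator is chosen by R for the current platform.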

An even better approach (though not necessary for this, just want to point out good practice) is to have the plots and results directories supplied as arguments, so that they're not hardcoded at all.
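One possible way to do that with base R alone is sketched below, assuming positional arguments and falling back to the defaults above when none are given. The helper name `resolve.dirs` and the argument order are hypothetical choices for this example, not something from the PR.

```r
# Map positional arguments to directories, with defaults as a fallback.
# `args` is expected to be a character vector such as the one returned by
# commandArgs(trailingOnly = TRUE).
resolve.dirs <- function(args) {
  list(
    results.dir = if (length(args) >= 1) args[1] else "results",
    plots.dir   = if (length(args) >= 2) args[2] else "plots"
  )
}

# In a script run with Rscript, this picks up anything after the script name
dirs <- resolve.dirs(commandArgs(trailingOnly = TRUE))
results.dir <- dirs$results.dir
plots.dir <- dirs$plots.dir
```

A script using this could then be run as, say, Rscript some_script.R my_results my_plots, where the script name is only a placeholder. Packages for named options exist as well, but plain positional arguments keep the sketch dependency-free.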

cansavvy commented 6 years ago

Okay. I think I addressed those other changes and everything is working.

cansavvy commented 6 years ago

Alrighty, let me know if you are okay with me merging this PR, and then later I will make a different PR to make it more generally applicable to all species. I can make an issue for that.

cansavvy commented 6 years ago

Argh. I don't know why my .gitignore files don't work half the time. I added those to it before.

jaclyn-taroni commented 6 years ago

If they're already committed, you need to delete them and then commit again.