Putnam-Lab / Lab_Management


RStudio server: processing time/RStudio on desktop has problems for whole-genome bisulfite sequencing data #19

Closed daniellembecker closed 3 years ago

daniellembecker commented 3 years ago

I have been having trouble processing large data files containing genome annotation information and gene ontology terms in desktop RStudio when running statistical analyses for whole-genome bisulfite sequencing data: it runs very slowly and RStudio sometimes shuts down without warning. @hputnam and I discussed possibly using an RStudio server in the future. Have others run into this issue, and would this be of interest to the lab group?

AHuffmyer commented 3 years ago

I would be interested in this - I don't have any insights to share, but would like to help/learn!

hgreich commented 3 years ago

I have used R on the PSU cluster (unsure how this differs from an RStudio server), and it helps with longer jobs.

SamGurr commented 3 years ago

Hi Danielle - are you working in .md or .R files? I also have trouble with RStudio slowing down, specifically when working in markdown. To remedy this, I have been running everything in .R scripts with the 'pseudo chunk' markers commented out so I can convert to markdown later. I haven't had trouble with large files this way yet (fingers crossed).
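A minimal sketch of what such a script might look like, assuming each 'pseudo chunk' is marked by a comment line. The file name, object names, and toy data are hypothetical, and the `# ---- name ----` marker style is just one possible convention (RStudio shows comment lines ending in four or more dashes as foldable sections), not necessarily the one used here.

```r
# Hypothetical script: analysis.R
# Each "pseudo chunk" is ordinary R code under a commented marker line,
# so the file runs as a plain script and can be split into chunks later.

# ---- load-data ----
# Toy stand-in for a large annotation table (placeholder values)
annot <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  go_term = c("GO:0008150", "GO:0003674", "GO:0008150"),
  stringsAsFactors = FALSE
)

# ---- count-terms ----
# Count how many genes carry each GO term
term_counts <- as.data.frame(table(go_term = annot$go_term),
                             stringsAsFactors = FALSE)

# ---- inspect ----
print(term_counts)
```

When converting to markdown, each marked section would become one code chunk, with the marker text reused as the chunk label.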

daniellembecker commented 3 years ago

Hi all, thanks for the input! @SamGurr that is super helpful as I am working in .md files! I will try what you suggested, thanks!

hputnam commented 3 years ago

This could also be run in R on the server and downloaded. I believe @echille did this for some of her work.

echille commented 3 years ago

Yes, I've run some large jobs in R on the server! To do that, I saved the data frames that I needed as Rdata and scp'd them to bluewaves where I ran just the chunks of the scripts that were too computationally intensive for my computer to handle. Then I saved the resulting data frames as Rdata again, scp'd them back onto my computer, loaded them back into R, and continued the script from where I left off on bluewaves. As an example, you can check out lines 243-282 here: https://github.com/echille/Mcapitata_Developmental_Gene_Expression_Timeseries/blob/master/2a-WGCNA/Developmental_WGCNA.Rmd
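The save, copy, run, and reload round trip can be sketched roughly as follows. This is a simplified illustration, not the code from the linked script: the file names, user name, host name, and the toy "heavy step" are placeholders, and the whole thing runs on one machine using a temporary directory so the `scp` steps appear only as comments.

```r
# --- Step 1 (local): save only the objects the heavy step needs ---
expr <- data.frame(gene = paste0("g", 1:5), count = c(10, 0, 35, 7, 120))
in_file <- file.path(tempdir(), "heavy_input.RData")   # hypothetical file name
save(expr, file = in_file)

# Copy to the server from a terminal (user, host, and paths are placeholders):
#   scp heavy_input.RData user@server.example.edu:~/project/

# --- Step 2 (server): load, run the expensive chunk, save the result ---
rm(expr)
load(in_file)   # restores `expr` under its original name
result <- transform(expr, log_count = log2(count + 1))  # stand-in for the heavy step
out_file <- file.path(tempdir(), "heavy_output.RData")
save(result, file = out_file)

# Copy the result back to the local machine:
#   scp user@server.example.edu:~/project/heavy_output.RData .

# --- Step 3 (local): load the result and continue the script ---
rm(result)
load(out_file)
head(result)
```

`load()` restores objects under the names they were saved with, so it can silently overwrite objects already in the workspace. For a single object, `saveRDS()` and `readRDS()` are an alternative that lets you choose the name on reload.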

To get bluewaves set up, I asked Kevin to install WGCNA for R because I wasn't sure if I was allowed to install it myself.