gagneurlab / FRASER

FRASER - Find RAre Splicing Events in RNA-seq
MIT License
36 stars 20 forks source link

FRASER Resources and Run Time #49

Open lemdcock opened 1 year ago

lemdcock commented 1 year ago

Dear C. Mertes,

At the moment I'm running FRASER on 61 untreated PBMC samples and 60 treated PBMC samples. However we are running into some issues with HPC infrastructure. The HPC system has limited the compute time for all analysis to 70h, based on the current output of the FRASER model fit step, I'm afraid these 70h will be insufficient time to finish:

Wed Feb  1 12:08:32 2023: Fit step for: 'psi5'.
Wed Feb  1 12:08:34 2023: Running fit with correction method: PCA
Wed Feb  1 12:08:36 2023: Computing PCA ...
Wed Feb  1 12:10:44 2023: Fitting rho ...
Thu Feb  2 06:22:23 2023: rho fit:
             Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
        0.0000000 0.0000001 0.0000001 0.0039224 0.0000001 0.6091761
Thu Feb  2 06:27:56 2023: Compute p values for: 'psi5'.
Fri Feb  3 00:16:14 2023: Adjust p values for: 'psi5'.
Fri Feb  3 00:17:03 2023: Compute Z scores for: 'psi5'.

Fri Feb  3 00:21:16 2023: Fit step for: 'psi3'.
Fri Feb  3 00:21:28 2023: Running fit with correction method: PCA
Fri Feb  3 00:21:30 2023: Computing PCA ...
Fri Feb  3 00:23:18 2023: Fitting rho ...

Furthermore I previously provided FRASER with 90GB RAM for the analysis, which was unfortunately insufficient and I restarted the analysis with 200GB of RAM. Is this normal behaviour for FRASER with these sample amounts? Or can we speed up the model fitting process? Or can I fit each parameter separately in different jobsubmissions?

The script I currently use is the following:

############ NVQ_Run093-NVQ_Run642 Untreated Samples FRASER Analysis: Fit Model and Call Outliers ############

#### Load Required Libraries ####
library(FRASER)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

#### Define Global Parameters ####
WORK_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/work/'
OUT_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/output/'
register(MulticoreParam(workers = 10))

#### Read in FRASER Object ####
fds <- readRDS(paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Filtered_Optim_psi5_psi3_and_theta.rds', sep = ''))

#### Detection of Aberrant Splicing Events ####
# Fit the Splicing Model
fds <- FRASER(fds, q=c(psi5=5, psi3=5, theta=5))

# Annotate Introns
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
orgDb <- org.Hs.eg.db

fds <- annotateRangesWithTxDb(fds, txdb=txdb, orgDb=orgDb)

# Call Splicing Outliers
res <- results(fds, zScoreCutoff=NA, padjCutoff = 0.999999, deltaPsiCutoff=0.0000001)

#### Save Objects ####
# Processed FRASER Object
FDS_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Processed.rds', sep = '')

saveRDS(fds, FDS_File)

# Results Object
RES_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results.rds', sep = '')

saveRDS(res, RES_File)

#### Generate Results File for Each Sample ####
Samples <- unique(res$sampleID)

for (s in Samples){
  Results_Samples <- res[res$sampleID == s]

  write.xlsx(Results_Samples, paste(OUT_DIR, 'Untreated/','NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results_deltaPsi_0_padj_1_', s, '.xlsx', sep = ''))
} 

Kind regards, Laurenz De Cock