Closed JGVCruz closed 7 months ago
First to try is to run with serial threading, so use ORFikQC(..., BPPARAM = BiocParallel::serialParam())
It looks like you specified threading to a numeric, it needs to be a class instance of serial or multi threading (a bit complicated I know). For 8 threads normally use BiocParallel::multicoreParam(workers=8)) if I remember correctly
Hi! Thank you so much for answering! So, it worked for the Ribo-seq data, but the RNA is still throwing errors. I think is due to memory constraints (I "only" have 32 Gb available). I tried running it directly on command line, to see if the lack of visual interface would help a little, but it's still not working. Is there a way to reduce memory usage in this step?
First idea is to run in serial mode, multi threading explodes memory usage. If you already are using that, the problem is that you can't load all data in (memory restriction by hardware). If so run the function convert_bam_to_ofst on your rnaseq and then rerun ORFikQC(..., create.ofst = FALSE, BPPARAM = ...)
That will then skip the ofst step (because you manually did it) which is what crashes your run I think. Let me know how it works out.
I tried both, but it still does not work. Does any of the internal functions deal with multi-threading by themselves? I noticed that just before the spike in RAM usage this message appears:
fstcore package v0.9.18 (OpenMP detected, using 12 threads)
Any idea what could that be? Thanks again for the help :)
So this function worked (convert_bam_to_ofst)?
If so can you run:
outputLibs(df[1,], type ="ofst") #outout only first
outputLibs(df, type ="ofst") # all
fstcore is the package that does the final saving to disc of the bam file to ofst format conversion, don't think that is what crashes
Hi! Sorry for the delay.
Yes, the convert_bam_to_ofst works, and so does the other function you asked, be it with only one object or all of them.
Ok, I think it might crash on the plotting step, this is very memory intensive, to validate that, please run this:
# First close your web browser and any memory using app
# Load your exp and library(ORFik) here
debug(ORFikQC)
ORFikQC(df.rfp, create.ofst = FALSE, BPPARAM = BiocParallel::SerialParam())
So, to debug, use the rstudio console, commands are: n + enter (execute next command) s + enter (step into function) f + enter (complete remaining part of given function) Let me know exactly where it fails and the error you see :)
So, everything works until this function:
finals <- alignmentFeatureStatistics(df, BPPARAM = BPPARAM)
It runs until the third of the 6 libraries and eats up all memory, and R just crashes completely :/
Ok, so this is not even the most memory hungry part of the QC.
Is this paired end data or something ?
so if you do not have easy access to more memory, a hack you can do is to split up the function in subsets, like this:
# Only run internal part of QC from crash
finals_1 <- ORFik:::alignmentFeatureStatistics(df[1:5,], BPPARAM = BPPARAM)
finals_2 <- ORFik:::alignmentFeatureStatistics(df[6:10,], BPPARAM = BPPARAM)
finals_3 <- ORFik:::alignmentFeatureStatistics(df[11:15,], BPPARAM = BPPARAM)
finals <- data.table::rbindlist(list(finals_1, finals_2, finals_3))
# Do trimming detection
finals <- trim_detection(df, finals)
# Save file
stats_folder <- QCfolder(df)
write.csv(finals, file = file.path(stats_folder, "STATS.csv"))
# Get plots
QCplots(df, "mrna", stats_folder, plot.ext = ".pdf",
complex.correlation.plots = FALSE,
BPPARAM = BiocParallel::SerialParam()) # I think this might crash, but at least you will get the tables
You will now at least have the stats for the qualities and you can continue to p-shifting part.
What you will lose most likely is the track metaprofiles over genes etc (from QCplots): If you really need some of them, then as a validation run:
# Run for a representative subset
QCplots(df[c(1,6,11), "mrna", stats_folder, plot.ext = ".pdf",
complex.correlation.plots = FALSE,
BPPARAM = BiocParallel::SerialParam())
ORFik is optimized for speed on high end servers, sadly that makes it very memory hungry on laptops etc. Let me know how it works out :)
It worked! Thank you very much for your help and patience :)
Yes, they are pair-end libraries, but I didn't think 32Gb would not be able to survive them. I was obviously wrong haha.
Thank you again!
Good, if you have any more questions further down the road, reopen this issue or create a new one, depending on what makes sense for you.
Hi!
I'm interested in analyzing some RNA/Ribo-seq data using your package, but I'm encountering a problem with the ORFikQC function. This is the output when I run the function:
As the final QC output is not creqted, I can't check countables, for example.
Any idea what could be causing this? I'm working on Ubuntu 22.04, R 4.3.2 .
Thank you in advance for your attention :)