Roleren / ORFik

MIT License
32 stars 9 forks source link

ORFikQC error #155

Closed JGVCruz closed 7 months ago

JGVCruz commented 7 months ago

Hi!

I'm interested in analyzing some RNA/Ribo-seq data using your package, but I'm encountering a problem with the ORFikQC function. This is the output when I run the function:

> ORFikQC(df.rfp,BPPARAM = 8)
Started ORFik QC report for experiment: Babaian_COAD_Ribo-seq
Saving, ofst files to: /PATH/Bio_data/processed_data/Ribo-seq/Babaian_COAD/aligned/ofst/
Outputting libraries from: Babaian_COAD_Ribo-seq
15 :  RFP_WT3_r2

1 :  RFP_HET2_r1
2 :  RFP_HET2_r2

11 :  RFP_WT1_r2
12 :  RFP_WT2_r1

3 :  RFP_HET2_r3
4 :  RFP_KO1_r1

5 :  RFP_KO1_r2
6 :  RFP_KO1_r3

7 :  RFP_KO3_r1
8 :  RFP_KO3_r2

9 :  RFP_KO3_r3
10 :  RFP_WT1_r1

13 :  RFP_WT2_r2
14 :  RFP_WT3_r1

--------------------------
Converting libraries to new format: ofst
RFP_HET2_r1
RFP_HET2_r2
RFP_HET2_r3
RFP_KO1_r1
RFP_KO1_r2
RFP_KO1_r3
RFP_KO3_r1
RFP_KO3_r2
RFP_KO3_r3
RFP_WT1_r1
RFP_WT1_r2
RFP_WT2_r1
RFP_WT2_r2
RFP_WT3_r1
RFP_WT3_r2
--------------------------
- Creating read length tables:
Using previously stored readlengths in QC_STATS folder!
Outputting libraries from: Babaian_COAD_Ribo-seq
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘bplapply’ for signature ‘"integer", "numeric"’
In addition: Warning message:
In dir.create(stats_folder, recursive = TRUE) :
  '/PATH/Bio_data/processed_data/Ribo-seq/Babaian_COAD/aligned/QC_STATS' already exists

As the final QC output is not creqted, I can't check countables, for example.

Any idea what could be causing this? I'm working on Ubuntu 22.04, R 4.3.2 .

Thank you in advance for your attention :)

Roleren commented 7 months ago

First to try is to run with serial threading, so use ORFikQC(..., BPPARAM = BiocParallel::serialParam())

It looks like you specified threading to a numeric, it needs to be a class instance of serial or multi threading (a bit complicated I know). For 8 threads normally use BiocParallel::multicoreParam(workers=8)) if I remember correctly

JGVCruz commented 7 months ago

Hi! Thank you so much for answering! So, it worked for the Ribo-seq data, but the RNA is still throwing errors. I think is due to memory constraints (I "only" have 32 Gb available). I tried running it directly on command line, to see if the lack of visual interface would help a little, but it's still not working. Is there a way to reduce memory usage in this step?

Roleren commented 7 months ago

First idea is to run in serial mode, multi threading explodes memory usage. If you already are using that, the problem is that you can't load all data in (memory restriction by hardware). If so run the function convert_bam_to_ofst on your rnaseq and then rerun ORFikQC(..., create.ofst = FALSE, BPPARAM = ...)

That will then skip the ofst step (because you manually did it) which is what crashes your run I think. Let me know how it works out.

JGVCruz commented 7 months ago

I tried both, but it still does not work. Does any of the internal functions deal with multi-threading by themselves? I noticed that just before the spike in RAM usage this message appears:

fstcore package v0.9.18 (OpenMP detected, using 12 threads)

Any idea what could that be? Thanks again for the help :)

Roleren commented 7 months ago

So this function worked (convert_bam_to_ofst)?

If so can you run:

outputLibs(df[1,], type ="ofst") #outout only first
outputLibs(df, type ="ofst") # all
Roleren commented 7 months ago

fstcore is the package that does the final saving to disc of the bam file to ofst format conversion, don't think that is what crashes

JGVCruz commented 7 months ago

Hi! Sorry for the delay.

Yes, the convert_bam_to_ofst works, and so does the other function you asked, be it with only one object or all of them.

Roleren commented 7 months ago

Ok, I think it might crash on the plotting step, this is very memory intensive, to validate that, please run this:

# First close your web browser and any memory using app
# Load your exp and library(ORFik) here
debug(ORFikQC)
ORFikQC(df.rfp, create.ofst = FALSE, BPPARAM = BiocParallel::SerialParam())

So, to debug, use the rstudio console, commands are: n + enter (execute next command) s + enter (step into function) f + enter (complete remaining part of given function) Let me know exactly where it fails and the error you see :)

JGVCruz commented 7 months ago

So, everything works until this function:

finals <- alignmentFeatureStatistics(df, BPPARAM = BPPARAM)

It runs until the third of the 6 libraries and eats up all memory, and R just crashes completely :/

Roleren commented 7 months ago

Ok, so this is not even the most memory hungry part of the QC.

Is this paired end data or something ?

so if you do not have easy access to more memory, a hack you can do is to split up the function in subsets, like this:

# Only run internal part of QC from crash
finals_1 <- ORFik:::alignmentFeatureStatistics(df[1:5,], BPPARAM = BPPARAM)
finals_2 <- ORFik:::alignmentFeatureStatistics(df[6:10,], BPPARAM = BPPARAM)
finals_3 <- ORFik:::alignmentFeatureStatistics(df[11:15,], BPPARAM = BPPARAM)
finals <- data.table::rbindlist(list(finals_1, finals_2, finals_3))
# Do trimming detection
finals <- trim_detection(df, finals)
# Save file
stats_folder <- QCfolder(df)
write.csv(finals, file = file.path(stats_folder, "STATS.csv"))
# Get plots
QCplots(df, "mrna", stats_folder, plot.ext = ".pdf",
        complex.correlation.plots = FALSE,
        BPPARAM = BiocParallel::SerialParam()) # I think this might crash, but at least you will get the tables 

You will now at least have the stats for the qualities and you can continue to p-shifting part.

What you will lose most likely is the track metaprofiles over genes etc (from QCplots): If you really need some of them, then as a validation run:

# Run for a representative subset
QCplots(df[c(1,6,11), "mrna", stats_folder, plot.ext = ".pdf",
        complex.correlation.plots = FALSE,
        BPPARAM = BiocParallel::SerialParam())

ORFik is optimized for speed on high end servers, sadly that makes it very memory hungry on laptops etc. Let me know how it works out :)

JGVCruz commented 7 months ago

It worked! Thank you very much for your help and patience :)

Yes, they are pair-end libraries, but I didn't think 32Gb would not be able to survive them. I was obviously wrong haha.

Thank you again!

Roleren commented 7 months ago

Good, if you have any more questions further down the road, reopen this issue or create a new one, depending on what makes sense for you.