NUStatBioinfo / DegNorm

Normalizing RNA degradation in RNA-seq data
https://nustatbioinfo.github.io/DegNorm/

Error re-running a failed run #41

Open iS4i4S opened 4 years ago

iS4i4S commented 4 years ago

Hi, I am trying to pick up a run that failed after calculating the coverage for all my samples, but when I re-run the pipeline it creates a new output directory instead of detecting the previous one. Maybe I am not entering the command correctly.

First command (where it failed):

degnorm --bam-dir /media/isaias.hernandez/seagate_ICM/FFPE_PCNSL_98n/STAR_bams -g /media/isaias.hernandez/seagate_ICM/FFPE_PCNSL_98n/FF_vs_FFPE/CNV/CASPER/degnorm/Homo_sapiens.GRCh38.97.gtf -o /media/isaias.hernandez/seagate_ICM/FFPE_PCNSL_98n/degnorm/degnorm_05122020_194420 -p 10

New command (to pick up from the previous run):

degnorm --bam-dir /media/isaias.hernandez/seagate_ICM/FFPE_PCNSL_98n/STAR_bams -g /media/isaias.hernandez/seagate_ICM/FFPE_PCNSL_98n/FF_vs_FFPE/CNV/CASPER/degnorm/Homo_sapiens.GRCh38.97.gtf -o /media/isaias.hernandez/seagate_ICM/FFPE_PCNSL_98n/degnorm/degnorm_05122020_194420

Thanks in advance

ffineis commented 4 years ago

Hi Laloverdin,

Issue 1: You're right, re-using the same -o output directory should pick up the coverage files created in the prior DegNorm run, but it's not. Is it just creating a new degnorm_MMDDYY_HMS directory within the output directory you've specified? If so, this is a bug in utils.py. I can try to look into a fix this weekend.
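(For context on what "picking up" means here, the intended resume check presumably amounts to something like the sketch below. The function name and the coverage-file glob are illustrative guesses, not the actual utils.py code.)

```python
import glob
import os
from datetime import datetime

def resolve_output_dir(output_dir):
    """Sketch of the intended resume logic -- NOT the actual utils.py code.
    Reuse output_dir if it already holds coverage files from a prior run;
    otherwise create a fresh timestamped degnorm_MMDDYYYY_HHMMSS directory."""
    if os.path.isdir(output_dir) and glob.glob(os.path.join(output_dir, '*coverage*')):
        return output_dir  # resume: prior coverage files found
    stamp = datetime.now().strftime('%m%d%Y_%H%M%S')
    new_dir = os.path.join(output_dir, 'degnorm_' + stamp)
    os.makedirs(new_dir)
    return new_dir  # fresh run; the reported bug always takes this branch
```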

Issue 2: Regarding the memory error: DegNorm was developed on studies using 6-15 RNA-seq samples. 60 samples is a big job (> 60 GB just in raw reads, I'm guessing). I would recommend trying the recently developed R package from Dr. Wang here: https://github.com/jipingw/DegNorm. It's supposed to be faster and more memory-efficient. Failing that, consider a bigger node, or use fewer samples.

Thanks, Frank

On Mon, May 18, 2020 at 4:20 AM laloverdin notifications@github.com wrote:

The error of the first run is:

File "/home/isaias.hernandez/anaconda2/envs/degnorm/lib/python3.6/multiprocessing/pool.py", line 119, in worker result = (True, func(*args, *kwds)) File "/home/isaias.hernandez/anaconda2/envs/degnorm/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 561, in call return self.func(args, *kwargs) File "/home/isaias.hernandez/anaconda2/envs/degnorm/lib/python3.6/site-packages/joblib/parallel.py", line 261, in call* for func, args, kwargs in self.items] File "/home/isaias.hernandez/anaconda2/envs/degnorm/lib/python3.6/site-packages/joblib/parallel.py", line 261, in for func, args, kwargs in self.items] File "/home/isaias.hernandez/anaconda2/envs/degnorm/lib/python3.6/site-packages/DegNorm-0.1.4-py3.6.egg/degnorm/reads_coverage_merge.py", line 331, in merge_chrom_coverage cov_mat = cov_mat.asfptype().todense() File "/home/isaias.hernandez/anaconda2/envs/degnorm/lib/python3.6/site-packages/scipy/sparse/base.py", line 721, in todense return np.asmatrix(self.toarray(order=order, out=out)) File "/home/isaias.hernandez/anaconda2/envs/degnorm/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 964, in toarray return self.tocoo(copy=False).toarray(order=order, out=out) File "/home/isaias.hernandez/anaconda2/envs/degnorm/lib/python3.6/site-packages/scipy/sparse/coo.py", line 252, in toarray

                                                                                                                                                                                                           return np.zeros(self.shape, dtype=self.dtype, order=order)

MemoryError

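(A note on why this traceback ends in MemoryError: the failure is in todense(), which converts a per-chromosome sparse coverage matrix into a dense samples-by-positions array. A back-of-the-envelope estimate, assuming 60 samples, float64 entries, and human chromosome 1:)

```python
# Rough memory estimate for densifying one chromosome's coverage matrix
# (rows = samples, columns = base positions), per the todense() call above.
n_samples = 60
chrom_len = 248_956_422      # GRCh38 chromosome 1
bytes_per_entry = 8          # float64, assuming asfptype() upcasts to double

dense_bytes = n_samples * chrom_len * bytes_per_entry
print(f"~{dense_bytes / 1024**3:.0f} GiB")   # ~111 GiB for chr1 alone
```

A single allocation of that size dwarfs most nodes' RAM, which is consistent with the suggestion above to use a bigger node or fewer samples.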

iS4i4S commented 4 years ago

@ffineis Thanks for the answer, I've already solved Issue 1. Regarding Issue 2, I tried the R package and indeed it is faster, but it throws an error when loading some samples: sometimes a sample loads and sometimes it does not, causing the program to be killed prematurely (in read_coverage_batch).
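(If the failures are sample-specific, it may help to sanity-check each BAM before handing the list to read_coverage_batch. A minimal sketch using pysam; the bam_dir path is illustrative:)

```python
import glob
import os
import pysam

# Illustrative path -- point this at your own BAM directory.
bam_dir = "/path/to/STAR_bams"

for bam in sorted(glob.glob(os.path.join(bam_dir, "*.bam"))):
    try:
        with pysam.AlignmentFile(bam, "rb") as af:
            af.check_index()   # raises if the .bai index is missing or broken
            next(af.head(1))   # try reading one alignment from the file
        print(f"OK      {os.path.basename(bam)}")
    except Exception as e:
        print(f"FAILED  {os.path.basename(bam)}: {e}")
```

Any file flagged FAILED (truncated BAM, missing index, empty file) would be a candidate for the intermittent crashes.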

Thanks in advance