mhagemann86 commented 5 years ago

Hi,

I have a small question.

I am running degnorm on 60 samples. Unfortunately, my server crashed after 56 of them, having already run for many days. It was still computing the chromosome coverage, overlap coverage, and read counts for each individual sample. Is there any way to get degnorm to finish the remaining 4 samples and then create the comparison or degradation matrix for all 60 samples?

Best, Michael

ffineis commented 5 years ago

Not very easily, as the coverage matrices and read counts within your first run (I'm guessing) still haven't been merged together, because coverage/read counting has not completed for all 60. Here are some options:

Option 1 (worst, by far):

1. You run degnorm for the remaining 4 samples.
2. For the original run: post the output of $ tree so I can get a sense of what exactly has completed and what has not. You should have 56 directories, one for each sample, and within those a subdirectory for each chromosome containing an .npz file for coverage and a .csv file for read counts. I can write a script that will join the coverage data together as well as the read counts across the 56 completed samples.
3. Joining the two separate runs: I'd need to write a second script to pd.merge the read counts across the runs, pd.merge the gene/exon data across the runs, and then for each gene hstack the two coverage matrices (there being one from each of the separate runs) and save those joined coverage matrices. In total, this simulates the effect of having run degnorm successfully on all 60 samples.
4. Getting DI scores: run degnorm with a warm-start directory from the results from 3.
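As a rough illustration of the joining step, here is a minimal sketch on toy in-memory data. It is not the actual script: the column name "gene", the sample names, and the array layout (positions along rows, samples along columns, which is what hstack implies) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical read-count tables: one row per gene, one column per sample.
reads_run_a = pd.DataFrame({"gene": ["G1", "G2", "G3"],
                            "s1": [10, 0, 7],
                            "s2": [12, 3, 5]})
reads_run_b = pd.DataFrame({"gene": ["G1", "G2", "G3"],
                            "s3": [8, 1, 9]})

# Hypothetical coverage: gene -> array of shape (gene_length, n_samples).
cov_run_a = {"G1": np.ones((4, 2)), "G2": np.zeros((6, 2)), "G3": np.ones((5, 2))}
cov_run_b = {"G1": np.ones((4, 1)), "G2": np.zeros((6, 1)), "G3": np.ones((5, 1))}


def join_read_counts(a, b):
    """Merge two read-count tables on the shared gene column."""
    # validate="one_to_one" raises if a gene appears twice in either table.
    return pd.merge(a, b, on="gene", how="inner", validate="one_to_one")


def join_coverage(a, b):
    """hstack per-gene coverage matrices from two runs, gene by gene."""
    joined = {}
    for gene in sorted(a.keys() & b.keys()):
        if a[gene].shape[0] != b[gene].shape[0]:
            raise ValueError(f"gene length mismatch for {gene}")
        joined[gene] = np.hstack([a[gene], b[gene]])
    return joined


reads_all = join_read_counts(reads_run_a, reads_run_b)
cov_all = join_coverage(cov_run_a, cov_run_b)
```

The inner merge and the key intersection both keep only genes present in both runs, so a gene missing from one run is dropped rather than padded.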

Option 2 (best): If you're working on an HPC, I highly recommend using degnorm_mpi: computing time drops to roughly 1 / number-of-servers of the single-server time.

Option 3 (not bad): I could change the code so that any given degnorm pipeline run, instead of immediately computing coverage and read counts, first searches the output directory for the required coverage and read count files. If they exist, it skips coverage/read counting and moves on. You would specify the failed run's directory as the output directory, and degnorm would detect that the coverage/read count files have already been created.
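The skip-if-present idea in Option 3 could look roughly like the sketch below. This is not degnorm's code: the function names are made up, and the only layout assumed is the one described above (one directory per sample, one subdirectory per chromosome, holding an .npz coverage file and a .csv read-count file).

```python
from pathlib import Path


def chromosome_is_done(chrom_dir):
    """True if the directory holds at least one .npz and one .csv file."""
    chrom_dir = Path(chrom_dir)
    if not chrom_dir.is_dir():
        return False
    has_coverage = any(chrom_dir.glob("*.npz"))
    has_reads = any(chrom_dir.glob("*.csv"))
    return has_coverage and has_reads


def samples_to_process(output_dir, samples, chromosomes):
    """Return the samples that still need coverage/read counting."""
    todo = []
    for sample in samples:
        sample_dir = Path(output_dir) / sample
        # A sample counts as finished only if every chromosome is finished.
        if not all(chromosome_is_done(sample_dir / c) for c in chromosomes):
            todo.append(sample)
    return todo
```

A sample that crashed partway through (some chromosomes written, others not) is treated as unfinished and recomputed in full, which is the conservative choice.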

The only caveat to Option 3 is that there's no way to tell ahead of time whether the coverage/read count files found are valid. For example, a user could copy in coverage/read count files from a wholly different degnorm run that are not compatible with the other samples (e.g. coverage matrices of the wrong shape), and DegNorm would then behave unpredictably.
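A user who wants to guard against that caveat could run a rough compatibility check on their own before reusing files. The sketch below is hypothetical and not part of degnorm: it assumes coverage has already been loaded into a mapping of sample to gene to array, with gene length along the first axis, and it only compares lengths, so passing it does not prove the files are valid.

```python
import numpy as np


def incompatible_genes(coverage_by_sample):
    """Find genes whose coverage length differs, or is missing, across samples.

    coverage_by_sample: {sample_name: {gene_name: np.ndarray}}
    Returns {gene_name: {sample_name: length or None}} for problem genes only.
    """
    all_genes = set()
    for cov in coverage_by_sample.values():
        all_genes.update(cov)

    problems = {}
    for gene in sorted(all_genes):
        lengths = {
            sample: (cov[gene].shape[0] if gene in cov else None)
            for sample, cov in coverage_by_sample.items()
        }
        # More than one distinct value means a mismatch or a missing gene.
        if len(set(lengths.values())) > 1:
            problems[gene] = lengths
    return problems


example = {
    "old_run_sample": {"G1": np.zeros(4), "G2": np.zeros(6)},
    "new_run_sample": {"G1": np.zeros(4), "G2": np.zeros(5)},
}
mismatches = incompatible_genes(example)
```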

ffineis commented 5 years ago

I've gone with option 3. Check out branch hotfix/issues and degnorm should be able to re-use coverage and read count files from a previously killed/failed run. Will close this issue out if I don't hear back in a few days.

ffineis commented 4 years ago

It should be in the docs, this is correct. Please see my response to issue #41. As noted on the docs, community contributions are welcome. I'll need some time to make required updates.

Frank

On Mon, May 18, 2020 at 2:54 AM laloverdin notifications@github.com wrote:

@ffineis Hello, I had a similar issue and solved it by following Option 3 above. Nevertheless, I would recommend documenting this option more precisely on the page "https://nustatbioinfo.github.io/DegNorm/": it is mentioned in the release notes but not in the "Running DegNorm" section, and even the release notes do not give the exact command for using it. Best regards,


NUStatBioinfo / DegNorm

Is there anyway of continuing without restarting? #30
