non-coding with covariates message: re-running unfinished contigs... indefinitely?

asoltis / MutEnricher

Somatic coding and non-coding mutation enrichment analysis for tumor WGS data


non-coding with covariates message: re-running unfinished contigs... indefinitely? #5

Open hbeale opened 2 years ago

hbeale commented 2 years ago

Hi - awesome program. I ran MutEnricher with docker without covariates successfully. Now I've added covariates, and it has been running for over a week without completing, on 10 processors with 120GB of RAM. I'm scanning 18,000 regions of ~300 bases each in 75 samples. Can you suggest how I can tell whether it's stuck in a loop or making progress? Thanks!

Messages from the last week:

re-running unfinished contigs: ['chr1', 'chr2']
  chr2 done.
  chr1 done.
  re-running unfinished contigs: ['chr1', 'chr2']

asoltis commented 2 years ago

Hello,

Thank you for using the tool. There are two likely causes: either the code is stuck in a loop during the affinity propagation step because of non-convergence (perhaps it keeps trying self-similarity parameters that don't converge), or an error thrown inside the multiprocessing is "locking" the process up instead of letting it exit. Does it keep printing out these same procedural messages, or is it just stuck on the current iteration? Are there any oddities in the covariates files for these chromosomes that may cause errors, i.e. do you see real-valued similarities being produced in the temp files? A run on this amount of data should finish in ~10-15 minutes.

If possible, you could share your covariates input files to help with debugging. You could also do a test run on only these chromosomes and watch the self-similarity parameters being selected for the re-runs - if it keeps testing the same values, then there may be a bug in the restart settings (I have not run into such issues across many tests, but it is possible).

hbeale commented 2 years ago

Thanks for the reply. It kept printing the same commands. The data in the similarities file looks real to me, but I'm not sure I'd know what to look for. Here's the top of the chr1 similarities file:

1 2 -1.02
1 3 -0.0475
1 4 -0.387
1 5 -0.677
1 6 -1.15
1 7 -0.0399
1 8 -0.0746
1 9 -0.799
1 10 -0.618
1 11 -0.511

and the top of the chr20 similarities file:

1 2 -0.00185
1 3 -0.00517
1 4 -0.019
1 5 -0.00188
1 6 -0.00879
1 7 -0.0977
1 8 -0.039
1 9 -0.121
1 10 -0.029
1 11 -0.372
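One way to check for "real-valued similarities" without eyeballing the file is a small script like the sketch below. It assumes each line is whitespace-separated `i j similarity`, which is inferred only from the excerpts above, and `summarize_similarities` is a hypothetical helper, not part of MutEnricher. It flags lines that do not parse or that hold NaN/inf values.

```python
import math
import statistics


def summarize_similarities(lines):
    """Summarize 'i j similarity' lines and list the ones that look wrong.

    The three-column layout is an assumption based on the excerpts in this
    thread, not on MutEnricher documentation.
    """
    values = []
    bad_lines = []  # 1-based line numbers that failed to parse or are non-finite
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        try:
            if len(fields) != 3:
                raise ValueError("expected 3 columns")
            int(fields[0])
            int(fields[1])
            sim = float(fields[2])
        except ValueError:
            bad_lines.append(lineno)
            continue
        if not math.isfinite(sim):  # catches nan, inf, -inf
            bad_lines.append(lineno)
            continue
        values.append(sim)
    return {
        "n": len(values),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "median": statistics.median(values) if values else None,
        "bad_lines": bad_lines,
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_similarities.py <path to a similarities file>
    with open(sys.argv[1]) as fh:
        print(summarize_similarities(fh))
```

A non-empty `bad_lines` list, or a min/max range that differs wildly between a stalled chromosome and one that finishes, would be worth reporting.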

I'm not sure how I'd pay attention to the self similarity parameters being selected.

I re-ran the process skipping chr 1 and 2, and it ran without error.

Attached is the covariates file, which I created with get_region_covariates.py:

hg19_gb_V40lift37_200bases_up_covariates.txt

Thanks again for your help!

asoltis commented 2 years ago

Thank you for the file - I can take a look at it and see if I can reproduce the errors and spot issues.

In the meantime, you can check the self similarity parameter used by looking at the summary.txt files in the apcluster_regions folder for the relevant chromosomes. The value is indicated by the "Preferences: " field, e.g.:

maxits=1000 convits=50 dampfact=0.9 number of data points: 2137 Preferences: -0.193000

The self-similarity parameter used here is -0.193. When a chromosome run does not converge, the code selects a new value close to, but different from, the prior value; it may be getting stuck picking the same value over and over. Note that you would have to monitor the progress and record the values manually each time the code says it is re-running (the file values are overwritten when a new iteration starts).
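The manual recording could be automated with a small polling script, sketched here. It is not part of MutEnricher: `parse_preference` and `watch_summary` are hypothetical helpers, the only format assumption is the "Preferences: " field shown above, and you have to supply the path to the relevant summary.txt yourself. It logs the value every time the file's modification time changes, so a value that is re-selected on consecutive restarts still shows up as a repeat.

```python
import os
import re
import time

# Matches the "Preferences: -0.193000" field from the summary.txt example.
PREF_RE = re.compile(r"Preferences:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def parse_preference(text):
    """Return the self-similarity value from summary text, or None if absent."""
    match = PREF_RE.search(text)
    return float(match.group(1)) if match else None


def watch_summary(path, interval_s=30.0, max_polls=None):
    """Poll a summary file and record the preference whenever it is rewritten.

    Returns the recorded values in order. max_polls=None polls until interrupted.
    """
    history = []
    last_mtime = None
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            mtime = os.stat(path).st_mtime
            if mtime != last_mtime:
                with open(path) as fh:
                    value = parse_preference(fh.read())
                if value is not None:  # skip half-written files, retry next poll
                    last_mtime = mtime
                    note = " (seen before)" if value in history else ""
                    print(f"{time.strftime('%H:%M:%S')} preference={value}{note}")
                    history.append(value)
        except FileNotFoundError:
            pass  # file not created yet, or being replaced between runs
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
    return history


if __name__ == "__main__":
    import sys

    # Usage: python watch_pref.py <path to a chromosome's summary.txt>
    watch_summary(sys.argv[1])
```

If the log keeps showing values marked "(seen before)", that would support the idea that the restart logic is cycling through the same self-similarity values.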

Something else you can try for the time being, and which may actually fix the issue and be overall simpler, is adjusting the affinity propagation iteration parameters (--ap-iters and/or --ap-convits). If you increase the total iterations and/or decrease the required convergence iterations it may help the algorithm converge as-is and complete.

asoltis commented 2 years ago

Update:

I was able to find a solution to the stalled chromosomes. It seems affinity propagation is getting stuck in oscillating cycles across multiple values of the self similarity parameter for chr1 and chr2. To combat this, AP has a damping factor that can be adjusted - in the code, this is fixed at 0.9, but bumping this up to 0.95 enabled convergence for all chromosomes (the overall run finished in ~15 minutes). Below I'm attaching the AP clustering outputs from this that you can use with your samples (i.e. with the precomputed cluster capabilities in the code):

hg19_gb_V40lift37_200bases_up_covariates_apcluster_regions.zip

Unfortunately, the damping factor is not currently exposed as an option, so you would have to edit the Python code directly to change it. I can add this as an option in a subsequent release. This testing also pointed me to a minor bug in the re-selection of the self-similarity parameter that I can fix as well. Hopefully the above results can address your immediate needs in the meantime.
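For intuition on why damping can break an oscillating cycle, here is a toy sketch. It is not affinity propagation and not MutEnricher code: `damped_iterate` and the example update `x -> 1 - x` are made up for illustration. What it shares with AP is the blending rule applied to each message update, new = damping * old + (1 - damping) * update.

```python
def damped_iterate(update, x0, damping, max_iter=1000, tol=1e-9):
    """Iterate x <- damping * x + (1 - damping) * update(x).

    Returns (x, iterations_used, converged). damping=0 is the undamped update.
    """
    x = x0
    for i in range(1, max_iter + 1):
        x_new = damping * x + (1.0 - damping) * update(x)
        if abs(x_new - x) < tol:
            return x_new, i, True
        x = x_new
    return x, max_iter, False


def flip(x):
    """Toy update that bounces between 0 and 1 forever when left undamped."""
    return 1.0 - x


if __name__ == "__main__":
    for damping in (0.0, 0.9, 0.95):
        x, iters, ok = damped_iterate(flip, x0=0.0, damping=damping)
        print(f"damping={damping}: converged={ok} after {iters} iterations, x={x:.6f}")
```

In this toy the undamped update never settles, while the damped versions approach the fixed point 0.5, with heavier damping taking more iterations. That trade of speed for stability is the same reason a higher damping factor can help AP escape an oscillation.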

hbeale commented 2 years ago

That's brilliant, thank you so much for the quick and thorough help!