AlexandrovLab / SigProfilerClusters

Tool for analyzing the inter-mutational distances between SNV-SNV and INDEL-INDEL mutations. Tool separates mutations into clustered and non-clustered groups on a sample-dependent basis.
BSD 2-Clause "Simplified" License

Exome argument missing from the wiki documentation #16

Closed: MikeACG closed this issue 10 months ago

MikeACG commented 1 year ago

Hey there, I was recently trying to run the program on some SigProfilerSimulator simulations that I created by providing a custom BED file in the exome parameter. When I ran SigProfilerClusters on those simulations, the program complained that there were no simulations for the project. After digging around in the code, I found that the main function also has an exome argument (default False) that is not described in the wiki documentation. From what I saw, this argument is only used to decide whether "_exome" should be appended to the path where the program expects to find the simulations from SigProfilerSimulator. Setting it to True in the call finally made it work.

I was wondering whether this was simply left out of the documentation by mistake, or whether SigProfilerClusters is currently not meant to be run on custom-range simulations. Another possibility is that the program is supposed to detect custom-range simulations automatically but is not currently doing so.
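The lookup behavior described above can be pictured with a tiny sketch. It assumes only what the comment reports: the exome flag decides whether "_exome" is appended to the simulation folder name. The helper `expected_sim_folder` and the folder name "BRCA_sims" are hypothetical placeholders, not the tool's real function or directory layout.

```python
def expected_sim_folder(base_name: str, exome: bool = False) -> str:
    # Hypothetical helper mirroring the behavior described in the thread:
    # the exome flag only decides whether "_exome" is appended to the name
    # of the folder where SigProfilerSimulator output is looked up.
    return base_name + "_exome" if exome else base_name


# "BRCA_sims" is a placeholder name, not the real SigProfilerSimulator layout.
print(expected_sim_folder("BRCA_sims"))
print(expected_sim_folder("BRCA_sims", exome=True))
```

Under this reading, simulations written to a suffixed folder are invisible to a run with exome=False, because the lookup targets the unsuffixed name, which would match the "no simulations for the project" complaint.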

MikeACG commented 1 year ago

UPDATE: Well, actually, the program no longer complained about missing simulations, but it encountered an error. Here is the log:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/mike/.local/lib/python3.8/site-packages/SigProfilerClusters/hotspot.py", line 707, in calculateSampleIMDs
    regions = densityCorrection(densityMuts, densityMutsSim, windowSize)
  File "/home/mike/.local/lib/python3.8/site-packages/SigProfilerClusters/hotspot.py", line 544, in densityCorrection
    sims = random.sample(list(densityMutsSim.keys()), 10)
  File "/usr/lib/python3.8/random.py", line 363, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mike/.local/lib/python3.8/site-packages/SigProfilerClusters/SigProfilerClusters.py", line 669, in analysis
    regions, imds = hotspot.hotSpotAnalysis(project, genome, contexts, simContext, ref_dir, windowSize, processors, plotIMDfigure, exome, chromLengths, binsDensity, original, signature, percentage, firstRun, clustering_vaf, calculateIMD, chrom_based, correction)
  File "/home/mike/.local/lib/python3.8/site-packages/SigProfilerClusters/hotspot.py", line 1059, in hotSpotAnalysis
    r.get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: Sample larger than population or is negative

The call was: hp.analysis("BRCA_sigprofclust", "GRCh37", "96", ["288"], "../producedData/BRCA_sigprofclust/", correction=True, includedVAFs=False, exome=True)

Maybe the program is indeed untested on custom-range simulations, and that is why the exome argument was deliberately omitted from the documentation?
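The ValueError at the bottom of the log is raised by Python's `random.sample`, which refuses to draw more items than the population contains. The traceback shows `densityCorrection` requesting 10 keys from `densityMutsSim`, so that line fails whenever fewer than 10 simulations end up in that dictionary. A minimal reproduction follows, together with a hypothetical `sample_sim_keys` guard that caps the sample size; it is an illustration only, not the fix that shipped in the package.

```python
import random


def sample_sim_keys(sim_keys, k=10, rng=random):
    """Hypothetical guard: pick up to k simulation keys without replacement.

    random.sample raises ValueError when k exceeds the population size;
    capping k at len(keys) avoids that. Not necessarily how v1.1.2 fixed it.
    """
    keys = list(sim_keys)
    return rng.sample(keys, min(k, len(keys)))


# A made-up simulation dictionary with fewer than 10 entries.
few_sims = {"sim1": [], "sim2": [], "sim3": []}

try:
    # Same pattern as the failing line in the traceback.
    random.sample(list(few_sims.keys()), 10)
except ValueError as err:
    print("unguarded call failed:", err)

print(len(sample_sim_keys(few_sims)))
```

Capping the sample silently works with fewer simulations, which may or may not be statistically acceptable for the density correction; raising a clearer error would be the stricter alternative.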

MousumyCSE commented 1 year ago

Hi @MikeACG ,

Thanks for reaching out. We are currently working on this issue and will let you know as soon as there is an update.

Best, Mousumy

ebergstr commented 10 months ago

Hi MikeACG,

Apologies for the delay! Currently, the clusters tool does not support custom BED files; however, as long as the simulated files include the proper "exome" suffix, this should work in theory.

The newest error that you reported was related to running the clusters tool with exome=True and correction=True together. We have updated the tool to fix this issue (v1.1.2). I suggest upgrading the package and rerunning your analysis. I will close this issue, but please reopen it if you are still experiencing problems.

Best, Erik
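Before rerunning, it can help to confirm that the installed package is at least v1.1.2. Here is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+). The helper names `parse_version` and `has_fix` are hypothetical, and the parser assumes plain numeric `X.Y.Z` version strings (suffixes such as "rc1" or ".post1" are not handled).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    # Assumes plain numeric "X.Y.Z" strings; pre/post-release tags are not handled.
    return tuple(int(part) for part in text.split("."))


def has_fix(installed: str, required: str = "1.1.2") -> bool:
    # Tuple comparison orders (1, 1, 10) after (1, 1, 2), unlike string comparison.
    return parse_version(installed) >= parse_version(required)


try:
    installed = version("SigProfilerClusters")
    print(installed, "includes the v1.1.2 fix:", has_fix(installed))
except PackageNotFoundError:
    print("SigProfilerClusters is not installed in this environment")
```

Comparing tuples of integers rather than raw strings avoids the classic pitfall where "1.1.10" sorts before "1.1.2".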