AlexandrovLab / SigProfilerExtractor

SigProfilerExtractor allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.
BSD 2-Clause "Simplified" License

"Cannot reindex on an axis with duplicate labels"-Error #241

Closed MolPath-Bioinfo closed 7 months ago

MolPath-Bioinfo commented 7 months ago

Hello,

While running SigProfilerExtractor (versions 1.1.21 and 1.1.23), I encounter the following error:

Time taken to collect 500 iterations for 1 signatures is 308.93 seconds
Optimization time is 2.2204625606536865 seconds
The reconstruction error is 0.0288, average process stability is 1.0 and 
the minimum process stability is 1.0 for 1 signatures

Traceback (most recent call last):
  File "/home/calbig/TSO500_v1_github/TSO500_v1/sigProfilerExtractor_TSO500.py", line 18, in <module>
    run(input,workdir,minsig,maxsig,iteration,ref)
  File "/home/calbig/TSO500_v1_github/TSO500_v1/sigProfilerExtractor_TSO500.py", line 9, in run
    sig.sigProfilerExtractor("vcf", "results", input, reference_genome=ref, minimum_signatures=minsig, maximum_signatures=maxsig, nmf_replicates=iteration, cpu=2, nmf_init = 'random')
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/SigProfilerExtractor/sigpro.py", line 862, in sigProfilerExtractor
    decomp.spa_analyze(allgenomes, output, signatures=processAvg, genome_build=genome_build, cosmic_version=cosmic_version, exome=exome, verbose=False,
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/SigProfilerAssignment/decomposition.py", line 306, in spa_analyze
    genomes = sigPlot.process_input(genomes, m)
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/sigProfilerPlotting/sigProfilerPlotting.py", line 213, in process_input
    return order_input_context(plot_type, data)
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/sigProfilerPlotting/sigProfilerPlotting.py", line 207, in order_input_context
    reindexed_data = input_data.reindex(ref_format)
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/pandas/util/_decorators.py", line 347, in wrapper
    return func(*args, **kwargs)
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/pandas/core/frame.py", line 5205, in reindex
    return super().reindex(**kwargs)
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/pandas/core/generic.py", line 5289, in reindex
    return self._reindex_axes(
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/pandas/core/frame.py", line 5004, in _reindex_axes
    frame = frame._reindex_index(
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/pandas/core/frame.py", line 5023, in _reindex_index
    return self._reindex_with_indexers(
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/pandas/core/generic.py", line 5355, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 737, in reindex_indexer
    self.axes[axis]._validate_can_reindex(indexer)
  File "/data/anaconda3/envs/mutSig/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 4316, in _validate_can_reindex
    raise ValueError("cannot reindex on an axis with duplicate labels")
ValueError: cannot reindex on an axis with duplicate labels
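
For context, pandas raises this ValueError whenever `reindex` is called on a frame whose index contains repeated labels. The following is a minimal sketch of that behaviour, independent of SigProfiler: the labels, counts, and the helper `find_duplicate_labels` are made up for illustration, and the traceback alone does not establish where the duplicates originate in this run.

```python
import pandas as pd


def find_duplicate_labels(df):
    """Return the index labels that occur more than once (hypothetical helper)."""
    return df.index[df.index.duplicated()].unique().tolist()


# Toy matrix with one repeated row label (illustrative values only).
df = pd.DataFrame(
    {"sample1": [3, 5, 2]},
    index=["A[C>A]A", "A[C>A]C", "A[C>A]A"],
)

# Target row order, standing in for the reference context order.
ref_format = ["A[C>A]A", "A[C>A]C"]

try:
    df.reindex(ref_format)
    error_message = None
except ValueError as exc:
    # pandas refuses to reindex when the existing index is not unique.
    error_message = str(exc)

duplicates = find_duplicate_labels(df)
print("reindex error:", error_message)
print("duplicated labels:", duplicates)
```

Running the same kind of check on the matrix that reaches the plotting step would show which labels are repeated, if any.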

The error occurs with SigProfilerAssignment v0.1.4 and v0.1.3. If I install SigProfilerAssignment v0.0.30 instead, the tool runs, but it does not write out the COSMIC signatures that compose the de novo extracted signature ("Global NMF Signatures"). These signatures are visible only in the PDF plot.

I'm using pandas version 1.5.3.

Do you have any suggestions on how to fix this?

Best regards, Mihaela Thiele

mdbarnesUCSD commented 7 months ago

Hi @MolPath-Bioinfo,

I believe the issue was caused by sigProfilerPlotting and was present in v1.3.21. Could you please update to sigProfilerPlotting v1.3.22, which should resolve this issue?

Please track progress on this issue via ticket #236.

Thanks!
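
Because the behaviour in this thread depends on which package versions are installed together, it can help to print them from the active environment before and after updating. The following is a rough sketch using only the standard library; the distribution names in `PACKAGES` are assumed to match the names the packages were installed under, and `installed_versions` is a hypothetical helper, not part of any SigProfiler tool.

```python
from importlib import metadata

# Distribution names as assumed here; adjust if your environment differs.
PACKAGES = [
    "SigProfilerExtractor",
    "SigProfilerAssignment",
    "sigProfilerPlotting",
    "pandas",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that runs the extraction script (here, the `mutSig` conda environment shown in the traceback paths), so the reported versions are the ones actually in use.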