getzlab / SignatureAnalyzer

Updated SignatureAnalyzer-GPU with mutational spectra & RNA expression compatibility.
MIT License

ValueError #34

Closed hsiaoyi0504 closed 8 months ago

hsiaoyi0504 commented 2 years ago
(base) [yihsiao@pathlab-ap-ps3a CCRCC_glycoproteomics_analysis]$ signatureanalyzer -n 10 \
>                   -t matrix \
>                   --objective gaussian \
>                   --max_iter 30000 \
>                   --prior_on_H L1 \
>                   --prior_on_W L1 \
> ./result/glyco_signature_analyzer.tsv
---------------------------------------------------------
---------- S I G N A T U R E  A N A L Y Z E R  ----------
---------------------------------------------------------
   * Negative values detecting, splitting vars m=4735 --> m=9470
   * Saving ARD-NMF outputs to ./nmf_output.h5
   * Running ARD-NMF...
        0/9: nit=    6 K=0 70   del=0.00000000
Traceback (most recent call last):
  File "/home/yihsiao/miniconda3/bin/signatureanalyzer", line 33, in <module>
    sys.exit(load_entry_point('signatureanalyzer', 'console_scripts', 'signatureanalyzer')())
  File "/home/yihsiao/getzlab-SignatureAnalyzer/signatureanalyzer/__main__.py", line 205, in main
    run_matrix(
  File "/home/yihsiao/getzlab-SignatureAnalyzer/signatureanalyzer/signatureanalyzer.py", line 362, in run_matrix
    res = ardnmf(
  File "/home/yihsiao/getzlab-SignatureAnalyzer/signatureanalyzer/bnmf.py", line 117, in ardnmf
    W,H = select_signatures(W,H)
  File "/home/yihsiao/getzlab-SignatureAnalyzer/signatureanalyzer/utils.py", line 151, in select_signatures
    H_max_id = H.idxmax(axis=1, skipna=True).astype('int')
  File "/home/yihsiao/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 9070, in idxmax
    res = self._reduce(
  File "/home/yihsiao/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 8850, in _reduce
    res, indexer = df._mgr.reduce(blk_func, ignore_failures=ignore_failures)
  File "/home/yihsiao/miniconda3/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 354, in reduce
    nbs = blk.reduce(func, ignore_failures)
  File "/home/yihsiao/miniconda3/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 388, in reduce
    result = func(self.values)
  File "/home/yihsiao/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 8822, in blk_func
    return op(values, axis=1, skipna=skipna, **kwds)
  File "/home/yihsiao/miniconda3/lib/python3.9/site-packages/pandas/core/nanops.py", line 71, in _f
    return f(*args, **kwargs)
  File "/home/yihsiao/miniconda3/lib/python3.9/site-packages/pandas/core/nanops.py", line 1027, in nanargmax
    result = values.argmax(axis)
ValueError: attempt to get argmax of an empty sequence
Closing remaining open files:./nmf_output.h5...done

shankara-a commented 2 years ago

Hi @hsiaoyi0504,

It's a little difficult to debug without seeing the input file. The log shows K=0, which means no latent factor passed the active_thresh during factorization. You could try lowering --active_thresh, but K=0 normally means something is off with the input file. Also, try setting the input K0 to about 50. We generally set it to at least 2x the expected number of "signatures."
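
To illustrate why K=0 ends in the ValueError at the bottom of the traceback, here is a minimal sketch using NumPy only. The helper `dominant_component` is hypothetical and not part of SignatureAnalyzer. It assumes H is laid out as samples x components, so that K=0 leaves a matrix with zero columns.

```python
import numpy as np


def dominant_component(H: np.ndarray) -> np.ndarray:
    """Return the column index of the largest value in each row of H.

    Hypothetical helper for illustration only; it is not SignatureAnalyzer code.
    It fails early with a readable message when no components are left.
    """
    if H.ndim != 2:
        raise ValueError("H must be a 2-D array (samples x components)")
    if H.shape[1] == 0:
        raise ValueError("no active components (K=0); check the input or the threshold")
    return H.argmax(axis=1)


# Two samples, two surviving components: argmax works as expected.
H_ok = np.array([[0.1, 0.9], [0.7, 0.2]])
print(dominant_component(H_ok))

# Two samples, zero surviving components: a raw argmax has nothing to pick from.
H_empty = np.empty((2, 0))
try:
    H_empty.argmax(axis=1)
except ValueError as err:
    print("numpy raised:", err)
```

The second case reproduces the same kind of failure as the traceback: an argmax over an axis of length zero.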

Other questions: does the input matrix contain continuous data, and is it zero-centered? How many samples do you have (I see you have 4735 input features)?
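
A rough way to answer these questions before running the tool is sketched below. The function `summarize_matrix` is a hypothetical helper, and the sketch assumes rows are features and columns are samples. The toy values are made up for illustration and are not taken from the data in this issue.

```python
import numpy as np
import pandas as pd


def summarize_matrix(df: pd.DataFrame) -> dict:
    """Collect basic sanity checks for a features x samples matrix.

    Hypothetical helper for illustration; not part of SignatureAnalyzer.
    """
    values = df.to_numpy(dtype=float)
    row_medians = np.nanmedian(values, axis=1)
    return {
        "n_features": values.shape[0],
        "n_samples": values.shape[1],
        "n_missing": int(np.isnan(values).sum()),
        "n_negative": int((values < 0).sum()),
        # Rows with zero spread carry no signal for a factorization.
        "n_constant_rows": int((np.nanstd(values, axis=1) == 0).sum()),
        # Close to 0 if every feature is median-centered.
        "max_abs_row_median": float(np.nanmax(np.abs(row_medians))),
    }


# Toy matrix: 3 features x 3 samples (made-up values).
toy = pd.DataFrame(
    [[-1.0, 0.0, 1.0], [2.0, 2.0, 2.0], [-0.5, 0.0, 3.0]],
    index=["f1", "f2", "f3"],
    columns=["s1", "s2", "s3"],
)
summary = summarize_matrix(toy)
print(summary)
```

A real input would first be loaded into a DataFrame, for example with `pd.read_csv(path, sep="\t", index_col=0)`, assuming the first column holds the feature names.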

hsiaoyi0504 commented 2 years ago

@shankara-a I shared the data by email. I will try the suggestions you gave.

The input matrix is continuous and median-centered. There are 103 samples.
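
Median-centered data contains negative values, which matches the log line "splitting vars m=4735 --> m=9470". One common way to make signed data non-negative is to stack the positive and negative parts of each feature, which doubles the feature count. The sketch below shows that idea. Whether SignatureAnalyzer splits in exactly this way is an assumption, and `split_signed` is a hypothetical name.

```python
import numpy as np


def split_signed(X: np.ndarray) -> np.ndarray:
    """Stack the positive and negative parts of X along the feature axis.

    Hypothetical illustration; the tool's own splitting may differ in detail.
    Input shape (m, n) becomes (2m, n), and every entry is non-negative.
    """
    pos = np.clip(X, 0, None)   # keeps values above zero, zeros elsewhere
    neg = np.clip(-X, 0, None)  # magnitudes of values below zero, zeros elsewhere
    return np.vstack([pos, neg])


# Toy matrix: 2 features x 2 samples (made-up values).
X = np.array([[-1.0, 2.0], [0.5, -3.0]])
X_split = split_signed(X)
print(X.shape, "->", X_split.shape)
```

The original matrix can be recovered as the top half minus the bottom half of the stacked result.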

tejas-j commented 2 years ago

Sorry to reopen this old thread, but I'm encountering the exact same issue. I ran the following command: signatureanalyzer -n 10 -t matrix --objective gaussian --max_iter 30000 --prior_on_H L1 --prior_on_W L1 matrix.tsv

I've tried the following input matrices: (1) the top 10% most variable genes from a DESeq2-normalized matrix, and (2) a log2 transformation of (1). Both give the same error. My data is 1847 genes x 61 samples.

Thanks