AlexandrovLab / SigProfilerExtractor

SigProfilerExtractor allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.
BSD 2-Clause "Simplified" License

AttributeError: 'PandasArray' object has no attribute '_str_len' #67

Closed: Marozi2 closed this issue 3 years ago

Marozi2 commented 3 years ago

Hello,

I am trying to run SigProfilerExtractor with this command:

sig.sigProfilerExtractor("matrix", outputfile, inputcatalog, opportunity_genome="GRCh37", exome=True, minimum_signatures=1, maximum_signatures=7, cpu=24)

The input catalog is a mutational catalog taken from ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl/mutational_catalogs/exomes/Breast/Breast_exomes_mutational_catalog_96_subs.txt

The tool seems to compute the 7 signatures correctly and create all related files, but it fails after the extraction of the last signature with this error message:

Time taken to collect 500 iterations for 7 signatures is 1461.99 seconds
Optimization time is 3.104848623275757 seconds
The reconstruction error is 0.0411, average process stability is 0.91 and 
the minimum process stability is 0.63 for 7 signatures

Traceback (most recent call last):
  File "/home/u/u232133/Scripts/mutational_signatures/sigprofiler_denovo_bis.py", line 39, in <module>
    main(sys.argv[1:])
  File "/home/u/u232133/Scripts/mutational_signatures/sigprofiler_denovo_bis.py", line 35, in main
    sig.sigProfilerExtractor("matrix", outputfile, inputcatalog, opportunity_genome="GRCh37", exome=exome, minimum_signatures=1, maximum_signatures=7, cpu=24)
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/SigProfilerExtractor/sigpro.py", line 997, in sigProfilerExtractor
    initial_remove_penalty=initial_remove_penalty, refit_denovo_signatures=refit_denovo_signatures, de_novo_fit_penalty=de_novo_fit_penalty, sequence=sequence)    
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/SigProfilerExtractor/subroutines.py", line 1989, in make_final_solution
    lognote.write("{}\n".format(exposures.iloc[:,exposures.to_numpy().nonzero()[1]])) 
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 687, in __repr__
    show_dimensions=show_dimensions,
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 820, in to_string
    return formatter.to_string(buf=buf, encoding=encoding)
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/io/formats/format.py", line 914, in to_string
    return self.get_result(buf=buf, encoding=encoding)
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/io/formats/format.py", line 521, in get_result
    self.write_result(buf=f)
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/io/formats/format.py", line 833, in write_result
    max_len = Series(lines).str.len().max()
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/core/strings/accessor.py", line 2579, in len
    result = self._data.array._str_len()
AttributeError: 'PandasArray' object has no attribute '_str_len'
srun: error: chugen005: task 0: Exited with exit code 1

I looked into the code of subroutines.py and sigpro.py but could not find the cause of the error. My SBS96_De-Novo_refit_Signature_Assignment_log.txt file contains only the following:

************************ Stepwise Description of Signature Assignment to Samples ************************

                    ################ Sample 1 #################
############################# Initial Composition ####################################

Line 1989 of subroutines.py is the first lognote.write("{}\n".format(...)) call in this block:

else:

        # when refilt de_novo_signatures                                                                                                                                                               
        if refit_denovo_signatures==True:
            exposureAvg=denovo_exposureAvg
            for g in range(allgenomes.shape[1]):

                # Record information to lognote                                                                                                                                                        
                lognote = open(layer_directory+"/Solution_Stats/"+solution_prefix_refit+"_Signature_Assignment_log.txt", "a")
                lognote.write("\n\n\n\n\n                    ################ Sample "+str(g+1)+ " #################\n")

                lognote.write("############################# Initial Composition ####################################\n")
                exposures = pd.DataFrame(exposureAvg[:, g],  index=allsigids).T
                lognote.write("{}\n".format(exposures.iloc[:,exposures.to_numpy().nonzero()[1]]))

                #remove signatures                                                                                                                                                                     
                exposureAvg[:,g],L2dist,cosine_sim = ss.remove_all_single_signatures(processAvg, exposureAvg[:, g], allgenomes[:,g], metric="l2", \
                           solver = "nnls", cutoff=de_novo_fit_penalty, background_sigs= [], verbose=False)
                if verbose==True:
                    print("############################## Composition After Remove ############################### ")
                    print(pd.DataFrame(exposureAvg[:, g],  index=allsigids).T)
                    print("L2%: ", L2dist)
                lognote.write("############################## Composition After  Remove ###############################\n")
                exposures = pd.DataFrame(exposureAvg[:, g],  index=allsigids).T
                lognote.write("{}\n".format(exposures.iloc[:,exposures.to_numpy().nonzero()[1]]))
                lognote.write("L2 Error %: {}\nCosine Similarity: {}\n".format(round(L2dist,2), round(cosine_sim,2)))
                lognote.close()

        # when use the exposures from the initial NMF

This seems to be the block that runs when cosmic_sigs==False, but in sigpro.py the line calling make_final_solution passes cosmic_sigs=True, so I'm a bit lost. Could you please help me find my error?

Thanks

mishugeb commented 3 years ago

Hello,

I was not able to reproduce the error on my side. Let's try the following on your side: please install the latest version of SigProfilerExtractor on your system, then download the attached zip file (public_test.zip), extract it, navigate to the folder, and run the Python file inside it. Please let me know what happens.

Thanks, Mishu

Marozi2 commented 3 years ago

Hello, thank you for your response. I tried what you asked and got this error message:

Traceback (most recent call last):
  File "/home/u/u232133/public_test/sigpro.py", line 28, in <module>
    run()
  File "/home/u/u232133/public_test/sigpro.py", line 22, in run
    nnls_add_penalty=nnls_add_penalty, nnls_remove_penalty=nnls_remove_penalty, initial_remove_penalty=initial_remove_penalty, make_decomposition_plots=True, get_all_signature_matrices=False, cpu=-1)
TypeError: sigProfilerExtractor() got an unexpected keyword argument 'cosmic_version'

Also, here are the versions I use:

-------Python and Package Versions------- 
Python Version: 3.7.6
Sigproextractor Version: 1.1.0
SigprofilerPlotting Version: 1.1.11
SigprofilerMatrixGenerator Version: 1.1.26
Pandas version: 1.0.4
Numpy version: 1.19.2
Scipy version: 1.4.1
Scikit-learn version: 0.23.1

Thank you
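One thing worth checking at this point: the traceback above goes through pandas/core/strings/accessor.py, while the listing reports pandas 1.0.4. As far as I can tell, that module layout belongs to newer pandas releases, so the environment may contain files from two different pandas versions. That is a hypothesis, not something confirmed in this thread. A minimal sketch for comparing what the installer recorded with what Python actually imports is given here; `describe_install` is a hypothetical helper name, and the `importlib_metadata` backport is assumed to be installed when running on Python 3.7.

```python
import importlib
import importlib.util

try:
    import importlib.metadata as md  # available from Python 3.8
except ImportError:
    import importlib_metadata as md  # backport; assumed installed on Python 3.7


def describe_install(name):
    """Collect what the installer recorded and what Python actually imports.

    Hypothetical helper for illustration; not part of SigProfilerExtractor.
    """
    info = {"metadata_version": None, "module_version": None, "location": None}
    try:
        # Version written by pip/conda into the package metadata.
        info["metadata_version"] = md.version(name)
    except md.PackageNotFoundError:
        pass
    spec = importlib.util.find_spec(name)
    if spec is not None:
        # File that an "import name" statement would load.
        info["location"] = spec.origin
        module = importlib.import_module(name)
        # Version reported by the imported code itself.
        info["module_version"] = getattr(module, "__version__", None)
    return info


if __name__ == "__main__":
    for package in ("pandas", "numpy"):
        print(package, describe_install(package))
```

If `metadata_version` and `module_version` disagree for pandas, or `location` points somewhere unexpected, reinstalling pandas into a clean environment is a reasonable next step before changing any SigProfilerExtractor code.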

mishugeb commented 3 years ago

Can you please remove the 'cosmic_version' argument in the python file and test it again?

Marozi2 commented 3 years ago

Of course. I got the same error message as before:

Time taken to collect 10 iterations for 3 signatures is 20.56 seconds
Optimization time is 3.9050021171569824 seconds
The reconstruction error is 0.5113, average process stability is 0.96 and 
the minimum process stability is 0.9 for 3 signatures

Traceback (most recent call last):
  File "/home/u/u232133/public_test/sigpro.py", line 28, in <module>
    run()
  File "/home/u/u232133/public_test/sigpro.py", line 22, in run
    nnls_add_penalty=nnls_add_penalty, nnls_remove_penalty=nnls_remove_penalty, initial_remove_penalty=initial_remove_penalty, make_decomposition_plots=True, get_all_signature_matrices=False, cpu=-1)
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/SigProfilerExtractor/sigpro.py", line 997, in sigProfilerExtractor
    initial_remove_penalty=initial_remove_penalty, refit_denovo_signatures=refit_denovo_signatures, de_novo_fit_penalty=de_novo_fit_penalty, sequence=sequence)    
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/SigProfilerExtractor/subroutines.py", line 1989, in make_final_solution
    lognote.write("{}\n".format(exposures.iloc[:,exposures.to_numpy().nonzero()[1]])) 
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 687, in __repr__
    show_dimensions=show_dimensions,
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 820, in to_string
    return formatter.to_string(buf=buf, encoding=encoding)
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/io/formats/format.py", line 914, in to_string
    return self.get_result(buf=buf, encoding=encoding)
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/io/formats/format.py", line 521, in get_result
    self.write_result(buf=f)
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/io/formats/format.py", line 833, in write_result
    max_len = Series(lines).str.len().max()
  File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/core/strings/accessor.py", line 2579, in len
    result = self._data.array._str_len()
AttributeError: 'PandasArray' object has no attribute '_str_len'

mishugeb commented 3 years ago

I can't reproduce the issue. Could you please download and install the package from the git repository using the following commands and see if it helps:

$ git clone https://github.com/AlexandrovLab/SigProfilerExtractor.git
$ cd SigProfilerExtractor
$ pip install .

Marozi2 commented 3 years ago

This works locally on my computer but not remotely on an HPC. I'm going to build a container with SigProfilerExtractor for the HPC because I'm not able to identify the problem there. Thank you very much for your help. Would it be possible for you to take a look at my issue on SigProfilerSingleSample?

niall0 commented 3 years ago

I had the same error message when using the .describe() method. I fixed the same problem with the following:

pd.set_option("display.max_columns", None)

Possibly something broke with the latest version of pandas, in which the '_str_len' attribute no longer seems to exist. Before I discovered this fix, I also solved the problem by editing line 844 of the Anaconda3\Lib\site-packages\pandas\io\formats\format.py file (on Windows) from:

max_len = Series(lines).str.len().max()

to:

max_len = len(str(Series(lines).values.max()))

which avoids the .str.len() call that raises the error. However, I am relatively new to Python, and editing a package file in this way may be considered bad practice!

I am running Python 3.8 with Anaconda, and my version of pandas is 1.2.3.
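For a script that only needs this setting while writing a log, the option can be scoped with `pd.option_context` instead of being changed globally. The sketch below mirrors the failing `lognote.write("{}\n".format(...))` call from the traceback; the helper name `write_nonzero_exposures` and the signature labels are made up for illustration, and whether this avoids the error in a given environment is an assumption based on the report above.

```python
import io

import pandas as pd


def write_nonzero_exposures(log, exposures):
    """Write the non-zero columns of a one-row exposure table to a log.

    Hypothetical helper for illustration; not part of SigProfilerExtractor.
    """
    # Keep only the columns that contain a non-zero value.
    nonzero = exposures.loc[:, (exposures != 0).any(axis=0)]
    # Apply display.max_columns=None only while the frame is formatted.
    with pd.option_context("display.max_columns", None):
        log.write("{}\n".format(nonzero))


# Made-up exposures for four hypothetical de novo signatures.
exposures = pd.DataFrame(
    [[0.0, 12.0, 0.0, 3.0]],
    columns=["SBS96A", "SBS96B", "SBS96C", "SBS96D"],
)
buffer = io.StringIO()
write_nonzero_exposures(buffer, exposures)
print(buffer.getvalue())
```

The previous value of the option is restored when the `with` block exits, so other code that prints data frames is unaffected.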

niall0 commented 3 years ago

Hi @Marozi2

Reading the error message, I see the same line of code appearing that caused me problems:

File "/home/u/u232133/miniconda3/lib/python3.7/site-packages/pandas/io/formats/format.py", line 833, in write_result
    max_len = Series(lines).str.len().max()

Try replacing this with

max_len = len(str(Series(lines).values.max()))

This seems to have worked for others who are having the same problem.
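A caveat on the replacement line: it is not equivalent to the original. `Series(lines).values.max()` on an array of strings returns the largest string in comparison order, not the longest one, so the edited line measures the length of a different line whenever those two differ. The sketch below shows the difference using plain Python lists as a stand-in for the pandas objects; both function names are hypothetical.

```python
def longest_line_length(lines):
    """Length of the longest line, which is what .str.len().max() computes."""
    return max(len(line) for line in lines)


def length_of_largest_line(lines):
    """Plain-Python analogue of len(str(Series(lines).values.max())).

    max() on strings compares them lexicographically, so this returns the
    length of the string that sorts last, not the maximum length.
    """
    return len(str(max(lines)))


lines = ["zz", "aaaaaa"]
print(longest_line_length(lines))     # 6
print(length_of_largest_line(lines))  # 2
```

When the formatted lines all have the same length the two results agree, which may be why the edit appears to work in practice.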

Marozi2 commented 3 years ago

Hi @niall0

A big thank you for the solutions you mentioned. I also think it is bad practice to edit a Python package file, so I prefer to avoid that. The weird thing is that I had this issue only when running the tool on an HPC, not on my computer with the same configuration. Eventually, I built a Docker container with SigProfilerExtractor for the HPC, and it works. Thanks again!