borenstein-lab / MUSiCC

MUSiCC: A marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome
BSD 3-Clause "New" or "Revised" License

IndexError when running MUSiCC on Windows and Linux #1

Open zkstewart opened 7 years ago

zkstewart commented 7 years ago

Hi,

I was attempting to use MUSiCC to normalise read count data for a project, but I keep running into an error when using the '-perf' argument, regardless of which OS Python is installed on or whether I use my own data or the provided example data. Below are four examples of this error: the first two are from my home PC (Windows 10) using Anaconda3 (Python 3.6), the third is from a high-performance computing environment running SUSE with Anaconda3 (Python 3.6), and the last is from the same SUSE environment using Anaconda2 (Python 2.7).

Although I haven't shown the output below, I have also tried running these same scripts with my own tab-delimited gene counts file and receive the same error, so I do not believe the example data file is broken.

When I omit the '-perf' argument, everything runs to completion. Thus, as the traceback shows, I believe that a numpy data structure is being indexed incorrectly when MUSiCC attempts to report the model performance (I do not understand numpy myself, so I cannot tell what the issue is from looking at the code).

This is using MUSiCC 1.0.2.

Thanks, Zac


python C:\abbreviated_dir\Anaconda3\Scripts\run_musicc.py D:\abbreviated_dir\simulated_ko_relative_abundance.tab -o D:\abbreviated_dir\musicc.test.tab -n -perf -v -c learn_model
C:\abbreviated_dir\Anaconda3\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Running MUSiCC...
Input: D:\abbreviated_dir\simulated_ko_relative_abundance.tab
Output: D:\abbreviated_dir\musicc.test.tab
Normalize: True
Correct: learn_model
Compute scores: True
Loading data using pandas module...
20 samples and 3573 genes
Done.
Performing MUSiCC Correction...
Learning sample-specific models ....................Done.
Model performance on various gene sets:
Traceback (most recent call last):
  File "C:\abbreviated_dir\Anaconda3\Scripts\run_musicc.py", line 26, in <module>
    correct_and_normalize(vars(given_args))
  File "C:\abbreviated_dir\Anaconda3\lib\site-packages\musicc\core.py", line 395, in correct_and_normalize
    print("Median R^2 across samples for all USCG:" + str(np.nanmedian(all_samples_mean_scores)[0]))
IndexError: invalid index to scalar variable.


run_musicc.py C:\abbreviated_dir\Anaconda3\Scripts\run_musicc.py D:\abbreviated_dir\simulated_ko_relative_abundance.tab -o D:\abbreviated_dir\musicc.test.tab -n -c use_generic -perf -v
C:\abbreviated_dir\Anaconda3\lib\site-packages\sklearn\cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Running MUSiCC...
Input: D:\abbreviated_dir\simulated_ko_relative_abundance.tab
Output: D:\abbreviated_dir\musicc.test.tab
Normalize: True
Correct: use_generic
Compute scores: True
Loading data using pandas module...
20 samples and 3573 genes
Done.
Performing MUSiCC Correction...
Generic model intercept:1.0
Generic model coefficients:[-0.00509 -0.00189 -0.00031 0.005 0.00126 0.00005 0.00001 0.00006 -0.00016 -0.00016 -0.00048 -0.00099 -0.00062 -0.00413 0.00038 0.00006 -0.00061 -0.00386 -0.00092 0.00002 0.00006 0.00126 0.00009 0.00006 0.00015 -0.00199 -0.00026 -0.00222 -0.01525 -0.04291 -0.01742 0.00447 -0. 0.00001 0.00688]
Correcting samples using generic model ....................Done.
Model performance on various gene sets:
Traceback (most recent call last):
  File "C:\abbreviated_dir\Anaconda3\Scripts\run_musicc.py", line 26, in <module>
    correct_and_normalize(vars(given_args))
  File "C:\abbreviated_dir\Anaconda3\lib\site-packages\musicc\core.py", line 395, in correct_and_normalize
    print("Median R^2 across samples for all USCG:" + str(np.nanmedian(all_samples_mean_scores)[0]))
IndexError: invalid index to scalar variable.


python /abbreviated_dir/anaconda3/bin/run_musicc.py /abbreviated_dir/simulated_ko_relative_abundance.tab -o /abbreviated_dir/musicc_norm.tab -n -perf -v -c learn_model
/abbreviated_dir/anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Running MUSiCC...
Input: /abbreviated_dir/simulated_ko_relative_abundance.tab
Output: /abbreviated_dir/musicc_norm.tab
Normalize: True
Correct: learn_model
Compute scores: True
Loading data using pandas module...
20 samples and 3573 genes
Done.
Performing MUSiCC Correction...
Learning sample-specific models ....................Done.
Model performance on various gene sets:
Traceback (most recent call last):
  File "/abbreviated_dir/anaconda3/bin/run_musicc.py", line 26, in <module>
    correct_and_normalize(vars(given_args))
  File "/abbreviated_dir/anaconda3/lib/python3.6/site-packages/musicc/core.py", line 395, in correct_and_normalize
    print("Median R^2 across samples for all USCG:" + str(np.nanmedian(all_samples_mean_scores)[0]))
IndexError: invalid index to scalar variable.


python2.7 /abbreviated_dir/anaconda2/bin/run_musicc.py /abbreviated_dir/simulated_ko_relative_abundance.tab -o /abbreviated_dir/musicc_norm.tab -n -perf -v -c learn_model
/abbreviated_dir/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Running MUSiCC...
Input: /abbreviated_dir/simulated_ko_relative_abundance.tab
Output: /abbreviated_dir/musicc_norm.tab
Normalize: True
Correct: learn_model
Compute scores: True
Loading data using pandas module...
20 samples and 3573 genes
Done.
Performing MUSiCC Correction...
Learning sample-specific models ....................Done.
Model performance on various gene sets:
Traceback (most recent call last):
  File "/abbreviated_dir/anaconda2/bin/run_musicc.py", line 26, in <module>
    correct_and_normalize(vars(given_args))
  File "/abbreviated_dir/anaconda2/lib/python2.7/site-packages/musicc/core.py", line 395, in correct_and_normalize
    print("Median R^2 across samples for all USCG:" + str(np.nanmedian(all_samples_mean_scores)[0]))
IndexError: invalid index to scalar variable.

omanor commented 7 years ago

Thank you Zac, I will try to look into this. Also, I saw another email you sent but that didn't open a new issue. Can you open a new issue so I can answer you there?

jzrapp commented 4 years ago

Hi @zkstewart and @omanor,

I receive the same error. Was there an answer or solution to this? I don't really understand what the software is trying to tell me.

engal commented 4 years ago

Hi,

Thanks for notifying me that this was still an issue.

It looks like there was a bug when MUSiCC tried to print out performance metrics for learned models. I just released an update that should have addressed this issue.

Thanks for your interest in MUSiCC!
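
For anyone who lands here with the same traceback on an older install: the failing line in the logs above can be reproduced outside MUSiCC. The snippet below is only a minimal sketch. The variable name is taken from the traceback, but the score values are made up, and the "fix" shown (dropping the `[0]`) is an assumption about what would work, not a description of the change that was actually released.

```python
import numpy as np

# Hypothetical per-sample scores. The name mirrors the variable in the
# traceback; the values are invented for illustration only.
all_samples_mean_scores = [0.91, 0.87, float("nan"), 0.93]

# With no axis argument, np.nanmedian reduces over everything and returns
# a NumPy scalar rather than an array.
median = np.nanmedian(all_samples_mean_scores)

# Indexing that scalar, as line 395 of core.py does in the traceback,
# raises the IndexError reported in this issue.
try:
    median[0]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)

# One possible repair (an assumption, not the maintainers' confirmed patch):
# use the scalar directly instead of indexing it.
report = "Median R^2 across samples for all USCG:" + str(median)
print(report)
```

Until you can upgrade, running without '-perf' avoids this code path, as noted in the original report.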