Teichlab / cellphonedb

MIT License

InvalidIndexError upon building results after successful statistical analysis #280

Closed ColeKeenum closed 3 years ago

ColeKeenum commented 3 years ago

Hello. Thank you for making CellPhoneDB; it is a great resource. Using the Python package, I am getting an error with inputs from scRNA-seq data that I collected for multiple treatments. I can run the statistical analysis on the first 50 cells as a test, but when I try to run it on all 7257 cells in this dataset, I get the following error message:

(cpdb-venv) osboxes@osboxes:/media/sf_citeseq$ cellphonedb method statistical_analysis mp4_meta.txt mp4_count.txt --project-name=mp4
/home/osboxes/cpdb-venv/lib/python3.8/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.cluster.kmeans module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)
[ ][APP][04/03/21-21:52:56][WARNING] Latest local available version is v2.0.0, using it
[ ][APP][04/03/21-21:52:56][WARNING] User selected downloaded database v2.0.0 is available, using it
[ ][CORE][04/03/21-21:52:56][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][04/03/21-21:52:56][INFO] Using custom database at /home/osboxes/.cpdb/releases/v2.0.0/cellphone.db
[ ][APP][04/03/21-21:52:56][INFO] Launching Method cpdb_statistical_analysis_local_method_launcher
[ ][APP][04/03/21-21:52:56][INFO] Launching Method _set_paths
[ ][APP][04/03/21-21:52:56][INFO] Launching Method _load_meta_counts
[ ][CORE][04/03/21-21:54:46][INFO] Launching Method cpdb_statistical_analysis_launcher
[ ][CORE][04/03/21-21:54:46][INFO] Launching Method _counts_validations
[ ][CORE][04/03/21-21:54:55][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:-1 Threads:4 Precision:3
[ ][CORE][04/03/21-21:55:12][INFO] Running Real Analysis
[ ][CORE][04/03/21-21:57:11][INFO] Running Statistical Analysis

[ ][CORE][05/03/21-03:29:28][INFO] Building Pvalues result
[ ][CORE][05/03/21-03:37:14][INFO] Building results
[ ][APP][05/03/21-03:37:34][ERROR] Unexpected error
Traceback (most recent call last):
  File "/home/osboxes/cpdb-venv/lib/python3.8/site-packages/cellphonedb/src/api_endpoints/terminal_api/method_terminal_api_endpoints/method_terminal_commands.py", line 127, in statistical_analysis
    LocalMethodLauncher(cpdb_app.create_app(verbose, database)). \
  File "/home/osboxes/cpdb-venv/lib/python3.8/site-packages/cellphonedb/src/local_launchers/local_method_launcher.py", line 54, in cpdb_statistical_analysis_local_method_launcher
    self.cellphonedb_app.method.cpdb_statistical_analysis_launcher(
  File "/home/osboxes/cpdb-venv/lib/python3.8/site-packages/cellphonedb/src/core/methods/method_launcher.py", line 63, in cpdb_statistical_analysis_launcher
    cpdb_statistical_analysis_method.call(meta,
  File "/home/osboxes/cpdb-venv/lib/python3.8/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py", line 23, in call
    cpdb_statistical_analysis_complex_method.call(meta.copy(),
  File "/home/osboxes/cpdb-venv/lib/python3.8/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_complex_method.py", line 107, in call
    pvalues_result, means_result, significant_means, deconvoluted_result = build_results(
  File "/home/osboxes/cpdb-venv/lib/python3.8/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_complex_method.py", line 201, in build_results
    pvalues_result = pd.concat([interactions_data_result, result_percent], axis=1, join='inner', sort=False)
  File "/home/osboxes/cpdb-venv/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 298, in concat
    return op.get_result()
  File "/home/osboxes/cpdb-venv/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 516, in get_result
    indexers[ax] = obj_labels.get_indexer(new_labels)
  File "/home/osboxes/cpdb-venv/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3171, in get_indexer
    raise InvalidIndexError(
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Does anyone have a suggestion for how to debug this?
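
The last line of the traceback says that pandas was asked to align frames whose index contains repeated labels. One way to start debugging is to check whether an index is unique and which labels repeat. The following is only a minimal sketch: the two frames are toy stand-ins (the real interactions_data_result and result_percent are built inside CellPhoneDB and their contents are not shown in this thread), and duplicated_labels is a hypothetical helper, not part of CellPhoneDB or pandas.

```python
import pandas as pd


def duplicated_labels(frame: pd.DataFrame) -> list:
    """Return the index labels that occur more than once in `frame`."""
    idx = frame.index
    # duplicated() marks every occurrence after the first one.
    return idx[idx.duplicated()].unique().tolist()


# Toy stand-ins (assumption): the real frames are built inside CellPhoneDB.
left = pd.DataFrame({"interacting_pair": ["A_B", "C_D", "E_F"]}, index=[0, 1, 1])
right = pd.DataFrame({"cluster1|cluster2": [0.01, 0.5, 1.0]}, index=[0, 1, 2])

for name, frame in [("left", left), ("right", right)]:
    print(name, "unique index:", frame.index.is_unique,
          "duplicated labels:", duplicated_labels(frame))
```

Whether pd.concat actually raises on input like this may depend on the installed pandas version, so the check only reports duplicates and does not try to reproduce the exception.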

ColeKeenum commented 3 years ago

I saw that pandas updated on 3/2/21. I created my virtual environment and installed all dependencies on 3/3/21, so I thought that by using the previous version of pandas, I could fix the error. This also failed.

I tried to use only the first half of the genes (7881) in my dataset with the same cell identities and names, but this effort generated the same error.

ColeKeenum commented 3 years ago

I attempted a manual binary search to see whether a particular region of my matrix was causing the error. I split the matrix by rows into two halves (each row being a unique gene), keeping the column names (cell IDs) the same. I found that the first half of my matrix caused the error, but the second half ran through the statistical analysis method without problems.

By repeating this method and narrowing my search, I thought that the error was being caused by the first eighth of the matrix. However, when I ran the next iteration on the first and second sixteenths of the rows, neither raised the InvalidIndexError exception.
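
The halving procedure described above can be sketched roughly as follows. This is an illustration under assumptions, not what was actually run: bisect_failing_rows and the fails callback are hypothetical names, and in practice fails would have to write the subset to a counts file and run cellphonedb on it, which takes a long time per call. The sketch stops when neither half fails on its own, which is the situation reached at the sixteenths.

```python
from typing import Callable, List, Optional, Sequence


def bisect_failing_rows(
    rows: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> Optional[List[str]]:
    """Halve `rows` repeatedly, following whichever half still fails.

    Returns None if the full set does not fail. If neither half fails
    on its own, the search stops and the current block is returned.
    """
    if not fails(rows):
        return None
    current = list(rows)
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if fails(first):
            current = first
        elif fails(second):
            current = second
        else:
            # The failure is not localized to one half, so bisection
            # cannot narrow it down any further.
            break
    return current


genes = [f"G{i}" for i in range(1, 9)]
# Toy predicate: a single problematic row can be isolated.
print(bisect_failing_rows(genes, lambda subset: "G3" in subset))
# Toy predicate: the failure needs rows from both halves, so nothing is isolated.
print(bisect_failing_rows(genes, lambda subset: "G3" in subset and "G6" in subset))
```

When the search ends on a block larger than one row, the cause is probably not a single bad row, and something global (such as the environment or the total size of the input) is a more likely suspect.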

I was able to run a different RNA count matrix from a similar 10X experiment with no issues at all. The main difference between this working matrix and the matrix that produced my initial error is that the working matrix has fewer cells (2926) than the failing one (7257).

Still not sure what is going on here.

ColeKeenum commented 3 years ago

I manually installed cellphonedb with the pandas and numpy specifications in #281 but I got the same error.

moutazhelal commented 3 years ago

Hi @ColeKeenum, thank you for raising this issue. I also recently installed CellPhoneDB and ran it, and I received the same error you mentioned. Did you manage to solve the issue?

ColeKeenum commented 3 years ago

@moutazhelal I haven't figured it out yet. Going to try and debug with a sample of my matrix.

jxshi commented 3 years ago

Hi,

I have encountered the same issue here with cellphoneDB.

[ ][CORE][09/03/21-00:09:59][INFO] Running Statistical Analysis
[ ][CORE][09/03/21-00:21:39][INFO] Building Pvalues result
[ ][CORE][09/03/21-00:21:56][INFO] Building results
[ ][APP][09/03/21-00:21:56][ERROR] Unexpected error
Traceback (most recent call last):
  File "/home/dell/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/api_endpoints/terminal_api/method_terminal_api_endpoints/method_terminal_commands.py", line 144, in statistical_analysis
    subsampler,
  File "/home/dell/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/local_launchers/local_method_launcher.py", line 64, in cpdb_statistical_analysis_local_method_launcher
    subsampler
  File "/home/dell/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/core/methods/method_launcher.py", line 76, in cpdb_statistical_analysis_launcher
    self.separator)
  File "/home/dell/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py", line 36, in call
    result_precision,
  File "/home/dell/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_complex_method.py", line 119, in call
    counts_data
  File "/home/dell/miniconda3/envs/cpdb/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_complex_method.py", line 201, in build_results
    pvalues_result = pd.concat([interactions_data_result, result_percent], axis=1, join='inner', sort=False)
  File "/home/dell/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 298, in concat
    return op.get_result()
  File "/home/dell/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 516, in get_result
    indexers[ax] = obj_labels.get_indexer(new_labels)
  File "/home/dell/miniconda3/envs/cpdb/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3172, in get_indexer
    "Reindexing only valid with uniquely valued Index objects"
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

This is the command line that I used to run cellphoneDB.

cellphonedb method statistical_analysis --counts-data hgnc_symbol --result-precision 2 --subsampling  --subsampling-log True --subsampling-num-cells 500 --threads 4 meta.txt count.txt

Thank you! Best, Jianxiang

prete commented 3 years ago

Hi @ColeKeenum, @moutazhelal and @jxshi, I'm pretty sure this error is related to the pandas version you're using. Could you confirm that it is higher than 1.1.4? If so, please try pip install -U pandas==1.1.4 and re-run your cellphonedb command.

I'll fix the requirement in the package soon and build a new release with that fixed.
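
A quick way to confirm the installed version from inside the virtual environment is sketched below. The helper names (release_tuple, is_newer_than, installed_pandas_version) are hypothetical, the comparison ignores pre-release suffixes such as rc0, and importlib.metadata assumes Python 3.8 or later.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.2.2' into (1, 2, 2); a suffix such as 'rc0' is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_newer_than(installed: str, pinned: str = "1.1.4") -> bool:
    """True if `installed` is a later release than `pinned`."""
    return release_tuple(installed) > release_tuple(pinned)


def installed_pandas_version() -> Optional[str]:
    """Return the installed pandas version, or None if pandas is absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


version = installed_pandas_version()
if version is None:
    print("pandas is not installed in this environment")
elif is_newer_than(version):
    print(f"pandas {version} is newer than 1.1.4; consider pinning it")
else:
    print(f"pandas {version} is not newer than 1.1.4")
```

Running pip show pandas in the same environment gives the same information without any code.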

jxshi commented 3 years ago

Great! It worked!

Thank you for your timely reply! Best, Jianxiang

ColeKeenum commented 3 years ago

Yup, was running pandas 1.2.2. Thank you @prete for your suggestion! The full statistical analysis executed without a problem.

Thank you to @jxshi and @moutazhelal for telling me that I wasn't the only one with this issue!

prete commented 3 years ago

fixed in #284