Teichlab / cellphonedb

MIT License

Input format error?? #141

Closed terooatt closed 4 years ago

terooatt commented 4 years ago

Hi, thank you very much for this package. After converting my anndata (scanpy) object to the count and meta files using your script at https://www.cellphonedb.org/faq-and-troubleshooting, I ran cellphonedb as shown below and got an error that I can't understand. Any help would be greatly appreciated.

Thanks

$ cellphonedb method statistical_analysis meta_8w_I_ES.txt count_8w_I_ES.txt
/Users/tommy/cpdb-venv/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.cluster.k_means_ module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)
[ ][APP][28/01/20-10:27:22][WARNING] Latest local available version is v2.0.0, using it
[ ][APP][28/01/20-10:27:22][WARNING] User selected downloaded database v2.0.0 is available, using it
[ ][CORE][28/01/20-10:27:22][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][28/01/20-10:27:22][INFO] Using custom database at /Users/tommy/.cpdb/releases/v2.0.0/cellphone.db
[ ][APP][28/01/20-10:27:22][INFO] Launching Method cpdb_statistical_analysis_local_method_launcher
[ ][APP][28/01/20-10:27:22][INFO] Launching Method _set_paths
[ ][APP][28/01/20-10:27:22][INFO] Launching Method _load_meta_counts
[ ][CORE][28/01/20-10:27:23][INFO] Launching Method cpdb_statistical_analysis_launcher
[ ][CORE][28/01/20-10:27:23][INFO] Launching Method _counts_validations
[ ][CORE][28/01/20-10:27:23][INFO] [Cluster Statistical Analysis Simple] Threshold:0.1 Iterations:1000 Debug-seed:-1 Threads:4 Precision:3
[ ][CORE][28/01/20-10:27:23][INFO] Running Simple Prefilters
[ ][CORE][28/01/20-10:27:24][INFO] Running Real Simple Analysis
[ ][APP][28/01/20-10:27:24][ERROR] Unexpected error
Traceback (most recent call last):
  File "/Users/tommy/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/api_endpoints/terminal_api/method_terminal_api_endpoints/method_terminal_commands.py", line 144, in statistical_analysis
    subsampler,
  File "/Users/tommy/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/local_launchers/local_method_launcher.py", line 64, in cpdb_statistical_analysis_local_method_launcher
    subsampler
  File "/Users/tommy/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/core/methods/method_launcher.py", line 75, in cpdb_statistical_analysis_launcher
    self.separator)
  File "/Users/tommy/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py", line 34, in call
    result_precision,
  File "/Users/tommy/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_simple_method.py", line 38, in call
    cluster_interactions = cpdb_statistical_analysis_helper.get_cluster_combinations(clusters['names'])
  File "/Users/tommy/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_helper.py", line 134, in get_cluster_combinations
    return sorted(itertools.product(cluster_names, repeat=2))
TypeError: '<' not supported between instances of 'float' and 'str'

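The final `TypeError` in the traceback above is what Python raises when it sorts a collection that mixes strings and floats. One way this can happen, offered here as an assumption rather than a confirmed diagnosis for this dataset, is a missing cluster label being read as a float NaN. A minimal sketch with made-up labels (the helper names are hypothetical and not part of CellPhoneDB):

```python
import itertools


def non_string_labels(cluster_names):
    """Return the labels that are not strings, e.g. float NaN from an empty field."""
    return [name for name in cluster_names if not isinstance(name, str)]


def sort_error_for(cluster_names):
    """Return the TypeError message raised by the sort, or None if it succeeds."""
    try:
        # Same expression as the last frame of the traceback.
        sorted(itertools.product(cluster_names, repeat=2))
    except TypeError as err:
        return str(err)
    return None


# Made-up labels: two valid cluster names and one missing value.
labels = ["Tcell", "Bcell", float("nan")]
print(sort_error_for(labels))
print(non_string_labels(labels))
```

If `non_string_labels` returns anything for the cluster column of the meta file, those rows are worth inspecting first.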
mvento commented 4 years ago

Hi @terooatt,

Thanks for using CellPhoneDB. Can you send us the input data?

Best!

terooatt commented 4 years ago

I sent the inputs by email. Let me know if you received them.

Thank you!

mvento commented 4 years ago

Hi @terooatt,

Sorry, I didn't receive it. Can you try it again? The email is on my GitHub profile.

Best

royfrancis commented 4 years ago

Hi, I get a similar error. My test dataset is attached: test-counts.txt, test-meta.txt

The test data from your tutorial works, though, so something must be wrong with my input data, although the two look similar to me.

Shiywa commented 4 years ago

Hi, I get the same error. My data format looks the same as your test data, but the procedure always stops at Running Real Complex Analysis. When I sampled the first 50 cells of my dataset to test CellPhoneDB, the procedure unexpectedly did not fail at Running Real Complex Analysis and succeeded. Is there a bug in this procedure for big datasets?

This error report is for the big dataset of 6000+ cells:

(cpdb-venv) bogon:data wangshiyou$ cellphonedb method statistical_analysis single_meta_data.txt single_count.txt --project-name=single_CNK --threshold=0.25 --threads=2
/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)
[ ][APP][01/04/20-15:09:32][WARNING] Latest local available version is `v2.0.0`, using it
[ ][APP][01/04/20-15:09:32][WARNING] User selected downloaded database `v2.0.0` is available, using it
[ ][CORE][01/04/20-15:09:32][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][01/04/20-15:09:32][INFO] Using custom database at /Users/wangshiyou/.cpdb/releases/v2.0.0/cellphone.db
[ ][APP][01/04/20-15:09:32][INFO] Launching Method cpdb_statistical_analysis_local_method_launcher
[ ][APP][01/04/20-15:09:32][INFO] Launching Method _set_paths
[ ][APP][01/04/20-15:09:32][INFO] Launching Method _load_meta_counts
[ ][CORE][01/04/20-15:10:05][INFO] Launching Method cpdb_statistical_analysis_launcher
[ ][CORE][01/04/20-15:10:05][INFO] Launching Method _counts_validations
[ ][CORE][01/04/20-15:10:08][INFO] [Cluster Statistical Analysis Simple] Threshold:0.25 Iterations:1000 Debug-seed:-1 Threads:2 Precision:3
[ ][CORE][01/04/20-15:10:08][INFO] Running Simple Prefilters
[ ][CORE][01/04/20-15:10:10][INFO] Running Real Simple Analysis
[ ][APP][01/04/20-15:10:11][ERROR] Unexpected error
Traceback (most recent call last):
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/api_endpoints/terminal_api/method_terminal_api_endpoints/method_terminal_commands.py", line 144, in statistical_analysis
    subsampler,
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/local_launchers/local_method_launcher.py", line 64, in cpdb_statistical_analysis_local_method_launcher
    subsampler
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/core/methods/method_launcher.py", line 75, in cpdb_statistical_analysis_launcher
    self.separator)
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py", line 34, in call
    result_precision,
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_simple_method.py", line 50, in call
    counts_data=counts_data)
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_helper.py", line 186, in mean_analysis
    counts_data=counts_data)
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_helper.py", line 463, in cluster_interaction_mean
    mean_ligand = means_cluster_ligands[interaction['{}{}'.format(counts_data, suffixes[1])]]
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/pandas/core/series.py", line 767, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3115, in get_value
    k = self._convert_scalar_indexer(k, kind='getitem')
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 1663, in _convert_scalar_indexer
    return self._invalid_indexer('label', key)
  File "/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 1863, in _invalid_indexer
    kind=type(key)))
TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [nan] of <class 'float'>

This report is for the 50-cell subset:

(cpdb-venv) bogon:data wangshiyou$ cellphonedb method statistical_analysis single_meta_data_50.txt single_count_50.txt --project-name=single_CNK --threshold=0.25 --threads=2
/Users/wangshiyou/cpdb-venv/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)
[ ][APP][01/04/20-15:16:26][WARNING] Latest local available version is `v2.0.0`, using it
[ ][APP][01/04/20-15:16:26][WARNING] User selected downloaded database `v2.0.0` is available, using it
[ ][CORE][01/04/20-15:16:26][INFO] Initializing SqlAlchemy CellPhoneDB Core
[ ][CORE][01/04/20-15:16:26][INFO] Using custom database at /Users/wangshiyou/.cpdb/releases/v2.0.0/cellphone.db
[ ][APP][01/04/20-15:16:26][INFO] Launching Method cpdb_statistical_analysis_local_method_launcher
[ ][APP][01/04/20-15:16:26][INFO] Launching Method _set_paths
[ ][APP][01/04/20-15:16:26][INFO] Launching Method _load_meta_counts
[ ][CORE][01/04/20-15:16:27][INFO] Launching Method cpdb_statistical_analysis_launcher
[ ][CORE][01/04/20-15:16:27][INFO] Launching Method _counts_validations
[ ][CORE][01/04/20-15:16:27][INFO] [Cluster Statistical Analysis Simple] Threshold:0.25 Iterations:1000 Debug-seed:-1 Threads:2 Precision:3
[ ][CORE][01/04/20-15:16:27][INFO] Running Simple Prefilters
[ ][CORE][01/04/20-15:16:27][INFO] Running Real Simple Analysis
[ ][CORE][01/04/20-15:16:30][INFO] Running Statistical Analysis
[ ][CORE][01/04/20-15:29:50][INFO] Building Pvalues result
[ ][CORE][01/04/20-15:30:09][INFO] Building Simple results
[ ][CORE][01/04/20-15:30:10][INFO] [Cluster Statistical Analysis Complex] Threshold:0.25 Iterations:1000 Debug-seed:-1 Threads:2 Precision:3
[ ][CORE][01/04/20-15:30:10][INFO] Running Complex Prefilters
[ ][CORE][01/04/20-15:30:13][INFO] Running Real Complex Analysis
[ ][CORE][01/04/20-15:30:17][INFO] Running Statistical Analysis
[ ][CORE][01/04/20-15:44:11][INFO] Building Pvalues result
[ ][CORE][01/04/20-15:44:36][INFO] Building Complex results

royfrancis commented 4 years ago

@ChickenWangSY I think the error discussed here may be different from yours. I was eventually able to fix mine. To start with, check that the column names of the count data and the Cell column of the meta data match exactly. In my case, two cells in my ~6000-cell dataset were causing the issue. I still don't know what is wrong with them, but removing them made it work. It took a long time to find them: I repeatedly halved the dataset and reran the analysis until I pinpointed the problem samples.
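The halving search described above can be sketched as follows. This is an illustration under assumptions, not the script that was actually used: `fails` is a hypothetical callback that returns True when the analysis errors on a given subset of cells (in practice it would write subset files and rerun cellphonedb), and the sketch assumes each problem cell triggers the failure on its own.

```python
from typing import Callable, List, Sequence


def find_one_culprit(items: Sequence[str],
                     fails: Callable[[Sequence[str]], bool]) -> str:
    """Halve `items` repeatedly, keeping a half that still fails, until one remains."""
    candidates = list(items)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        if fails(left):
            candidates = left
        elif fails(right):
            candidates = right
        else:
            # The failure only appears when both halves are combined.
            raise ValueError("bisection cannot isolate a single item")
    return candidates[0]


def find_all_culprits(items: Sequence[str],
                      fails: Callable[[Sequence[str]], bool]) -> List[str]:
    """Remove culprits one at a time until the remaining items no longer fail."""
    remaining = list(items)
    culprits = []
    while remaining and fails(remaining):
        bad = find_one_culprit(remaining, fails)
        culprits.append(bad)
        remaining.remove(bad)
    return culprits


# Made-up example: 64 cell IDs, two of which break the (simulated) analysis.
cells = [f"cell_{i}" for i in range(64)]
bad_cells = {"cell_17", "cell_42"}
print(find_all_culprits(cells, lambda subset: any(c in bad_cells for c in subset)))
```

Each culprit costs about log2(n) reruns, so two problem cells among ~6000 need roughly 25 to 50 runs instead of thousands.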

Shiywa commented 4 years ago

I agree. I am sure that the column names of the count data and the Cell column of the meta data match exactly: the size of their intersection equals the number of cells.

> length(colnames(count)[-1])
[1] 6568
> length(intersect(colnames(count)[-1],metadata$Cell))
[1] 6568
> length(metadata$Cell)
[1] 6568

Maybe there are some "bad" cells in my dataset too that affect the procedure. Thanks for your suggestion.

royfrancis commented 4 years ago

Not just the lengths: the IDs must also match exactly and in the same order.

all.equal(colnames(count)[-1],metadata$Cell)
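For anyone preparing the inputs in Python instead of R, a rough counterpart of this check is sketched below. It assumes the cell IDs have already been read into two lists (the count-file header without its first, gene column, and the Cell column of the meta file); `compare_cell_ids` is a hypothetical helper, not a CellPhoneDB function.

```python
from collections import Counter


def compare_cell_ids(count_ids, meta_ids):
    """Report ID mismatches, duplicates, and whether the order is identical."""
    count_ids, meta_ids = list(count_ids), list(meta_ids)
    count_set, meta_set = set(count_ids), set(meta_ids)
    duplicated = {
        cell
        for ids in (count_ids, meta_ids)
        for cell, n in Counter(ids).items()
        if n > 1
    }
    return {
        "only_in_counts": sorted(count_set - meta_set),
        "only_in_meta": sorted(meta_set - count_set),
        "duplicated": sorted(duplicated),
        "same_order": count_ids == meta_ids,
    }


# Made-up IDs: same set of cells, but in a different order.
print(compare_cell_ids(["c1", "c2", "c3"], ["c1", "c3", "c2"]))
```

Equal lengths and a full intersection still allow `same_order` to be False, which is the case the length checks above cannot detect.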

Shiywa commented 4 years ago

Oh, I think I have found the real error. Unexpectedly, there were NAs in my count matrix. I used na.omit to remove 45 genes and reran the procedure. It is running without error now.
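A quick way to look for such values before running the analysis is to scan the counts file directly. The sketch below is a minimal example under assumptions: a tab-separated file whose first column holds gene names and whose header row names the cells, with missing values written as `NA`, `NaN`, `nan`, or left empty. `genes_with_missing_values` is a hypothetical helper, not part of CellPhoneDB.

```python
import csv
import io

# Spellings treated as missing; an assumption about how NAs were written out.
MISSING_TOKENS = {"", "NA", "NaN", "nan"}


def genes_with_missing_values(handle, delimiter="\t"):
    """Map each gene with missing entries to the cells where they occur."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    cells = header[1:]  # first header field labels the gene column
    affected = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        missing = [cell for cell, value in zip(cells, values)
                   if value.strip() in MISSING_TOKENS]
        if missing:
            affected[gene] = missing
    return affected


# Made-up data standing in for a counts file opened with open(path, newline="").
sample = io.StringIO("Gene\tc1\tc2\nG1\t1\t2\nG2\tNA\t0\nG3\t3\t\n")
print(genes_with_missing_values(sample))
```

Dropping the reported genes has the same effect as the `na.omit` call mentioned above, and the list also shows which cells the NAs came from.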

terooatt commented 4 years ago

I had been waiting a while for help on this, but you figured it out. Thanks! I will close this issue then.