Teichlab / cellphonedb

MIT License

ValueError: Plan shapes are not aligned #260

Closed jmrussell closed 3 years ago

jmrussell commented 3 years ago

When running an analysis using a custom DB, I get the following traceback during "Building Results":

Traceback (most recent call last):
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/cellphonedb/src/api_endpoints/terminal_api/method_terminal_api_endpoints/method_terminal_commands.py", line 144, in statistical_analysis
    subsampler,
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/cellphonedb/src/local_launchers/local_method_launcher.py", line 64, in cpdb_statistical_analysis_local_method_launcher
    subsampler
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/cellphonedb/src/core/methods/method_launcher.py", line 75, in cpdb_statistical_analysis_launcher
    self.separator)
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_method.py", line 36, in call
    result_precision,
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_complex_method.py", line 118, in call
    counts_data
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_complex_method.py", line 215, in build_results
    counts_data)
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_complex_method.py", line 248, in deconvoluted_complex_result_build
    [deconvoluted_simple_result_1, deconvoluted_complex_result_2, deconvoluted_simple_result_2], sort=False)
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/pandas/core/frame.py", line 7138, in append
    sort=sort,
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 258, in concat
    return op.get_result()
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 473, in get_result
    mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 2038, in concatenate_block_managers
    for placement, join_units in concat_plan:
  File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/pandas/core/internals/concat.py", line 474, in combine_concat_plans
    raise ValueError("Plan shapes are not aligned")
ValueError: Plan shapes are not aligned

The DB generation process itself completes with zero errors and appears to be working properly. Any ideas?

jmrussell commented 3 years ago

I fixed this by removing duplicate column names in:

 File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_complex_method.py", line 248, in deconvoluted_complex_result_build
    [deconvoluted_simple_result_1, deconvoluted_complex_result_2, deconvoluted_simple_result_2], sort=False)

martin-jeremy commented 3 years ago

Hi @jmrussell, sorry to revive this topic, but I have the same issue with a custom DB. I don't understand what you mean by "removing duplicate column names". Can you explain it (if you remember, of course)?

maheetha commented 3 years ago

Hi @jmrussell, can you also please clarify? I am getting the same error, but I made sure that there were no duplicate columns in my counts.txt data. I don't know exactly what to change in the code.

Currently, lines 247-248 say:

    deconvoluted_result = deconvoluted_complex_result_1.append(
        [deconvoluted_simple_result_1, deconvoluted_complex_result_2, deconvoluted_simple_result_2], sort=False)

With this, I'm still getting the error message. I'm not sure what you mean by "remove duplicate column names". Any help would be appreciated.

Thank you!

jmrussell commented 3 years ago

def deconvoluted_complex_result_build(clusters_means: pd.DataFrame,
                                      interactions: pd.DataFrame,
                                      complex_compositions: pd.DataFrame,
                                      counts: pd.DataFrame,
                                      genes: pd.DataFrame,
                                      counts_data: str) -> pd.DataFrame:
    genes_counts = list(counts.index)
    genes_filtered = genes[genes['id_multidata'].apply(lambda gene: gene in genes_counts)]

    deconvoluted_complex_result_1 = deconvolute_complex_interaction_component(complex_compositions,
                                                                              genes_filtered,
                                                                              interactions,
                                                                              '_1',
                                                                              counts_data)
    deconvoluted_simple_result_1 = deconvolute_interaction_component(interactions,
                                                                     '_1',
                                                                     counts_data)

    deconvoluted_complex_result_2 = deconvolute_complex_interaction_component(complex_compositions,
                                                                              genes_filtered,
                                                                              interactions,
                                                                              '_2',
                                                                              counts_data)
    deconvoluted_simple_result_2 = deconvolute_interaction_component(interactions,
                                                                     '_2',
                                                                     counts_data)
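    # Changes made here: in each intermediate frame, keep only the first
    # occurrence of any repeated column name before the frames are appended.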
    deconvoluted_complex_result_1 = deconvoluted_complex_result_1.loc[:,~deconvoluted_complex_result_1.columns.duplicated()]
    deconvoluted_simple_result_1 = deconvoluted_simple_result_1.loc[:,~deconvoluted_simple_result_1.columns.duplicated()]
    deconvoluted_complex_result_2 = deconvoluted_complex_result_2.loc[:,~deconvoluted_complex_result_2.columns.duplicated()]
    deconvoluted_simple_result_2 = deconvoluted_simple_result_2.loc[:,~deconvoluted_simple_result_2.columns.duplicated()]
    # End changes

    deconvoluted_result = deconvoluted_complex_result_1.append(
        [deconvoluted_simple_result_1, deconvoluted_complex_result_2, deconvoluted_simple_result_2], sort=False)

    deconvoluted_result.set_index('multidata_id', inplace=True, drop=True)

    deconvoluted_columns = ['gene_name', 'name', 'is_complex', 'protein_name', 'complex_name', 'id_cp_interaction',
                            'gene']

    deconvoluted_result = deconvoluted_result[deconvoluted_columns]
    deconvoluted_result.rename({'name': 'uniprot'}, axis=1, inplace=True)
    deconvoluted_result = pd.concat([deconvoluted_result, clusters_means], axis=1, join='inner', sort=False)
    deconvoluted_result.set_index('gene', inplace=True, drop=True)
    deconvoluted_result.drop_duplicates(inplace=True)

    return deconvoluted_result

I did not need the deconvoluted result, so I was comfortable making this change. Make it at your own risk :).
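
For anyone unsure what "removing duplicate column names" means here, the following is a minimal sketch on toy data, not CellPhoneDB code. It shows the same `.loc[:, ~df.columns.duplicated()]` idiom used in the patch above. The helper names `duplicated_column_names` and `drop_duplicate_columns` and the example frames are hypothetical. It assumes the duplicates are in the column labels of the intermediate frames, not in counts.txt.

```python
import pandas as pd


def duplicated_column_names(frame: pd.DataFrame) -> list:
    """Return column labels that appear more than once (hypothetical helper)."""
    return list(frame.columns[frame.columns.duplicated()].unique())


def drop_duplicate_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first occurrence of each column label (hypothetical helper)."""
    return frame.loc[:, ~frame.columns.duplicated()]


# Toy frame whose 'gene' label appears twice (made-up data for illustration).
with_dupes = pd.DataFrame(
    [[1, "A", "A2"], [2, "B", "B2"]],
    columns=["multidata_id", "gene", "gene"],
)
clean = pd.DataFrame({"multidata_id": [3], "gene": ["C"]})

# Diagnostic step: list which labels are repeated before changing anything.
repeated = duplicated_column_names(with_dupes)

# Drop the repeats, then stack the frames row-wise.
deduped = drop_duplicate_columns(with_dupes)
combined = pd.concat([deduped, clean], sort=False, ignore_index=True)
```

Note that this silently discards the data in the second and later copies of a repeated column, which is why it is worth printing the repeated labels first and checking whether the custom DB itself is producing them.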