Closed: jmrussell closed this issue 3 years ago
I fixed this by removing duplicate column names in
File "/n/core/Bioinformatics/analysis/LinhengLi/wl2329/cbio.wl2329.100/data/cpdb_input/cpdb_test/lib/python3.7/site-packages/cellphonedb/src/core/methods/cpdb_statistical_analysis_complex_method.py", line 248, in deconvoluted_complex_result_build
[deconvoluted_simple_result_1, deconvoluted_complex_result_2, deconvoluted_simple_result_2], sort=False)
Hi @jmrussell, sorry to revive this topic, but I have the same issue with a custom DB. I don't understand what you mean by "removing duplicate column names". Can you explain it (if you remember, of course)?
Hi @jmrussell, can you also please clarify? I am getting the same error, even though I made sure there were no duplicate columns in my counts.txt data. I don't know exactly what to change in the code.
Currently, lines 247-248 say: deconvoluted_result = deconvoluted_complex_result_1.append( [deconvoluted_simple_result_1, deconvoluted_complex_result_2, deconvoluted_simple_result_2], sort=False)
With this, I'm still getting the error message. I'm not sure what you mean by "remove duplicate column names". Any help would be appreciated.
Thank you!
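For anyone checking where the duplicates actually are: the fix in this thread deduplicates the intermediate frames, not counts.txt, so it may be worth inspecting those frames directly. Below is a minimal sketch of such a check. The helper name `duplicated_column_names` and the toy frames are hypothetical and not part of cellphonedb; in practice you would pass the four `deconvoluted_*` frames just before the `append` call.

```python
import pandas as pd


def duplicated_column_names(frame: pd.DataFrame) -> list:
    """Return the sorted column names that occur more than once in `frame`."""
    columns = frame.columns
    # Index.duplicated() flags the second and later occurrences of each name.
    return sorted(set(columns[columns.duplicated()]))


# Made-up stand-ins for the intermediate frames (assumption, for illustration only).
frames = {
    "complex_1": pd.DataFrame([[1, "A", "A"]], columns=["multidata_id", "gene", "gene"]),
    "simple_1": pd.DataFrame([[2, "B"]], columns=["multidata_id", "gene"]),
}

for label, frame in frames.items():
    print(label, duplicated_column_names(frame))
```

An empty list means every column name in that frame is unique.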
def deconvoluted_complex_result_build(clusters_means: pd.DataFrame,
                                      interactions: pd.DataFrame,
                                      complex_compositions: pd.DataFrame,
                                      counts: pd.DataFrame,
                                      genes: pd.DataFrame,
                                      counts_data: str) -> pd.DataFrame:
    genes_counts = list(counts.index)
    genes_filtered = genes[genes['id_multidata'].apply(lambda gene: gene in genes_counts)]

    deconvoluted_complex_result_1 = deconvolute_complex_interaction_component(complex_compositions,
                                                                              genes_filtered,
                                                                              interactions,
                                                                              '_1',
                                                                              counts_data)
    deconvoluted_simple_result_1 = deconvolute_interaction_component(interactions,
                                                                     '_1',
                                                                     counts_data)
    deconvoluted_complex_result_2 = deconvolute_complex_interaction_component(complex_compositions,
                                                                              genes_filtered,
                                                                              interactions,
                                                                              '_2',
                                                                              counts_data)
    deconvoluted_simple_result_2 = deconvolute_interaction_component(interactions,
                                                                     '_2',
                                                                     counts_data)

    # Changes made here
    deconvoluted_complex_result_1 = deconvoluted_complex_result_1.loc[:, ~deconvoluted_complex_result_1.columns.duplicated()]
    deconvoluted_simple_result_1 = deconvoluted_simple_result_1.loc[:, ~deconvoluted_simple_result_1.columns.duplicated()]
    deconvoluted_complex_result_2 = deconvoluted_complex_result_2.loc[:, ~deconvoluted_complex_result_2.columns.duplicated()]
    deconvoluted_simple_result_2 = deconvoluted_simple_result_2.loc[:, ~deconvoluted_simple_result_2.columns.duplicated()]
    # End changes

    deconvoluted_result = deconvoluted_complex_result_1.append(
        [deconvoluted_simple_result_1, deconvoluted_complex_result_2, deconvoluted_simple_result_2], sort=False)

    deconvoluted_result.set_index('multidata_id', inplace=True, drop=True)

    deconvoluted_columns = ['gene_name', 'name', 'is_complex', 'protein_name', 'complex_name', 'id_cp_interaction',
                            'gene']

    deconvoluted_result = deconvoluted_result[deconvoluted_columns]
    deconvoluted_result.rename({'name': 'uniprot'}, axis=1, inplace=True)
    deconvoluted_result = pd.concat([deconvoluted_result, clusters_means], axis=1, join='inner', sort=False)
    deconvoluted_result.set_index('gene', inplace=True, drop=True)
    deconvoluted_result.drop_duplicates(inplace=True)

    return deconvoluted_result
I did not need the deconvoluted result, so I was comfortable making this change. Make it at your own risk :).
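To show what the `.loc[:, ~frame.columns.duplicated()]` lines do, here is a small standalone sketch on a toy frame. The data and the variable names `frame`, `duplicate_mask` and `deduplicated` are made up for illustration; only the pandas calls match the change above.

```python
import pandas as pd

# Toy frame with a repeated column name (made-up data, not cellphonedb output).
frame = pd.DataFrame(
    [[1, "A", "x"], [2, "B", "y"]],
    columns=["multidata_id", "gene", "gene"],
)

# columns.duplicated() is True for the second and later occurrences of a name.
duplicate_mask = frame.columns.duplicated()

# Selecting with the inverted mask keeps only the first occurrence of each name.
deduplicated = frame.loc[:, ~duplicate_mask]

print(duplicate_mask.tolist())       # [False, False, True]
print(list(deduplicated.columns))    # ['multidata_id', 'gene']
```

Note that the later copies of a duplicated column are dropped without any check that they hold the same values as the first, which is one reason to treat this as a workaround and not a verified fix.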
When running an analysis using a custom DB, I get the following traceback during "Building Results"
The DB generation process itself reports zero errors and appears to be working properly. Any ideas?