YosefLab / Compass

In-Silico Modeling of Metabolic Heterogeneity using Single-Cell Transcriptomes
BSD 3-Clause "New" or "Revised" License

AttributeError: 'DataFrame' object has no attribute 'iteritems' #92

Closed JoelHaas closed 1 year ago

JoelHaas commented 1 year ago

Thanks for the great work in developing this project. I am trying to set up compass to work on our computing cluster and I am running into some difficulty. I'm looking for some suggestions to troubleshoot!

After following the AWS installation guide, I get the following error using the example data. It builds the model cache seemingly correctly, then fails when evaluating reaction penalties:

```
Evaluating Reaction Penalties:
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/compass", line 8, in <module>
    sys.exit(entry())
  File "/home/ubuntu/.local/lib/python3.8/site-packages/compass/main.py", line 588, in entry
    penalties = eval_reaction_penalties(args['data'], args['model'],
  File "/home/ubuntu/.local/lib/python3.8/site-packages/compass/compass/penalties.py", line 96, in eval_reaction_penalties
    reaction_penalties = eval_reaction_penalties_shared(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/compass/compass/penalties.py", line 158, in eval_reaction_penalties_shared
    for name, expression_data in expression.iteritems():
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 5989, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'iteritems'
```

My logfile looks like this:

```
Compass version: 0.9.10.2
Git commit: Not in Git repo
Python Version: 3.8.10 (default, Mar 13 2023, 10:26:41) [GCC 9.4.0]
Python prefix: /usr
Numpy version: 1.24.2
Pandas version: 2.0.0
Supplied Arguments:
    data: ['/home/ubuntu/compass/counts_multik.tsv']
    data_mtx: None
    model: RECON2_mat
    species: mus_musculus
    media: default-media
    output_dir: /home/ubuntu
    temp_dir: /home/ubuntu/_tmp
    torque_queue: None
    num_processes: None
    lambda: 0
    single_sample: None
    transposed: False
    sample_range: None
    reaction_range: None
    metabolite_range: None
    generate_cache: False
    test_mode: False
    num_threads: 1
    and_function: mean
    select_reactions: None
    select_subsystems: None
    glucose: None
    collect: False
    config_file: None
    num_neighbors: 30
    symmetric_kernel: False
    input_weights: None
    penalty_diffusion: knn
    no_reactions: False
    calc_metabolites: False
    precache: False
    input_knn: None
    input_knn_distances: None
    output_knn: None
    latent_space: None
    only_penalties: None
    example_inputs: None
    microcluster_size: None
    microcluster_file: None
    microcluster_data_file: None
    anndata_output: False
    anndata_obs: None
    anndata_uns: None
    detailed_perf: False
    penalties_file:
    lpmethod: 4
    advance: 2
    save_argmaxes: False
    isoform_summing: remove-summing
    list_genes: None
    list_reactions: None
```

deto commented 1 year ago

Looks like iteritems was removed in pandas 2.0 - try using pandas version 1.5.3 instead.

@allonw - this probably needs a bugfix, as everyone will run into it now that pandas 2.0 was released this past week.

JoelHaas commented 1 year ago

Thanks for the quick reply. The downgrade solved my issue. I'll close the issue since this workaround will solve it for others, but a longer-term fix in the code would be greatly appreciated!

tsykit commented 1 year ago

I solved my issue by downgrading pandas to 1.5.3, but when I ran the command I got the following message:

```
FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for i, r in pd.DataFrame(lf.ra.Regulons, index=lf.ra.Gene).iteritems():
```

It seems that we can solve a similar issue by replacing `.iteritems` with `.items`.

Hope it helps.
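For anyone who wants to sanity-check that replacement, here is a minimal sketch (the toy expression matrix below is invented purely for illustration):

```python
import pandas as pd

# Toy genes-by-samples matrix, made up just to demonstrate the iteration.
expression = pd.DataFrame(
    {"sample_1": [1.0, 0.0, 2.5], "sample_2": [0.3, 4.1, 0.0]},
    index=["gene_a", "gene_b", "gene_c"],
)

# DataFrame.items() yields (column_name, Series) pairs on pandas 1.x and 2.x alike,
# so it is a drop-in replacement for the removed iteritems() in loops like the one
# shown in the traceback above.
for name, expression_data in expression.items():
    print(name, expression_data.sum())
```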

atronchi commented 1 year ago

If you prefer to keep the latest pandas, this approach works as well:

```python
import pandas as pd

# Restore the alias that pandas 2.0 removed, so code that still calls
# DataFrame.iteritems() keeps working.
pd.DataFrame.iteritems = pd.DataFrame.items
```
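One caveat worth spelling out: the shim only applies inside the Python process where it runs, so it has to execute before the code that calls `.iteritems()` is reached in that same interpreter, roughly like this:

```python
import pandas as pd

# Apply the shim before anything in this interpreter calls iteritems();
# running it in a separate `python -c ...` process will not help.
pd.DataFrame.iteritems = pd.DataFrame.items

# ...then import or call the library code that still relies on iteritems().
```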
SartajBhuvaji commented 1 year ago

Hey @atronchi, your approach worked for me, but when I then do

```python
pd.DataFrame.iteritems = pd.DataFrame.items
spark_df = spark.createDataFrame(news_df)

spark_df.show()
```

I get:

```
Py4JJavaError: An error occurred while calling o192.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (AlienwareM15-R2.lan executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
```

I'm new to Spark, thanks for the help 🙂
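That "Python worker failed to connect back" error typically comes from the Spark/Python environment rather than from the pandas shim, which looks fine here. A hedged guess at a fix (an assumption, not verified against this setup) is that the driver and workers are using different Python interpreters; pinning both to the same one before the session is created sometimes resolves it:

```python
import os
import sys

# Assumption: the failure is a driver/worker Python mismatch, a common cause of
# "Python worker failed to connect back". Pin both to this interpreter before
# the SparkSession (and its JVM) starts.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

import pandas as pd
from pyspark.sql import SparkSession

pd.DataFrame.iteritems = pd.DataFrame.items  # shim for Spark versions that still call iteritems()

spark = SparkSession.builder.master("local[*]").getOrCreate()
spark_df = spark.createDataFrame(pd.DataFrame({"a": [1, 2, 3]}))  # toy frame standing in for news_df
spark_df.show()
```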

RubTalha commented 1 year ago

https://stackoverflow.com/questions/76404811/attributeerror-dataframe-object-has-no-attribute-iteritems

xxz19900 commented 1 month ago

@atronchi Hi, how do you run the compass command when you have made changes? My code is as follows, but it still gives an error.

```bash
python -c "import pandas as pd; pd.DataFrame.iteritems = pd.DataFrame.items"

exptsv_path=/home/user/Software/MEBOCOST/avg_exp_mat.tsv
species=homo_sapiens
output_path=/home/user/Software/MEBOCOST/compass_res
temp_path=/home/user/Software/MEBOCOST/compass_res_tmp
core=8
compass=/home/user/.local/bin/compass

echo '++++ run compass'

# since it is cell type level analysis, so lambda set to 0
$compass --data $exptsv_path --num-thread $core --species $species --output-dir $output_path --temp-dir $temp_path --calc-metabolites --lambda 0
echo 'Finished'
```
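The likely problem with the script above is that `python -c "import pandas as pd; pd.DataFrame.iteritems = pd.DataFrame.items"` patches pandas in its own short-lived Python process, which exits before `$compass` starts, so the Compass process never sees the patch. One possible workaround, sketched here under the assumption that the `compass` console script simply calls `entry()` from `compass.main` as the traceback earlier in this issue shows, is a small wrapper that applies the patch and then runs Compass in the same interpreter:

```python
#!/usr/bin/env python3
# compass_patched.py -- hypothetical wrapper, not part of the Compass package.
import sys

import pandas as pd

# Apply the shim first, in the same process that will run Compass.
pd.DataFrame.iteritems = pd.DataFrame.items

# Entry point as shown in the traceback earlier in this issue.
from compass.main import entry

if __name__ == "__main__":
    sys.exit(entry())
```

It would then be invoked with the same arguments, e.g. `python compass_patched.py --data $exptsv_path --num-thread $core ...` in place of `$compass ...`. Pinning `pandas==1.5.3` as suggested above remains the simpler route.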