andersen-lab / Freyja

Depth-weighted De-Mixing

AttributeError: 'Series' object has no attribute 'columns' #215

Open jchorl opened 4 months ago

jchorl commented 4 months ago

Freyja dash command:

freyja dash lineages.tsv aggregated_freyja_metadata.csv freyja-title.txt freyja-content.txt --output dash.html

Latest version installed from conda.

At a cursory glance, the issue stems from this line in utils.py: https://github.com/andersen-lab/Freyja/blob/1b900ae92bbce0e733410fc82d788389b8d29522/freyja/utils.py#L567

I think it's because the logic above that line sometimes leaves df_ab_lin as a DataFrame and sometimes as a Series. A Series has no .columns attribute, hence the AttributeError. You probably want to standardize on a single type so the code below can rely on it. I'd recommend DataFrame.
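Here's a minimal sketch of what I believe is happening, outside of Freyja. The lineage names, abundances, and dates are made up for illustration, and I'm assuming the Series case arises when only the first (i == 0) branch runs, e.g. with a single sample:

```python
import pandas as pd

# Hypothetical stand-ins for one sample's lineage dict and collection date.
lin_dict = {"BA.2": 0.7, "BA.5": 0.3}
collection_date = pd.Timestamp("2023-01-01")

# The i == 0 branch builds a named Series. If no later sample is
# concatenated onto it, this is what reaches the code further down.
as_series = pd.Series(lin_dict, name=collection_date)
series_has_columns = hasattr(as_series, "columns")  # a Series has no .columns

# Wrapping the Series gives a one-column DataFrame whose column label
# is the Series name, so .columns is always available.
as_frame = pd.DataFrame(as_series)

# Later samples can still be appended column-wise exactly as before.
second = pd.Series({"BA.2": 0.4, "XBB": 0.6}, name=pd.Timestamp("2023-01-08"))
combined = pd.concat([as_frame, second], axis=1)

print(series_has_columns)
print(as_frame.columns.tolist())
print(combined.shape)
```

Since pd.concat with axis=1 accepts a DataFrame followed by a Series, the else branches shouldn't need to change.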

The following patch fixed the issue for me. I'd be happy to open a PR for it:

diff --git a/freyja/utils.py b/freyja/utils.py
index d219049..0ff613a 100644
--- a/freyja/utils.py
+++ b/freyja/utils.py
@@ -537,10 +537,11 @@ def get_abundance(agg_df, meta_df, thresh, scale_by_viral_load, config,
         dat = agg_df.loc[sampLabel, 'linDict']
         if isinstance(dat, list):
             if i == 0:
-                df_ab_lin = pd.Series(
+                row = pd.Series(
                     agg_df.loc[sampLabel, 'linDict'][0],
                     name=meta_df.loc[sampLabel,
                                      'sample_collection_datetime'])
+                df_ab_lin = pd.DataFrame(row)
             else:
                 df_ab_lin = pd.concat([
                     df_ab_lin,
@@ -550,10 +551,11 @@ def get_abundance(agg_df, meta_df, thresh, scale_by_viral_load, config,
                 ], axis=1)
         else:
             if i == 0:
-                df_ab_lin = pd.Series(
+                row = pd.Series(
                     agg_df.loc[sampLabel, 'linDict'],
                     name=meta_df.loc[sampLabel,
                                      'sample_collection_datetime'])
+                df_ab_lin = pd.DataFrame(row)
             else:
                 df_ab_lin = pd.concat([
                     df_ab_lin,
@@ -583,10 +585,11 @@ def get_abundance(agg_df, meta_df, thresh, scale_by_viral_load, config,
         dat = agg_df.loc[sampLabel, 'summarized']
         if isinstance(dat, list):
             if i == 0:
-                df_ab_sum = pd.Series(
+                row = pd.Series(
                     agg_df.loc[sampLabel, 'summarized'][0],
                     name=meta_df.loc[sampLabel,
                                      'sample_collection_datetime'])
+                df_ab_sum = pd.DataFrame(row)
             else:
                 df_ab_sum = pd.concat([
                     df_ab_sum,
@@ -596,10 +599,11 @@ def get_abundance(agg_df, meta_df, thresh, scale_by_viral_load, config,
                 ], axis=1)
         else:
             if i == 0:
-                df_ab_sum = pd.Series(
+                row = pd.Series(
                     agg_df.loc[sampLabel, 'summarized'],
                     name=meta_df.loc[sampLabel,
                                      'sample_collection_datetime'])
+                df_ab_sum = pd.DataFrame(row)
             else:
                 df_ab_sum = pd.concat([
                     df_ab_sum,