andersen-lab / Freyja

Depth-weighted De-Mixing
BSD 2-Clause "Simplified" License

freyja plot works only when omitting coverage column from aggregated file #168

Closed carlo-berg closed 1 year ago

carlo-berg commented 1 year ago

Hi,

If I run freyja plot on the aggregated file, it fails with a KeyError.

$ freyja plot freyja_agg.tsv --output plot.pdf
Traceback (most recent call last):
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'summarized'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/testuser/miniconda3/bin/freyja", line 10, in <module>
    sys.exit(cli())
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/freyja/_cli.py", line 403, in plot
    makePlot_simple(agg_df, lineages, output, config, lineage_info,
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/freyja/utils.py", line 313, in makePlot_simple
    agg_df = prepSummaryDict(agg_df)
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/freyja/utils.py", line 291, in prepSummaryDict
    agg_d0.loc[:, 'summarized'] = agg_d0['summarized']\
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/testuser/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
    raise KeyError(key) from err
KeyError: 'summarized'

First line of aggregated file: summarized lineages abundances resid coverage

I then looked at the sample aggregated file at https://github.com/andersen-lab/Freyja/blob/main/freyja/data/aggregated_result.tsv and noticed that it does not contain the coverage column. If I remove the coverage column from my aggregated file, freyja plot runs and creates the plot:

cut -f 1-5 freyja_agg.tsv > tmp.txt && freyja plot tmp.txt --output freyja_plot.pdf
WARNING: Freyja should be updated to include coverage estimates.

Versions:

freyja --version
freyja, version 1.4.5

conda --version
conda 23.7.2

python --version
Python 3.9.16

joshuailevy commented 1 year ago

Ah, the tsv file you've referenced is an old one. For the last year or so Freyja has included these estimates of genome coverage as a standard output. We do still accommodate the old format to ensure backwards compatibility.

It seems like you've run into a parsing problem, possibly caused by low sequencing coverage as in #166, or by some other syntax issue that the default parser isn't handling correctly.

Any chance you can send a copy of the freyja_agg.tsv file my way? Happy to sort out what's going on. If you'd rather not share it here, my email is jolevy@scripps.edu
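
One quick way to test the parsing hypothesis before sharing the file is to load it with pandas and see which column names come out. The following is only a sketch, not Freyja's own loading code: it assumes the file is tab-separated with the sample name in the first column, and the helper name check_agg_columns and the two tiny in-memory files are made up for illustration.

```python
import io

import pandas as pd

# Column names taken from the header posted in this issue.
EXPECTED = ("summarized", "lineages", "abundances", "resid")


def check_agg_columns(source, expected=EXPECTED):
    """Return (parsed column names, expected columns that are missing)."""
    # Assumption: tab-separated, sample name in the first (unnamed) column.
    df = pd.read_csv(source, sep="\t", index_col=0)
    missing = [name for name in expected if name not in df.columns]
    return list(df.columns), missing


# Hypothetical two-line files, not the reporter's data.
tab_file = (
    "\tsummarized\tlineages\tabundances\tresid\tcoverage\n"
    "file1.tsv\t[('Other', 1.0)]\tA.1\t1.0\t0.08\t0.28\n"
)
# Same content with every tab replaced by a single space.
space_file = tab_file.replace("\t", " ")

tab_columns, tab_missing = check_agg_columns(io.StringIO(tab_file))
space_columns, space_missing = check_agg_columns(io.StringIO(space_file))

print("tab-separated, missing:", tab_missing)
print("space-separated, missing:", space_missing)
```

If the real file reports 'summarized' as missing under this check, the header is probably not being split on tabs, which would be consistent with the KeyError in the traceback. Passing the path to freyja_agg.tsv instead of an io.StringIO object runs the same check on the actual file.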

carlo-berg commented 1 year ago

Ah, I see. Yes, it has low coverage, as it's actually a negative sample.

freyja_agg.tsv:

    summarized  lineages    abundances  resid   coverage
file1.tsv   [('Other', 0.3598697557138268), ('Omicron', 0.19702381428535437), ('CH.1.1* [Omicron (CH.1.1.X)]', 0.1717847147673795), ('XBB.1.5* [Omicron (XBB.1.5.X)]', 0.1717847147673795), ('Epsilon', 0.06249999999989101), ('Delta', 0.03703699999997585)]   CH.1.1.27 FT.1 BN.1.4.1 BN.1.4.3 BN.1.4.2 B.1.429.1 B.1.44 AY.77 B.1.1.320 B.1.318 B.1.1.135 N.10 B.1.1.160 B.1.1.284 B.1.1.392 BQ.1.1.50 B.1.1.286 B.1.1.358 C.2.1 AZ.2 AZ.2.1 B.1.633 B.1.1.328 B.1.1.329 0.17178471 0.17178471 0.06250000 0.06250000 0.06250000 0.06250000 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.00952381 0.00952381 0.00952381 0.00952381 0.00952381 0.00952381 0.00952381 0.00321543 0.00321543 0.0792273631555323  0.2809082700732368
file2.tsv   [('Other', 0.3598697557138268), ('Omicron', 0.19702381428535437), ('CH.1.1* [Omicron (CH.1.1.X)]', 0.1717847147673795), ('XBB.1.5* [Omicron (XBB.1.5.X)]', 0.1717847147673795), ('Epsilon', 0.06249999999989101), ('Delta', 0.03703699999997585)]   CH.1.1.27 FT.1 BN.1.4.1 BN.1.4.3 BN.1.4.2 B.1.429.1 B.1.44 AY.77 B.1.1.320 B.1.318 B.1.1.135 N.10 B.1.1.160 B.1.1.284 B.1.1.392 BQ.1.1.50 B.1.1.286 B.1.1.358 C.2.1 AZ.2 AZ.2.1 B.1.633 B.1.1.328 B.1.1.329 0.17178471 0.17178471 0.06250000 0.06250000 0.06250000 0.06250000 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.00952381 0.00952381 0.00952381 0.00952381 0.00952381 0.00952381 0.00952381 0.00321543 0.00321543 0.0792273631555323  0.2809082700732368
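
The content of a row can also be sanity-checked independently of freyja plot. The sketch below copies the values from the file1.tsv row above and checks that the lineages and abundances fields have the same number of entries and that the summarized fractions add up to roughly 1. Splitting the row into these three fields is an assumption, because the tab characters did not survive in the post.

```python
import ast

# Values copied from the file1.tsv row above. The field boundaries are an
# assumption, since the original tabs are not visible in the post.
summarized = (
    "[('Other', 0.3598697557138268), ('Omicron', 0.19702381428535437), "
    "('CH.1.1* [Omicron (CH.1.1.X)]', 0.1717847147673795), "
    "('XBB.1.5* [Omicron (XBB.1.5.X)]', 0.1717847147673795), "
    "('Epsilon', 0.06249999999989101), ('Delta', 0.03703699999997585)]"
)
lineages = (
    "CH.1.1.27 FT.1 BN.1.4.1 BN.1.4.3 BN.1.4.2 B.1.429.1 B.1.44 AY.77 "
    "B.1.1.320 B.1.318 B.1.1.135 N.10 B.1.1.160 B.1.1.284 B.1.1.392 "
    "BQ.1.1.50 B.1.1.286 B.1.1.358 C.2.1 AZ.2 AZ.2.1 B.1.633 B.1.1.328 "
    "B.1.1.329"
)
abundances = (
    "0.17178471 0.17178471 0.06250000 0.06250000 0.06250000 0.06250000 "
    "0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 0.03703700 "
    "0.03703700 0.03703700 0.03703700 0.00952381 0.00952381 0.00952381 "
    "0.00952381 0.00952381 0.00952381 0.00952381 0.00321543 0.00321543"
)

# The summarized field is a Python-literal list of (name, fraction) tuples.
groups = ast.literal_eval(summarized)
names = lineages.split()
fractions = [float(value) for value in abundances.split()]
summarized_total = sum(fraction for _, fraction in groups)

print("lineage names:", len(names), "abundance values:", len(fractions))
print("sum of summarized fractions:", summarized_total)
```

If the counts match and the summarized fractions sum to about 1, the row content itself is well-formed, which points back at how the header or separators are read rather than at the values.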