andersen-lab / Freyja

Depth-weighted De-Mixing
BSD 2-Clause "Simplified" License

No group keys passed! #187

Closed: LauraVP1994 closed this issue 10 months ago

LauraVP1994 commented 10 months ago

Hello,

I have been using Freyja, and some errors keep recurring that seem to be caused by the --depthcutoff option. The type of error output also appears to depend on the Freyja version. Note that the problem occurs mainly in samples with low coverage:

Freyja version 1.4.5

Input code: freyja demix sample_ivar_freyja.variants.tsv sample_ivar_freyja.depth --eps 0.025 --depthcutoff 500 --output sample_ivar_freyja.demix

Error

Traceback (most recent call last):
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/bin/freyja", line 10, in <module>
    sys.exit(cli())
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/freyja/_cli.py", line 87, in demix
    df_barcodes = collapse_barcodes(df_barcodes, df_depth, depthcutoff,
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/freyja/utils.py", line 848, in collapse_barcodes
    duplicates = df_barcodes.groupby(df_barcodes.columns.tolist()).apply(
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/pandas/core/frame.py", line 8872, in groupby
    return DataFrameGroupBy(
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1274, in __init__
    grouper, exclusions, obj = get_grouper(
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/pandas/core/groupby/grouper.py", line 1037, in get_grouper
    raise ValueError("No group keys passed!")
ValueError: No group keys passed!

Input code: freyja demix sample_ivar_freyja.variants.tsv sample_ivar_freyja.depth --eps 0.025 --output sample_ivar_freyja.demix

Error/Warning
building mix/depth matrices
demixing
/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/cvxpy/problems/problem.py:1387: UserWarning: Solution may be inaccurate. Try another solver, adjusting the solver settings, or solve with verbose=True for more information.
  warnings.warn(
(wastewater_sars) lavanpoelvoorde@bioit-rd:/scratch/laura/COVID/In_Silico/Illumina/Omicron/MixOf1/Merge/Test$ mamba install freyja=1.4.7

Freyja version 1.4.7

Input code: freyja demix sample_ivar_freyja.variants.tsv sample_ivar_freyja.depth --eps 0.025 --depthcutoff 500 --output sample_ivar_freyja.demix

Error


Traceback (most recent call last):
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/bin/freyja", line 10, in <module>
    sys.exit(cli())
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/freyja/_cli.py", line 81, in demix
    df_barcodes = collapse_barcodes(df_barcodes, df_depth, depthcutoff,
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/freyja/utils.py", line 848, in collapse_barcodes
    duplicates = df_barcodes.groupby(df_barcodes.columns.tolist()).apply(
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/pandas/core/frame.py", line 8872, in groupby
    return DataFrameGroupBy(
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/pandas/core/groupby/groupby.py", line 1274, in __init__
    grouper, exclusions, obj = get_grouper(
  File "/data/laura/Scripts/environments/miniconda3_rd/envs/wastewater_sars/lib/python3.9/site-packages/pandas/core/groupby/grouper.py", line 1037, in get_grouper
    raise ValueError("No group keys passed!")
ValueError: No group keys passed!

Input code: freyja demix sample_ivar_freyja.variants.tsv sample_ivar_freyja.depth --eps 0.025 --output sample_ivar_freyja.demix

Error

building mix/depth matrices
demixing
^Cdemix: Solver error encountered, most likely due to insufficient sequencing depth. Try increasing the --depthcutoff parameter.

sample_ivar_freyja.depth.txt sample_ivar_freyja.variants.tsv.txt

I have also attached both files (remove the trailing .txt extension before using them).

Thanks for any help!

joshuailevy commented 10 months ago

Hey @LauraVP1994! It sounds like you may be running into an issue where there are essentially no sites with your specified sequencing depth. @dylanpilz, can you take a closer look?
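The traceback can be reproduced in miniature with pandas. The sketch below is a hypothetical reduction, not Freyja's actual `collapse_barcodes` code: it assumes the barcode matrix has lineages as rows and mutations as columns, and that mutations at sites below the cutoff are dropped before the `groupby` call seen in the traceback. `filter_and_group` is an illustrative name, and the depths are toy values. When every column is dropped, `groupby` receives an empty key list and raises the same `ValueError`.

```python
import pandas as pd


def filter_and_group(df_barcodes, depth, depthcutoff):
    """Hypothetical reduction of a filter-then-group step (not Freyja's code)."""
    # Keep only mutation columns whose site depth meets the cutoff.
    kept = [m for m in df_barcodes.columns if depth.get(m, 0) >= depthcutoff]
    filtered = df_barcodes[kept]
    # Group lineages with identical barcodes on all remaining columns,
    # mirroring df_barcodes.groupby(df_barcodes.columns.tolist()).
    return filtered.groupby(filtered.columns.tolist()).size()


# Toy barcode matrix: rows are lineages, columns are mutations (1 = present).
df_barcodes = pd.DataFrame(
    {"C241T": [1, 1], "A23403G": [1, 0]}, index=["BA.1", "B.1"]
)
depth = {"C241T": 120, "A23403G": 80}  # toy depths, all below 500

try:
    filter_and_group(df_barcodes, depth, depthcutoff=500)
except ValueError as err:
    # Both columns were dropped, so the key list passed to groupby is empty.
    print(err)
```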

dylanpilz commented 10 months ago

Hey, as @joshuailevy pointed out, none of the sites in the sample have a higher coverage than 500, so all of the available data has been excluded. I think the error messaging might be partly to blame, since it always suggests increasing --depthcutoff whenever a solver error occurs. I'll add a check that determines whether the depthcutoff is set too high for the depth available in the sample.

Using Freyja version 1.4.7 with barcodes from 12-08-2023, I was only able to get a warning rather than an outright error, so it's quite possible that the version of usher_barcodes.csv you're using is incompatible with this sample. Could you try running freyja update and see if the issue persists?
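A pre-flight check along the lines dylanpilz describes could look like the sketch below. It is not Freyja's implementation: `count_sites` and `check_depth_cutoff` are hypothetical names, and the code assumes the depth file is headerless and tab-separated with the read depth in the fourth column, and that a site passes when its depth is at or above the cutoff.

```python
import csv


def count_sites(lines, depthcutoff):
    """Return (sites at or above depthcutoff, total sites) from depth-file lines.

    Assumes headerless, tab-separated rows with the depth in column 4.
    """
    passing = total = 0
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        total += 1
        if float(row[3]) >= depthcutoff:
            passing += 1
    return passing, total


def check_depth_cutoff(lines, depthcutoff):
    """Hypothetical check: fail early with a readable message instead of
    letting an empty barcode matrix reach the demixing step."""
    passing, total = count_sites(lines, depthcutoff)
    if passing == 0:
        raise ValueError(
            f"No sites reach --depthcutoff {depthcutoff} "
            f"({total} sites checked); try a lower value."
        )
    return passing


# Usage with the file name from this thread:
# with open("sample_ivar_freyja.depth", newline="") as fh:
#     check_depth_cutoff(fh, 500)
```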

LauraVP1994 commented 10 months ago

Thank you, it is indeed probably due to the low coverage of the sample. We use Freyja as part of a larger Snakemake workflow that analyzes samples from beginning to end, so when Freyja failed and no output file was generated, the workflow threw errors. We have solved this by creating an empty file whenever this error occurs.
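The empty-file workaround can be sketched as a small Python wrapper, for example one called from a Snakemake `run:` block. This is a hypothetical helper, not the workflow described above: `run_demix_or_placeholder` is an illustrative name, and it treats a non-zero exit status or a missing output file as a failure, since the thread does not establish whether every Freyja failure path exits non-zero.

```python
import subprocess
from pathlib import Path


def run_demix_or_placeholder(cmd, output_path):
    """Run a demix command; if it fails, leave an empty output file behind.

    Returns True when the command succeeded and wrote its output,
    False when an empty placeholder was written instead.
    """
    output = Path(output_path)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0 and output.exists():
        return True
    # Log the failure so low-coverage samples are not silently hidden.
    print(f"demix did not produce {output} (exit code {result.returncode})")
    if result.stderr.strip():
        print(result.stderr.strip().splitlines()[-1])
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text("")  # empty file so downstream rules still find it
    return False


# Usage with the command from this thread:
# run_demix_or_placeholder(
#     ["freyja", "demix", "sample_ivar_freyja.variants.tsv",
#      "sample_ivar_freyja.depth", "--eps", "0.025",
#      "--depthcutoff", "500", "--output", "sample_ivar_freyja.demix"],
#     "sample_ivar_freyja.demix",
# )
```

Checking for the output file, and not only the exit code, keeps the wrapper safe if a failure is reported as a message while the process still exits cleanly.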

I also have another question: the results show which variant among the barcode options is the closest match, but is there also a way to know how close that match is?

dylanpilz commented 10 months ago

Sounds great!

The resid field in the demix output gives the residual of the demixing problem, which describes how far the variants in the barcode file are from those observed in the sample at the provided depths.
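The thread does not spell out how resid is computed. Purely as an illustration, the sketch below assumes it is an L1 norm of the depth-weighted difference between the observed mutation frequencies and the frequencies predicted by the estimated lineage mixture, with per-site weights of log2(depth + 1) scaled to a maximum of 1. Both the norm and the weighting are assumptions, not confirmed Freyja internals, and `weighted_l1_residual` is a hypothetical name.

```python
import numpy as np


def weighted_l1_residual(barcodes, abundances, observed, depths):
    """Depth-weighted L1 mismatch between observed and predicted frequencies.

    barcodes:   (n_sites, n_lineages) 0/1 matrix of mutations per lineage
    abundances: (n_lineages,) estimated lineage proportions (summing to 1)
    observed:   (n_sites,) observed mutation frequencies
    depths:     (n_sites,) sequencing depth at each site
    """
    # Assumed weighting: log2(depth + 1), scaled so the deepest site weighs 1.
    weights = np.log2(np.asarray(depths, dtype=float) + 1)
    if weights.max() > 0:
        weights /= weights.max()
    # Frequencies the estimated mixture would produce at each site.
    predicted = np.asarray(barcodes, dtype=float) @ np.asarray(abundances, dtype=float)
    mismatch = np.abs(predicted - np.asarray(observed, dtype=float))
    return float(np.sum(weights * mismatch))


# Toy example: 3 sites, 2 lineages mixed 70/30.
barcodes = [[1, 0], [0, 1], [1, 1]]
abundances = [0.7, 0.3]
print(weighted_l1_residual(barcodes, abundances, [0.5, 0.3, 1.0], [1023, 1023, 1023]))
```

Under this assumption the value is a sum over sites, so it grows with the number of mismatched sites, while mismatches at low-depth sites count for less.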

LauraVP1994 commented 9 months ago

Dear,

Thank you for the answer. I guess this is the resid in the demix file? What is its range, and when can a resid value be considered good?

Kind regards Laura

dylanpilz commented 9 months ago

Correct, you'll find resid listed in the demix file.

As to whether a given resid is considered good, that depends on the type of sequencing data you're looking at. For Illumina sequencing of SARS-CoV-2, a value of 20 or below is generally considered good, but for ONT sequencing that value could be significantly higher because ONT reads are more error-prone.

At its core, it depends on how well your available sequencing data captures the viral diversity in the sample.
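To apply this rule of thumb automatically, a workflow could read resid from each demix file and flag samples above a platform-specific threshold. The sketch assumes the demix output is a tab-separated key/value text file containing a `resid` row; `parse_demix_field` and `resid_looks_good` are hypothetical names. The default of 20 reflects the Illumina guidance above; the thread gives no number for ONT, so none is assumed here.

```python
def parse_demix_field(text, key):
    """Return the value for `key` from tab-separated key/value demix text."""
    for line in text.splitlines():
        name, sep, value = line.partition("\t")
        if sep and name.strip() == key:
            return value.strip()
    raise KeyError(f"{key!r} not found in demix output")


def resid_looks_good(demix_text, threshold=20.0):
    """True if resid is at or below `threshold`.

    The default of 20 is the Illumina rule of thumb from this thread;
    pass a platform-specific value for other data such as ONT.
    """
    resid = float(parse_demix_field(demix_text, "resid"))
    return resid <= threshold


# Usage with the output name from this thread:
# with open("sample_ivar_freyja.demix") as fh:
#     print(resid_looks_good(fh.read()))
```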