andersen-lab / Freyja

Depth-weighted De-Mixing
BSD 2-Clause "Simplified" License

key error for EG.5.1.8 #180

Closed: xsitarcik closed this issue 11 months ago

xsitarcik commented 1 year ago

Hello, I received the following error when re-running the same input after running freyja update:

Traceback (most recent call last):
  File "/github/workspace/.tests/.snakemake/conda/39483551ed654ea7bd7b75a68486482a_/bin/freyja", line 10, in <module>
    sys.exit(cli())
  File "/github/workspace/.tests/.snakemake/conda/39483551ed654ea7bd7b75a68486482a_/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/github/workspace/.tests/.snakemake/conda/39483551ed654ea7bd7b75a68486482a_/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/github/workspace/.tests/.snakemake/conda/39483551ed654ea7bd7b75a68486482a_/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/github/workspace/.tests/.snakemake/conda/39483551ed654ea7bd7b75a68486482a_/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/github/workspace/.tests/.snakemake/conda/39483551ed654ea7bd7b75a68486482a_/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/github/workspace/.tests/.snakemake/conda/39483551ed654ea7bd7b75a68486482a_/lib/python3.9/site-packages/freyja/_cli.py", line 81, in demix
    df_barcodes = collapse_barcodes(df_barcodes, df_depth, depthcutoff,
  File "/github/workspace/.tests/.snakemake/conda/39483551ed654ea7bd7b75a68486482a_/lib/python3.9/site-packages/freyja/utils.py", line 869, in collapse_barcodes
    pango_aliases = [lineage_data[lin]['alias']
  File "/github/workspace/.tests/.snakemake/conda/39483551ed654ea7bd7b75a68486482a_/lib/python3.9/site-packages/freyja/utils.py", line 869, in <listcomp>
    pango_aliases = [lineage_data[lin]['alias']
KeyError: 'EG.5.1.8'

I installed freyja from this env file:

channels:
  - defaults
  - bioconda
  - conda-forge
dependencies:
  - freyja=1.4.7

I used the following parameters: --eps 0.001, --depthcutoff 1, and --confirmedonly. I run this weekly as part of a GitHub Action on the same data, and now it has failed out of the blue. Thanks.

joshuailevy commented 1 year ago

Hello @xsitarcik!

Interesting. It seems there may have been an update in which the UShER tree was updated before the Pango lineage list. When did this fail? Any chance you can send us the barcode and lineages.yml files that were used? EG.5.1.8 does appear to be present in the lineage metadata now: https://github.com/andersen-lab/Freyja/blob/78eb4a7b4cc02d303a5a2af03a2e651ecde5d9c9/freyja/data/lineages.yml#L30101

Best, Josh
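The failure mode behind this hypothesis can be reproduced in a few lines. The traceback shows collapse_barcodes building pango_aliases with the lookup lineage_data[lin]['alias'], so any lineage name that lineage_data lacks raises KeyError. The sketch below uses a hypothetical in-memory lineage_data dict with illustrative entries (not Freyja's real files) and assumes the names being looked up come from a newer barcode set; the traceback cuts off before showing what the comprehension actually iterates over. missing_lineages is a hypothetical helper that reports every unknown name up front instead of failing on the first one.

```python
# Hypothetical lineage metadata built from an older lineage list:
# keyed by lineage name, each value carrying an 'alias' field, mirroring
# the lineage_data[lin]['alias'] lookup in the traceback. EG.5.1.8 is absent.
lineage_data = {
    "EG.5.1": {"alias": "XBB.1.9.2.5.1"},
    "EG.5.1.1": {"alias": "XBB.1.9.2.5.1.1"},
}

# Hypothetical lineage names from a newer barcode set that already has EG.5.1.8.
barcode_lineages = ["EG.5.1", "EG.5.1.1", "EG.5.1.8"]


def unaliased_names(lineages, metadata):
    """Same lookup pattern as the traceback: stops at the first unknown name."""
    return [metadata[lin]["alias"] for lin in lineages]


def missing_lineages(lineages, metadata):
    """Return every name absent from the metadata, in order, without duplicates."""
    seen = set()
    missing = []
    for lin in lineages:
        if lin not in metadata and lin not in seen:
            seen.add(lin)
            missing.append(lin)
    return missing


try:
    unaliased_names(barcode_lineages, lineage_data)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'EG.5.1.8'

print(missing_lineages(barcode_lineages, lineage_data))  # ['EG.5.1.8']
```

Running a check like missing_lineages before demixing turns an opaque KeyError into a list of names that are present in the barcodes but not yet in the lineage metadata.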

xsitarcik commented 1 year ago

Hello, I used the following command to produce the reference lineage data: freyja update --outdir {XY}. I am attaching a zipped copy of the directory produced by that command: latest_lineages.zip

I do not know if this is relevant, but I could not find EG.5.1.8 on the outbreak.info page. Thanks.

joshuailevy commented 1 year ago

Thanks for the additional info! Weird, EG.5.1.8 does appear to be present in these files. Are you still seeing this error?

outbreak.info lags a bit behind the Freyja metadata files, which are updated every day. EG.5.1.8 is still quite new, so that's probably why it's not yet available on the site.

xsitarcik commented 1 year ago

Yes, the error still persists.

I am running the update as follows:

freyja update --outdir resources/freyja_lineages/22_09_2023

Then I run demixing (here the error occurs):

freyja demix \
  --depthcutoff 2 \
  results/freyja/test1/variants.tsv \
  results/freyja/test1/freyja.depth \
  --output results/freyja/test1/freyja.demix \
  --eps 0.001 \
  --meta resources/freyja_lineages/22_09_2023/curated_lineages.json \
  --barcodes resources/freyja_lineages/22_09_2023/usher_barcodes.csv

Is this the correct way?
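For a scheduled workflow, the update and demix commands above can be generated from two parameters: the dated snapshot directory and the sample directory. This is a minimal sketch that only builds the argument lists; freyja_commands is a hypothetical helper, and it uses only the flags shown in the commands above. Running the lists with subprocess.run is left as a comment. Note that --meta and --barcodes both point into the dated snapshot, while no lineages.yml path appears among the arguments.

```python
import shlex
from pathlib import PurePosixPath


def freyja_commands(snapshot_dir, sample_dir, depthcutoff=2, eps=0.001):
    """Build argv lists for `freyja update` and `freyja demix` (hypothetical helper).

    snapshot_dir: dated directory passed to `freyja update --outdir`.
    sample_dir:   directory holding variants.tsv and freyja.depth.
    """
    snap = PurePosixPath(snapshot_dir)
    sample = PurePosixPath(sample_dir)
    update = ["freyja", "update", "--outdir", str(snap)]
    demix = [
        "freyja", "demix",
        "--depthcutoff", str(depthcutoff),
        str(sample / "variants.tsv"),
        str(sample / "freyja.depth"),
        "--output", str(sample / "freyja.demix"),
        "--eps", str(eps),
        # Both reference files come from the same dated snapshot.
        "--meta", str(snap / "curated_lineages.json"),
        "--barcodes", str(snap / "usher_barcodes.csv"),
    ]
    return update, demix


update, demix = freyja_commands(
    "resources/freyja_lineages/22_09_2023", "results/freyja/test1"
)
print(shlex.join(update))
print(shlex.join(demix))
# To execute: subprocess.run(update, check=True); subprocess.run(demix, check=True)
```

Keeping the snapshot directory as a single parameter makes it obvious which reference files a given run used, which helps when re-running older analyses.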

joshuailevy commented 1 year ago

OK, thanks for the additional info! And yes, that is the correct way. Is it possible that the lineages.yml file isn't being updated because you're writing all of your updates to a local directory? This may be related to #165.

Can you share the variants.tsv and freyja.depth files? Would help with debugging.
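One way to test the stale-lineages.yml hypothesis is to list the lineages in the local usher_barcodes.csv that are missing from the lineage names the demix step actually sees. A minimal sketch, assuming the barcodes CSV has a header row followed by one row per lineage with the lineage name in the first column; barcode_lineages and report_missing are hypothetical helpers. The set of known names would come from lineages.yml, for example via PyYAML's yaml.safe_load, assuming that file is a list of entries each carrying a name field (not verified here).

```python
import csv
import os
import tempfile


def barcode_lineages(barcodes_csv):
    """Return lineage names from the first column of a barcodes CSV.

    Assumed layout: a header row of mutation names, then one row per
    lineage whose first cell is the lineage name.
    """
    with open(barcodes_csv, newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row
        return [row[0] for row in reader if row]


def report_missing(barcodes_csv, known_names):
    """Sorted barcode lineages that are absent from the known metadata names."""
    known = set(known_names)
    return sorted({lin for lin in barcode_lineages(barcodes_csv) if lin not in known})


# Demo with a tiny hypothetical barcode file (mutation column names invented).
with tempfile.TemporaryDirectory() as tmp:
    demo_csv = os.path.join(tmp, "usher_barcodes.csv")
    with open(demo_csv, "w", newline="") as fh:
        fh.write(",C100T,G200A\nEG.5.1,1,0\nEG.5.1.8,1,1\n")
    print(report_missing(demo_csv, {"EG.5.1"}))  # ['EG.5.1.8']
```

A non-empty result against the lineages.yml that demix reads, alongside an empty result against the freshly updated copy, would support the idea that the two files are out of sync.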

xsitarcik commented 1 year ago

Thanks! The error disappeared when I first ran freyja update and then continued with the commands as before. So it looks like you are right about the source of the problem: the lineages.yml file is updated in the local directory, but demixing uses some other, obsolete copy instead. Can this workaround cause any problems? I mainly mean re-running older analyses, where I want to use the old curated_lineages.json and barcodes.csv. If I used a newer lineages.yml (for example, updated today) together with an older curated_lineages.json and barcodes.csv (for example, updated two months ago), would that change the old results?

joshuailevy commented 1 year ago

Hi @xsitarcik, sorry for the delay; I was away at a meeting for the last 1.5 weeks. This workaround should not cause any problems, since all of the entries in an older lineages.yml file will also be present in the newer one (plus any new lineages that have appeared).

We'll eventually add a separate input argument for this to the CLI, but this should work well for now!
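The answer above rests on a superset property: every lineage name in an older lineages.yml also appears in a newer one. A minimal sketch that checks this for two name lists, and also checks that the newer metadata covers every lineage in an older barcode file, could look like this. check_compatibility is a hypothetical helper and the lineage lists are illustrative. It compares names only; it does not check whether the fields stored per lineage (such as alias) changed between versions, so an empty result is not on its own a guarantee of identical demix output.

```python
def check_compatibility(old_names, new_names, old_barcode_lineages):
    """Check whether a newer lineages.yml can stand in for an older one.

    Returns (dropped, uncovered):
      dropped   - names in the old metadata that are absent from the new one
      uncovered - lineages in the old barcodes that the new metadata lacks
    Two empty lists mean the newer names cover everything the old snapshot needs.
    """
    new = set(new_names)
    dropped = sorted(set(old_names) - new)
    uncovered = sorted(set(old_barcode_lineages) - new)
    return dropped, uncovered


# Illustrative name lists: the newer metadata only adds EG.5.1.8.
old_meta = ["EG.5.1", "EG.5.1.1"]
new_meta = ["EG.5.1", "EG.5.1.1", "EG.5.1.8"]
old_barcodes = ["EG.5.1", "EG.5.1.1"]

print(check_compatibility(old_meta, new_meta, old_barcodes))  # ([], [])
```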

xsitarcik commented 11 months ago

Thanks, so far this workaround works for me.