andersen-lab / Freyja-barcodes

BSD 2-Clause "Simplified" License

Index error: KeyError: "['H5Nx-Am'] not in index" #3

wilke closed this issue 1 month ago

wilke commented 1 month ago

I am trying to use the H5Nx barcodes but get a KeyError.

Commands:

```bash
freyja variants $i --variants variants/${prefix}.variants --depths depth/${prefix}.depth --ref Freyja-barcodes/H5Nx/2024-09-12/reference.fasta
freyja demix --barcodes Freyja-barcodes/H5Nx/2024-09-12/barcode.csv --output tmp.out variants/${prefix}.variants.tsv depth/${prefix}.depth
```

Error:

```
  File "/opt/conda/envs/freyja-env/lib/python3.12/site-packages/pandas/core/indexing.py", line 1558, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/freyja-env/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6200, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/opt/conda/envs/freyja-env/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6252, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['H5Nx-Am'] not in index"
```

Which lineage file should I use?

joshuailevy commented 1 month ago

Hi @wilke, thanks for pointing that out. Because of SARS-CoV-2 naming conventions in the UShER tree, any text following an underscore is removed from lineage names. For pathogens whose lineage names contain underscores, this causes problems unless we first replace the underscores with another character.
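To make the failure mode concrete, here is a minimal sketch (not Freyja's actual code) of how stripping everything after an underscore can produce names that no longer match the barcode table's keys. The lineage names `clade_1` and `clade_2` are hypothetical, and a plain dict stands in for the barcode DataFrame index; the real lookup happens in pandas, as the traceback shows.

```python
# Minimal sketch, NOT Freyja's implementation: shows how dropping text after "_"
# can break a name lookup. Lineage names here are hypothetical.

def strip_suffix(name: str) -> str:
    """Drop everything from the first underscore onward (SARS-CoV-2-style cleanup)."""
    return name.split("_", 1)[0]

def protect(name: str, replacement: str = "-") -> str:
    """Swap underscores for another character so the cleanup keeps the full name."""
    return name.replace("_", replacement)

# Barcode rows keyed by full lineage name (stand-in for a DataFrame index).
barcodes = {"clade_1": [1, 0], "clade_2": [0, 1]}

# If names are cleaned on one side only, the requested keys do not exist.
requested = [strip_suffix(n) for n in barcodes]
missing = [n for n in requested if n not in barcodes]
print(missing)  # ['clade', 'clade']

# Replacing underscores first keeps the names distinct and consistent.
fixed = {protect(k): v for k, v in barcodes.items()}
requested_fixed = [strip_suffix(n) for n in fixed]
print([n for n in requested_fixed if n not in fixed])  # []
```

Note that the truncation also collapses two distinct lineages into the same name, so even a successful lookup would be ambiguous; replacing the underscore avoids both problems.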

I've made a quick fix, which should hopefully get things going for you. The updated file is in the same spot: https://github.com/andersen-lab/Freyja-barcodes/blob/main/H5Nx/2024-09-12/barcode.csv
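If you want to confirm that a local copy of the barcode file has the updated names, a small stdlib check like the one below can list any lineage names that still contain underscores. This is a hypothetical helper, not part of Freyja, and it assumes the barcode CSV has one lineage per row with the lineage name in the first column and mutation names in the header row.

```python
import csv

def lineages_with_underscores(path: str) -> list[str]:
    """Return first-column names in a barcode CSV that contain '_'.

    Assumes (not verified against Freyja's docs) that lineages are rows,
    the first column holds their names, and the first row is a header.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row (mutation names)
        return [row[0] for row in reader if row and "_" in row[0]]
```

For example, `lineages_with_underscores("Freyja-barcodes/H5Nx/2024-09-12/barcode.csv")` should return an empty list if every underscore has been replaced.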

Josh

wilke commented 1 month ago

Thanks for the quick help; that fixed the error.